#15 of 20 Innovations

Modern Open-Source Databases

PostgreSQL has quietly become one of the most versatile pieces of infrastructure in modern software stacks – not because it beat out rivals on a specific benchmark, but because the extension ecosystem turned it into a platform that handles workloads that used to require five separate databases. pgvector for AI embeddings. TimescaleDB for time-series. PostGIS for geospatial. Citus for horizontal sharding. The result: many organisations run a single PostgreSQL deployment covering workloads that previously would have meant spinning up and operating Elasticsearch, InfluxDB, and a dedicated vector store separately. That operational simplicity has real value, and it’s a big part of why PostgreSQL’s dominance keeps growing rather than plateauing.

Choosing the Right Open-Source Database

PostgreSQL with pgvector, TimescaleDB, and PostGIS covers roughly 80% of workloads without adding a new database to your stack.

But PostgreSQL isn’t the answer for every workload – and that’s where the newer generation of open-source databases fills important gaps.

ClickHouse is now widely deployed for real-time analytics on event and log data. Columnar storage and vectorised execution let it ingest millions of rows per second and, on suitable hardware and schemas, return aggregation queries over billions of rows in under a second. Cloudflare, ByteDance, and Uber all run ClickHouse at massive scale. It is not a good fit for transactional workloads with frequent updates, though – the architecture isn’t designed for them.

DuckDB has become the go-to in-process analytical database for data engineers and scientists: it runs embedded in Python or Node.js with no server, queries Parquet and CSV files in place on disk or S3 without a separate load step, and handles analytical workloads on a laptop that previously required a cluster. It’s one of those tools that feels too good to be true at first.

Neon adds serverless scaling and instant database branching to PostgreSQL, which makes it useful for development workflows where you want to clone a production database in seconds to test a feature branch.

Database Selection Decision Tree

Most teams reach for a new database before exhausting what PostgreSQL extensions can do – audit your actual requirements first.
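As a rough illustration of that audit, the selection logic described in this article can be written down as a small function. This is a sketch, not a rule: the `Workload` fields, the one-billion-row threshold, and the order of the checks are all assumptions made for the example, and real decisions depend on many more factors.

```python
from dataclasses import dataclass

# Assumed cut-off for "too big for a row store"; purely illustrative.
LARGE_ANALYTICS_ROWS = 1_000_000_000


@dataclass
class Workload:
    """Hypothetical summary of a workload; field names are illustrative."""
    transactional: bool = True
    rows_scanned_per_query: int = 0
    in_process_analytics: bool = False
    multi_region_strong_consistency: bool = False
    write_heavy_relaxed_consistency: bool = False


def suggest_database(w: Workload) -> str:
    """Return a starting suggestion, defaulting to PostgreSQL."""
    if w.multi_region_strong_consistency:
        return "CockroachDB or YugabyteDB"
    if w.write_heavy_relaxed_consistency:
        return "Cassandra"
    if w.in_process_analytics and not w.transactional:
        return "DuckDB"
    if not w.transactional and w.rows_scanned_per_query >= LARGE_ANALYTICS_ROWS:
        return "ClickHouse"
    # Everything else: check the extension ecosystem first.
    return "PostgreSQL with extensions"


if __name__ == "__main__":
    print(suggest_database(Workload()))
    print(suggest_database(Workload(transactional=False,
                                    rows_scanned_per_query=5_000_000_000)))
```

The default branch is deliberately the last one: a specialised database is suggested only when a requirement rules PostgreSQL out.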

The pattern across all these tools is the same: open-source databases in 2025–2026 are capable enough to replace expensive commercial products that used to be the only option at scale. CockroachDB and YugabyteDB offer PostgreSQL-compatible APIs with automatic geographic distribution for teams that need multi-region strong consistency without managing replication themselves. MongoDB has offered multi-document transactions since 4.0 and native time-series collections since 5.0, and later releases, including 7.x, have continued to refine both. Cassandra remains the go-to for write-heavy, geographically distributed wide-column workloads where you’re writing millions of events per second and can’t afford the coordination overhead of strong consistency. Depending on your setup, you may need one of these – or you may find that PostgreSQL with the right extensions covers everything you need without the operational overhead of running a heterogeneous database stack.

Frequently Asked Questions

Why has PostgreSQL become so dominant among open-source databases?

PostgreSQL combines strong ACID compliance, a rich SQL dialect, excellent performance, and the most active extension ecosystem of any open-source database. Extensions like pgvector, TimescaleDB, and PostGIS add specialised capabilities without switching databases. Its licence (similar to MIT) has no commercial restrictions – making it the safe default for any organisation concerned about open-source licence changes like HashiCorp’s 2023 BSL move.
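To make "adding a capability without switching databases" concrete, here is a minimal sketch of what using pgvector might look like from Python. The table name `items`, the column name `embedding`, and the three-dimensional vectors are hypothetical; the SQL uses pgvector's `vector` type and its `<->` (Euclidean distance) operator. The functions only build SQL text, so running it against a server would additionally need a driver such as psycopg, which is assumed here and not shown.

```python
# Hypothetical schema used throughout this sketch.
TABLE = "items"
COLUMN = "embedding"

SETUP_STATEMENTS = [
    # pgvector registers itself under the extension name "vector".
    "CREATE EXTENSION IF NOT EXISTS vector",
    f"CREATE TABLE {TABLE} (id bigserial PRIMARY KEY, {COLUMN} vector(3))",
]


def vector_literal(values) -> str:
    """Format numbers as a pgvector text literal, e.g. [1.0,2.0,3.0]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_neighbours_sql(k: int) -> str:
    """Query for the k rows closest to a vector passed as parameter %s."""
    return (
        f"SELECT id FROM {TABLE} "
        f"ORDER BY {COLUMN} <-> %s::vector "
        f"LIMIT {int(k)}"
    )


if __name__ == "__main__":
    for statement in SETUP_STATEMENTS:
        print(statement + ";")
    print(nearest_neighbours_sql(5), "-- param:", vector_literal([1, 2, 3]))
```

The embeddings live in an ordinary table, so they can be joined, filtered, and backed up with the same tooling as the rest of the schema.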

What is ClickHouse and when should you use it?

ClickHouse is an open-source columnar analytical database optimised for very high-throughput inserts and extremely fast aggregation queries on billions of rows. Use it for real-time analytics on event streams, log data, user behaviour, or telemetry at a scale where row-oriented databases like PostgreSQL become too slow. It’s not well suited for transactional workloads with frequent updates or complex joins across normalised schemas.

What makes DuckDB different from other analytical databases?

DuckDB runs in-process – embedded inside your Python, R, Java, or Node.js application with no server to manage. It reads Parquet, CSV, and JSON files directly from disk or object storage without loading them first. It is fast enough for serious analytical workloads on a single machine, which makes it well suited to data science exploration, local pipeline testing, and analytical queries in applications that don’t need a full data warehouse.
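A minimal sketch of the in-process model, assuming the third-party `duckdb` Python package is installed and a Parquet file with an `event_type` column exists. The file name, the column name, and the helper names are hypothetical. The query builder is kept separate from execution so the SQL can be inspected without DuckDB present.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def top_events_query(parquet_path: str, limit: int = 10) -> str:
    """Build an aggregation that reads the Parquet file in place."""
    return (
        "SELECT event_type, count(*) AS n "
        f"FROM read_parquet({quote_literal(parquet_path)}) "
        "GROUP BY event_type ORDER BY n DESC "
        f"LIMIT {int(limit)}"
    )


def run_top_events(parquet_path: str, limit: int = 10):
    """Execute the query inside this Python process; no server is started."""
    import duckdb  # third-party dependency: pip install duckdb

    con = duckdb.connect()  # no argument: in-memory database
    try:
        return con.execute(top_events_query(parquet_path, limit)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Prints the SQL only; call run_top_events() once a real file exists.
    print(top_events_query("events.parquet", 5))
```

Nothing is imported into a database first: `read_parquet` scans the file directly, which is what makes this style convenient for exploration and local pipeline tests.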

Should you use a dedicated time-series database or extend PostgreSQL with TimescaleDB?

For most teams, TimescaleDB is the right first choice – it adds time-series capabilities (hypertables, continuous aggregates, compression, retention policies) on top of PostgreSQL you may already operate. You keep the full SQL ecosystem and existing tooling. Dedicated databases like InfluxDB or QuestDB offer better write throughput at extreme scale but add operational complexity. Start with TimescaleDB and migrate only if you hit its limits.
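The four capabilities named above map onto a handful of SQL statements. The sketch below generates them for a hypothetical `conditions` table with `time` and `temperature` columns; the table, the columns, the view name, and the intervals are assumptions for illustration. The function names (`create_hypertable`, `add_compression_policy`, `add_retention_policy`, `time_bucket`) come from TimescaleDB's SQL API, but signatures have evolved across releases, so check them against the version you run.

```python
def _ident(name: str) -> str:
    """Allow only plain identifiers, since they are interpolated into SQL."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def timescale_setup_sql(table: str = "conditions",
                        time_col: str = "time",
                        value_col: str = "temperature",
                        compress_after: str = "7 days",
                        retain: str = "30 days") -> list:
    """Return the statements for a basic TimescaleDB setup, in order."""
    t, c, v = _ident(table), _ident(time_col), _ident(value_col)
    return [
        "CREATE EXTENSION IF NOT EXISTS timescaledb",
        # Hypertable: partitions the existing table by time.
        f"SELECT create_hypertable('{t}', '{c}')",
        # Continuous aggregate: an incrementally maintained hourly rollup.
        f"CREATE MATERIALIZED VIEW {t}_hourly "
        "WITH (timescaledb.continuous) AS "
        f"SELECT time_bucket('1 hour', {c}) AS bucket, avg({v}) AS avg_value "
        f"FROM {t} GROUP BY bucket",
        # Compression: enable it, then compress chunks past a given age.
        f"ALTER TABLE {t} SET (timescaledb.compress)",
        f"SELECT add_compression_policy('{t}', INTERVAL '{compress_after}')",
        # Retention: drop chunks older than the retention window.
        f"SELECT add_retention_policy('{t}', INTERVAL '{retain}')",
    ]


if __name__ == "__main__":
    for statement in timescale_setup_sql():
        print(statement + ";")
```

Everything here is ordinary SQL run against an existing PostgreSQL instance, which is the operational argument for trying TimescaleDB before a dedicated time-series database.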
