Show HN: MyDuck Server – Supercharge MySQL and Postgres Analytics with DuckDB
32 points ·
fanyang01
·
We're excited to announce MyDuck Server, an open-source project that seamlessly integrates the analytical power of DuckDB with your existing MySQL & Postgres databases.
*Backstory*
Currently, there are no fully satisfactory open-source OLAP solutions for MySQL & Postgres. In the MySQL ecosystem, HeatWave offers close integration, but it's a proprietary, commercial product from Oracle. The Postgres community has seen promising DuckDB-based extensions emerge, including the official pg_duckdb. However, extensions can introduce isolation concerns in mission-critical environments.
Consequently, many organizations resort to setting up complex and costly data movement pipelines using tools like Debezium, Flink, or other commercial solutions to replicate data from MySQL & Postgres to OLAP systems (e.g., Snowflake, BigQuery, ClickHouse) or Lakehouses (e.g., Delta Lake + Spark). This approach introduces significant operational overhead and expense.
Another emerging strategy is the zero-ETL approach, increasingly advocated by cloud providers. This model simplifies data integration by allowing the cloud provider to manage ETL pipelines, while necessitating reliance on specific cloud ecosystems and services.
*Key features*
MyDuck Server offers a real-time analytical replica that leverages DuckDB's native columnar storage and processing capabilities. It operates as a separate server, ensuring isolation and minimizing impact on your primary database. Key features include:
- Easy Zero-ETL: Built-in real-time replication from MySQL & Postgres with no complex pipelines to manage. It feels like a standard MySQL replica or Postgres standby. With the Docker image, passing a connection string is enough.
- MySQL & Postgres Protocol Compatibility: We take this seriously and are working to make this project integrate well with the existing ecosystem around MySQL & Postgres. Currently, it is already possible to connect to MyDuck with standard MySQL & PostgreSQL clients in many programming languages.
- HTAP Support: A standard database proxy can be deployed in front of a MySQL/Postgres primary and its MyDuck replica to route write operations to the primary and read operations to the replica. It just works.
- DuckDB SQL & Columnar I/O over Postgres Protocol: It's unnecessary to restrict ourselves to MySQL/Postgres's SQL expressiveness and row-oriented data transfer. The Postgres port accepts all DuckDB-valid SQL queries, and you can retrieve query results in columnar format via `COPY (SELECT ...) TO STDOUT (FORMAT parquet/arrow)`.
- Standalone Mode: It does not need to be run as a replica. It can also act as a primary server that brings DuckDB into server mode and accepts updates from multiple connections, breaking DuckDB's single-process limitation.
*Relevant Previous HN Threads*
- pg_duckdb [1] (https://news.ycombinator.com/item?id=41275751) is the official Postgres extension for DuckDB. It uses DuckDB as an execution engine to accelerate analytical queries by scanning Postgres tables directly.
- pg_mooncake [2] (https://news.ycombinator.com/item?id=41998247) is a Postgres extension that adds columnstore tables for PG. It uses pg_duckdb under the hood but stores data in Lakehouse formats (Iceberg & Delta Lake).
- BemiDB [3] (https://news.ycombinator.com/item?id=42078067) is also a DuckDB-based Postgres replica. Unlike us, they focus on storing data in Lakehouse format.
We believe MyDuck Server offers a compelling solution for those seeking high-performance analytics on their MySQL & Postgres data without the complexities and costs of traditional approaches. We're eager to hear your feedback and answer any questions you might have. Let me know what you think!
[0] https://github.com/apecloud/myduckserver
[1] https://github.com/duckdb/pg_duckdb
fanyang01 ·8 days ago
We have successfully tested this project with the official Model Context Protocol (MCP) server for Postgres. For those interested in integrating this project with LLMs, you can find the guide here: https://github.com/apecloud/myduckserver/blob/main/docs/tuto...
We were pleasantly surprised that the integration worked seamlessly on our first attempt. Moreover, we observed some fascinating behavior: DuckDB's clear and suggestive error messages proved incredibly helpful for the model. During our testing, the free-tier Claude Haiku model initially hallucinated table and column names, but it impressively corrected them based on DuckDB's suggestions.
This integration highlights the conveniences offered by this project:
- DuckDB’s native SQL support: We can create a table directly from a CSV URL without manually determining the schema.
- Standard Postgres protocol support: The official Postgres MCP server works out-of-the-box.
We’d love to hear your thoughts and questions about LLM + MyDuck integration!
aradox66 ·8 days ago
Can you explain a bit about the difference between how this project works and what pg_duckdb offers?
Show replies
vwagen ·8 days ago
robertclaus ·8 days ago
Show replies