Show HN: MyDuck Server – Supercharge MySQL and Postgres Analytics with DuckDB


32 points · fanyang01 · 9 days ago

Hello HN!

We're excited to announce MyDuck Server, an open-source project that seamlessly integrates the analytical power of DuckDB with your existing MySQL & Postgres databases.

*Backstory*

Currently, there are no fully satisfactory open-source OLAP solutions for MySQL & Postgres. In the MySQL ecosystem, HeatWave offers close integration, but it's a proprietary, commercial product from Oracle. The Postgres community has seen promising DuckDB-based extensions emerge, including the official pg_duckdb. However, extensions can introduce isolation concerns in mission-critical environments.

Consequently, many organizations resort to setting up complex and costly data movement pipelines using tools like Debezium, Flink, or other commercial solutions to replicate data from MySQL & Postgres to OLAP systems (e.g., Snowflake, BigQuery, ClickHouse) or Lakehouses (e.g., Delta Lake + Spark). This approach introduces significant operational overhead and expense.

Another emerging strategy is the zero-ETL approach, increasingly advocated by cloud providers. This model simplifies data integration by letting the cloud provider manage the ETL pipelines, but it ties you to a specific cloud ecosystem and its services.

*Key features*

MyDuck Server offers a real-time analytical replica that leverages DuckDB's native columnar storage and processing capabilities. It operates as a separate server, ensuring isolation and minimizing impact on your primary database. Key features include:

- Easy Zero-ETL: Built-in real-time replication from MySQL & Postgres with no complex pipelines to manage. It feels like a standard MySQL replica or Postgres standby. With the Docker image, passing a connection string is enough.

- MySQL & Postgres Protocol Compatibility: We take this seriously and are working to make this project integrate well with the existing ecosystem around MySQL & Postgres. You can already connect to MyDuck with standard MySQL & PostgreSQL clients in many programming languages.

- HTAP Support: A standard database proxy can be deployed in front of a MySQL/Postgres primary and its MyDuck replica to route write operations to the primary and read operations to the replica. It just works.

- DuckDB SQL & Columnar I/O over Postgres Protocol: It's unnecessary to restrict ourselves to MySQL/Postgres's SQL expressiveness and row-oriented data transfer. The Postgres port accepts all DuckDB-valid SQL queries, and you can retrieve query results in columnar format via `COPY (SELECT ...) TO STDOUT (FORMAT parquet/arrow)`.

- Standalone Mode: It does not need to be run as a replica. It can also act as a primary server that brings DuckDB into server mode and accepts updates from multiple connections, breaking DuckDB's single-process limitation.
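To make the columnar-retrieval feature above concrete, here is a minimal Python sketch that streams query results as Parquet over the standard Postgres wire protocol using psycopg2's `copy_expert`. The host, port, and credentials are hypothetical placeholders; the `COPY ... (FORMAT parquet)` form is the DuckDB-style syntax the post describes, not stock Postgres.

```python
# Sketch: retrieve query results in columnar (Parquet) form from MyDuck's
# Postgres port. Assumes a MyDuck server reachable with the DSN below.

def parquet_copy_statement(query: str) -> str:
    """Wrap an arbitrary SELECT in DuckDB's COPY ... TO STDOUT form."""
    return f"COPY ({query}) TO STDOUT (FORMAT parquet)"

def fetch_parquet(dsn: str, query: str, out_path: str) -> None:
    """Stream the server's Parquet output into a local file."""
    import psycopg2  # stock Postgres driver; imported lazily
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur, \
            open(out_path, "wb") as f:
        # copy_expert forwards the COPY statement and streams its output
        cur.copy_expert(parquet_copy_statement(query), f)

if __name__ == "__main__":
    # Hypothetical connection details -- adjust to your deployment.
    fetch_parquet("host=localhost port=5432 dbname=main user=postgres",
                  "SELECT 42 AS answer", "answer.parquet")
```

The resulting file can then be read by any Parquet-aware tool, avoiding row-by-row conversion on the client.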
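The HTAP setup described above boils down to one routing rule: writes go to the MySQL/Postgres primary, reads go to the MyDuck replica. Real deployments would use an existing proxy (e.g. ProxySQL or pgcat) rather than hand-rolled code; this toy classifier only illustrates the rule.

```python
# Toy read/write classifier illustrating HTAP routing:
# read statements -> MyDuck replica, everything else -> primary.

READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN", "WITH")

def route(sql: str) -> str:
    """Return the backend ('replica' or 'primary') a statement should hit."""
    head = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
    return "replica" if head.startswith(READ_PREFIXES) else "primary"
```

A production proxy would also account for transactions, session state, and replication lag, which this sketch deliberately ignores.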

*Relevant Previous HN Threads*

- pg_duckdb [1] (https://news.ycombinator.com/item?id=41275751) is the official Postgres extension for DuckDB. It uses DuckDB as an execution engine to accelerate analytical queries by scanning Postgres tables directly.

- pg_mooncake [2] (https://news.ycombinator.com/item?id=41998247) is a Postgres extension that adds columnstore tables for PG. It uses pg_duckdb under the hood but stores data in Lakehouse formats (Iceberg & Delta Lake).

- BemiDB [3] (https://news.ycombinator.com/item?id=42078067) is also a DuckDB-based Postgres replica. Unlike us, they focus on storing data in Lakehouse format.

We believe MyDuck Server offers a compelling solution for those seeking high-performance analytics on their MySQL & Postgres data without the complexities and costs of traditional approaches. We're eager to hear your feedback and answer any questions you might have!

[0] https://github.com/apecloud/myduckserver

[1] https://github.com/duckdb/pg_duckdb

[2] https://github.com/Mooncake-Labs/pg_mooncake

[3] https://github.com/BemiHQ/BemiDB


6 comments
fanyang01 · 8 days ago
*Update from the Authors:*

We have successfully tested this project with the official Model Context Protocol (MCP) server for Postgres. For those interested in integrating this project with LLMs, you can find the guide here: https://github.com/apecloud/myduckserver/blob/main/docs/tuto...

We were pleasantly surprised that the integration worked seamlessly on our first attempt. Moreover, we observed some fascinating behavior: DuckDB's clear and suggestive error messages proved incredibly helpful for the model. During our testing, the free-tier Claude Haiku model initially hallucinated table and column names, but it impressively corrected them based on DuckDB's suggestions.

This integration highlights the conveniences offered by this project:

- DuckDB’s native SQL support: We can create a table directly from a CSV URL without manually determining the schema.

- Standard Postgres protocol support: The official Postgres MCP server works out-of-the-box.
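As a concrete example of the first point, a single DuckDB statement can create and populate a table from a remote CSV, with the schema inferred from the file itself. The table name, URL, and connection details below are hypothetical; the statement is sent over the standard Postgres protocol with psycopg2.

```python
# Sketch: create a table from a CSV URL in one statement, letting DuckDB's
# read_csv_auto infer the schema -- no manual column definitions needed.

def create_from_csv_statement(table: str, url: str) -> str:
    """Build DuckDB's CREATE TABLE ... AS SELECT over a remote CSV."""
    return f"CREATE TABLE {table} AS SELECT * FROM read_csv_auto('{url}')"

if __name__ == "__main__":
    import psycopg2  # any stock Postgres client works
    # Hypothetical DSN and CSV location -- adjust to your setup.
    with psycopg2.connect("host=localhost port=5432 dbname=main "
                          "user=postgres") as conn:
        with conn.cursor() as cur:
            cur.execute(create_from_csv_statement(
                "prices", "https://example.com/prices.csv"))
```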

We’d love to hear your thoughts and questions about LLM + MyDuck integration!

aradox66 · 8 days ago
I recently learned about pg_duckdb and I'm excited about this developing ecosystem.

Can you explain a bit about the difference between how this project works and what pg_duckdb offers?


vwagen · 8 days ago
Bridging the gap between TP databases like PostgreSQL and AP databases like DuckDB is really amazing. Maybe the next step is building a serverless AP service on top of this and object storage.
robertclaus · 8 days ago
The current system diagram implies using duckdb's default storage format directly. I wonder how well this would actually work with the proposed zero-ETL design of basically treating this as a live replica. I was under the impression that as an OLAP solution DuckDB makes performance compromises when it comes to writes - so wouldn't live replication become problematic?
