Show HN: Melchi – Open-Source Snowflake to DuckDB Replication with CDC Support https://ift.tt/kyArRO6

Hey hacker news! I built Melchi, an open-source tool that handles Snowflake to DuckDB replication with proper CDC support. I'd love your feedback on the approach and potential use cases.

*Why I built it:*

When I worked at Redshift I saw two common scenarios that were painfully difficult to solve: Teams needed to query and join data from other organizations' Snowflake instances with their own data stored in different warehouse types, or they wanted to experiment with different warehouse technologies but the overhead of building and maintaining data pipelines was too high. With DuckDB's growing popularity for local analytics, I built this to make warehouse-to-warehouse data movement simpler.

*How it works:*

* Supports three CDC strategies: standard streams (full change tracking), append-only streams (insert-only tables), and full refresh
* Handles schema matching and type conversion automatically
* Manages all the change tracking metadata
* Uses DataFrames for efficient data movement instead of CSV dumps
* Provides transactional consistency with automatic rollback on failures
* Processes data in configurable batch sizes for memory efficiency

Quick setup example:

```yaml
source:
  type: snowflake
  account: ${SNOWFLAKE_ACCOUNT}
  warehouse: YOUR_WAREHOUSE
  change_tracking_schema: streams
target:
  type: duckdb
  database: output/local.duckdb
```

*Current limitations:*

* Geography/Geometry columns not supported with standard streams (Snowflake limitation)
* Primary keys must be defined in Snowflake (or a row ID will be auto-generated)
* All tables must be replaced together when modifying transfer configuration
* Cannot replicate tables with identical schema/column names into DuckDB, even from different Snowflake databases

*Questions for the community:*

1. What use cases do you see for this kind of tool?
2. What features would make this more useful for your workflow?
3. Any concerns about the approach to CDC?
4. What other source/target databases would be valuable to support?

GitHub: https://ift.tt/HYpTsmR
Discord: https://ift.tt/1vUEwSC

Looking forward to your thoughts and feedback!

https://ift.tt/HYpTsmR
November 6, 2024 at 12:01AM
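
For readers unfamiliar with stream-based CDC, here is a minimal sketch of how a batch of standard-stream change records might be applied to a target table inside one transaction, with rollback on failure. This is not Melchi's code. It assumes the usual Snowflake stream shape, where each record carries a `DELETE` or `INSERT` action and an update arrives as a `DELETE` plus an `INSERT` for the same key. The `users` table and the `batched` and `apply_changes` helpers are hypothetical names, and Python's built-in `sqlite3` stands in for DuckDB so the example runs with only the standard library.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows (hypothetical helper)."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def apply_changes(conn, changes, batch_size=2):
    """Apply stream-style change records to `users` in one transaction.

    Each change is (action, id, name) with action "DELETE" or "INSERT".
    Deletes run first so that an update (DELETE + INSERT on the same key)
    does not collide with the old row. Returns True on commit, False if
    the transaction was rolled back.
    """
    deletes = [(c[1],) for c in changes if c[0] == "DELETE"]
    inserts = [(c[1], c[2]) for c in changes if c[0] == "INSERT"]
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for chunk in batched(deletes, batch_size):
                conn.executemany("DELETE FROM users WHERE id = ?", chunk)
            for chunk in batched(inserts, batch_size):
                conn.executemany(
                    "INSERT INTO users (id, name) VALUES (?, ?)", chunk
                )
    except sqlite3.Error:
        return False
    return True


# Stand-in target database with two existing rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
conn.commit()

# Update id 1, delete id 2, insert id 3.
ok = apply_changes(conn, [
    ("DELETE", 1, None), ("INSERT", 1, "ada l."),
    ("DELETE", 2, None), ("INSERT", 3, "cy"),
])
rows_after_ok = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()

# A change set that violates the primary key is rolled back as a whole,
# so the DELETE of id 1 is undone as well.
bad = apply_changes(conn, [("DELETE", 1, None), ("INSERT", 3, "dup")])
rows_after_bad = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

An append-only stream would need only the insert half of this logic, and a full refresh would replace the table contents instead of applying individual changes.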
