OLake (v0.1.0 – v0.1.1)
June 13 – June 18, 2025
🎯 What's New
Sources
- New Source Introduction -
Introduced MongoDB, Postgres and MySQL as new source connectors for OLake.
Destinations
- New Writer Introduction -
Introduced Iceberg Writer and Parquet Writer as new destinations for OLake.
- Parquet Writer Enhancement -
Implemented Parquet writer to write to both local storage and S3. Added folder partitioning in Parquet writer.
Platform Features
- Driver Releaser -
Launches the OLake Driver Releaser tool for packaging and distributing OLake connectors, making driver updates seamless across environments.
- Strict CDC Sync Mode -
Adds a new mode that applies only change events and skips any full‑refresh backfill during syncs, guaranteeing CDC‑only behaviour. This reduces load on sources/targets and avoids accidental re‑snapshots in continuous pipelines.
- Discover with Merge -
Schema discovery now merges results into an existing streams.json so prior selections and settings are preserved while new streams are added. This minimizes manual edits when onboarding new tables or evolving schemas.
- Catalog Restructure & Autosave -
Restructures catalog/state file layout for clarity and durability, and autosaves after key operations to prevent metadata loss.
- Normalization Option -
Introduces a writer setting to choose between normalized (atomic typed columns) or denormalized (nested JSON) output formats, giving flexibility for downstream consumers.
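The Discover with Merge behaviour described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical catalog layout (a top-level "streams" list whose entries carry "namespace" and "name" keys); the real streams.json schema may differ, and the function name `merge_streams` is illustrative only, not part of OLake.

```python
import copy


def merge_streams(existing: dict, discovered: dict) -> dict:
    """Merge freshly discovered streams into an existing catalog.

    Hypothetical layout: {"streams": [{"namespace": ..., "name": ..., ...}]}.
    Streams already in the catalog keep their prior settings untouched;
    only streams that were not known before are appended.
    """
    merged = copy.deepcopy(existing)
    streams = merged.setdefault("streams", [])
    # Identify streams by (namespace, name); this key choice is an assumption.
    known = {(s["namespace"], s["name"]) for s in streams}

    for stream in discovered.get("streams", []):
        key = (stream["namespace"], stream["name"])
        if key not in known:
            streams.append(copy.deepcopy(stream))
            known.add(key)
    return merged


existing = {"streams": [{"namespace": "public", "name": "users", "sync_mode": "cdc"}]}
discovered = {
    "streams": [
        {"namespace": "public", "name": "users", "sync_mode": "full_refresh"},
        {"namespace": "public", "name": "orders", "sync_mode": "full_refresh"},
    ]
}
merged = merge_streams(existing, discovered)
```

In this sketch the existing `users` entry keeps its `cdc` setting, and `orders` is added with its discovered defaults. The inputs are deep-copied, so neither the old catalog nor the discovery result is mutated.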
🔧 Bug Fixes & Stability
- Revert MongoDB Changes -
Rolls back previously unstable MongoDB updates to restore predictable behavior while new fixes incubate.
- Writer Fixes -
Resolves core writer issues around buffering, retries, and error propagation to improve durability and end‑to‑end write stability.
- MongoDB Sync Fixes -
1) Resumable full-load support so interrupted imports restart where they left off.
2) Exponential backoff on read error.
3) Split-vector chunking strategy for large collections.
4) Correct delete-record handling in CDC.
5) Enforce username in connection URI.
6) Backoff logic in chunk-splitting for throttled sources.
Together these improve large‑collection ingest performance and robustness when sources throttle or network hiccups occur.
- Iceberg Writer Fixes -
Includes upsert and timestamp compatibility for Spark, hotfixes, an Avro→Parquet vulnerability patch, clearer logging colors, partitioning logic corrections, and schema de‑duplication. These changes reduce read‑time errors in Spark and ensure cleaner, consistent Iceberg tables and logs.
- Postgres Fixes -
Standardizes destination folder naming and improves type conversion for Postgres types while fixing replication‑slot/dependency handling for CDC reliability.
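For readers unfamiliar with the exponential backoff mentioned under MongoDB Sync Fixes, the general technique can be sketched as follows. This is not OLake's implementation: the function name `read_with_backoff`, its parameters, and the default delays are assumptions chosen for illustration.

```python
import random
import time


def read_with_backoff(read_fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call read_fn, retrying on transient errors with exponential backoff.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...),
    is capped at max_delay, and gets up to 10% random jitter so that many
    readers do not retry in lockstep. All defaults are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return read_fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 10))


# Usage: a hypothetical reader that fails twice before succeeding.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient read error")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = read_with_backoff(flaky_read, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production reader would typically also narrow `retry_on` to the driver's specific transient error types.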