OLake (v0.1.6 – v0.1.8)
July 17 – July 30, 2025
🎯 What's New
Sources
-
Incremental Sync: MongoDB and Oracle -
Added incremental synchronisation support for MongoDB and Oracle sources. OLake now transfers only the documents or rows that are new or updated since the last run, reducing latency and data volume for recurring pipelines. -
Oracle Connector Filter & Chunking -
Added filter support and an optimised chunking strategy for the Oracle connector. Query-level filtering ensures only relevant rows are fetched, and evenly sized data chunks maximise parallel throughput. -
Oracle Multi-Cursor Support for Incremental Sync -
OLake incremental sync can now be configured with a primary and secondary cursor column of the same datatype, where the secondary is used only if the primary cursor value is NULL, reducing missed changes in sparse or null‑heavy tables. -
MySQL Binlog Permissions Check -
Automatically validates that the MySQL user has the required binlog privileges before CDC starts, preventing mid‑run failures due to missing permissions. -
Postgres CDC Improvement -
When Postgres CDC detects an LSN position problem that requires a full reload, it now repositions the replication slot instead of attempting to read outdated cached WAL data. -
Universal Filter Option -
Offers a consistent filter parameter across all source drivers, letting you apply the same include/exclude rules in MongoDB, Oracle, MySQL, Postgres, and more without driver-specific syntax.
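The primary/secondary cursor fallback described for Oracle incremental sync can be illustrated with a minimal sketch. This is not OLake's implementation: the function names, the in-memory rows, the column names `updated_at` and `created_at`, and the strict "greater than" comparison are all assumptions made for illustration.

```python
from typing import Any, Optional

def effective_cursor(row: dict, primary: str, secondary: str) -> Optional[Any]:
    """Return the primary cursor value, or the secondary one if the primary is NULL."""
    value = row.get(primary)
    return value if value is not None else row.get(secondary)

def select_incremental(rows, primary, secondary, last_cursor):
    """Pick rows whose effective cursor is newer than the saved state.

    Returns the selected rows and the new cursor state to persist.
    """
    picked = []
    new_cursor = last_cursor
    for row in rows:
        value = effective_cursor(row, primary, secondary)
        if value is None:
            # Both cursor columns are NULL: this sketch simply skips the row.
            continue
        if last_cursor is None or value > last_cursor:
            picked.append(row)
            if new_cursor is None or value > new_cursor:
                new_cursor = value
    return picked, new_cursor

# Hypothetical table snapshot; both cursor columns share one datatype (integers here).
rows = [
    {"id": 1, "updated_at": 10, "created_at": 5},
    {"id": 2, "updated_at": None, "created_at": 12},  # falls back to created_at
    {"id": 3, "updated_at": 8, "created_at": 8},
    {"id": 4, "updated_at": None, "created_at": 7},   # falls back, but not newer
]

changed, state = select_incremental(rows, "updated_at", "created_at", last_cursor=9)
print([r["id"] for r in changed], state)
```

Without the fallback, row 2 would never be picked up, because its primary cursor is NULL and cannot be compared against the saved state.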
Destinations
-
Clear Destination Flag -
Provides a flag to clear destination datasets before a full refresh, ensuring the target only contains the latest snapshot. This is useful when resetting tables or removing stale records ahead of a new load. -
Custom s3_endpoint in Parquet Writer Config -
Added an optional s3_endpoint setting to the Parquet writer config, allowing users to write Parquet files to a custom S3-compatible endpoint.
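As a rough illustration of the optional `s3_endpoint` setting, the sketch below builds a writer config as a Python dictionary. Only the `s3_endpoint` key comes from this release note; the other keys (`type`, `writer`, `s3_bucket`, `s3_region`), the helper name, and the local endpoint URL are hypothetical placeholders, so check the OLake documentation for the actual schema.

```python
import json
from typing import Optional

def parquet_writer_config(bucket: str, region: str,
                          s3_endpoint: Optional[str] = None) -> dict:
    """Build an illustrative Parquet writer config (key names are assumed)."""
    writer = {"s3_bucket": bucket, "s3_region": region}
    if s3_endpoint:
        # Optional: only set when targeting a custom S3-compatible service.
        writer["s3_endpoint"] = s3_endpoint
    return {"type": "PARQUET", "writer": writer}

# Example: a locally hosted S3-compatible store (URL is a placeholder).
config = parquet_writer_config("olake-data", "us-east-1", "http://localhost:9000")
print(json.dumps(config, indent=2))
```

Leaving `s3_endpoint` unset keeps the key out of the config entirely, which matches the setting being optional.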
🔧 Bug Fixes & Stability
-
Credential Parsing Fix -
Corrects parsing of complex connection strings for Postgres and MongoDB so special characters and URI parameters are handled reliably. This reduces connection errors during job setup and discovery. -
Discovery Cursor Fix -
Fixes merging of cursor fields in the new discover flow so schema and cursor metadata are recorded consistently. This avoids missing or duplicated cursor information when building stream definitions. -
Postgres CDC Reliability -
Improved Postgres CDC behaviour by advancing the LSN during a full load when the cache is invalid.
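To show why special characters in credentials complicate connection-string parsing, here is a small sketch using Python's standard `urllib.parse`. It is not how OLake parses credentials; the helper name, host, user, and password are made-up examples. Characters such as `@`, `:`, `/`, and `#` have structural meaning in a URI, so they must be percent-encoded before being embedded and decoded after parsing.

```python
from urllib.parse import quote, unquote, urlsplit

def build_uri(scheme: str, user: str, password: str,
              host: str, port: int, database: str) -> str:
    """Percent-encode the user and password so reserved characters stay unambiguous."""
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{userinfo}@{host}:{port}/{database}"

# Hypothetical credentials containing URI-reserved characters.
uri = build_uri("postgres", "olake_user", "p@ss:w/rd#1",
                "db.example.com", 5432, "app")
print(uri)

# Parsing the URI back recovers the original values.
parts = urlsplit(uri)
print(parts.hostname, parts.port, unquote(parts.password))
```

If the password were embedded unencoded, the `@` and `#` would be read as the host separator and the fragment marker, which is the kind of failure this fix addresses.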