OLake (v0.2.5 - v0.2.7)
September 20 – October 11, 2025
🎯 What's New
Sources
- PgOutput Plugin Implementation: Implemented PgOutput plugin support for PostgreSQL CDC, delivering faster change data capture. Normalization logic now supports concurrency, and batch-related variables can be passed in as command-line parameters to tune throughput and resource utilization.
- MongoDB IAM Authentication Support: Enables AWS IAM-based authentication for MongoDB connections. When IamUser=true and machine-level IAM configuration is in place, users no longer need to provide a username and password to connect OLake to a MongoDB source.
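pgoutput is PostgreSQL's built-in logical decoding output plugin, and it emits a binary protocol in which each message starts with a one-byte type tag. The Go sketch below shows what a consumer of that protocol does at the lowest level: it classifies messages by their tag and reads the relation OID from a non-streamed Insert message. It is an illustration, not OLake's implementation; `classify` and `insertRelationOID` are hypothetical names, and only a subset of message types is handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// messageKinds maps the leading byte of a pgoutput message to its name,
// following PostgreSQL's logical replication message formats (subset).
var messageKinds = map[byte]string{
	'B': "Begin",
	'C': "Commit",
	'O': "Origin",
	'R': "Relation",
	'Y': "Type",
	'I': "Insert",
	'U': "Update",
	'D': "Delete",
	'T': "Truncate",
}

// classify returns the message kind of a raw pgoutput payload.
func classify(msg []byte) (string, error) {
	if len(msg) == 0 {
		return "", errors.New("empty message")
	}
	kind, ok := messageKinds[msg[0]]
	if !ok {
		return "", fmt.Errorf("unhandled message type %q", msg[0])
	}
	return kind, nil
}

// insertRelationOID reads the relation OID from a non-streamed Insert message,
// laid out as Byte1('I') followed by an Int32 OID in network byte order.
func insertRelationOID(msg []byte) (uint32, error) {
	if len(msg) < 5 || msg[0] != 'I' {
		return 0, errors.New("not an Insert message")
	}
	return binary.BigEndian.Uint32(msg[1:5]), nil
}

func main() {
	// 'I', relation OID 16384, then 'N' marking the new tuple (tuple data omitted).
	msg := []byte{'I', 0x00, 0x00, 0x40, 0x00, 'N'}
	kind, _ := classify(msg)
	oid, _ := insertRelationOID(msg)
	fmt.Println(kind, oid)
}
```

Dispatching on the tag byte first keeps the decoder cheap; the full row contents (TupleData) would be parsed only for the Insert, Update, and Delete kinds.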
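With IAM authentication, the MongoDB driver is asked to use the MONGODB-AWS mechanism against the $external auth source, and it resolves AWS credentials from the machine (for example an instance role) rather than from a username and password. The Go sketch below builds such a connection string. `MongoConfig` and `connectionURI` are hypothetical; only the IamUser setting comes from the release notes, and the non-IAM branch's authSource=admin is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// MongoConfig is a hypothetical, trimmed-down source configuration.
type MongoConfig struct {
	Hosts    string // host:port, or a comma-separated list of them
	Username string
	Password string
	IamUser  bool
}

// connectionURI builds a MongoDB connection string. With IamUser set it omits
// credentials and requests MONGODB-AWS, so the driver picks up AWS credentials
// from the environment instead of a username and password.
func connectionURI(c MongoConfig) string {
	u := url.URL{Scheme: "mongodb", Host: c.Hosts, Path: "/"}
	q := url.Values{}
	if c.IamUser {
		q.Set("authMechanism", "MONGODB-AWS")
		q.Set("authSource", "$external")
	} else {
		// url.UserPassword percent-encodes reserved characters such as '@'.
		u.User = url.UserPassword(c.Username, c.Password)
		q.Set("authSource", "admin")
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(connectionURI(MongoConfig{Hosts: "mongo.internal:27017", IamUser: true}))
}
```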
Platform Features
- Documentation Link Validation Workflow: Added an automated GitHub workflow that validates documentation links in pull requests for new features. It checks that every documentation link is valid and accessible, so broken links are caught before they are merged.
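The core of a link-validation check is small: extract the links, fetch each one, and flag failures. The Go sketch below shows that logic under stated assumptions. It handles only inline Markdown links with http(s) targets and treats any status of 400 or above as broken; it is not the actual workflow, whose tooling the release notes do not name, and `extractLinks` and `brokenLinks` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"time"
)

// linkPattern matches inline Markdown links whose target is an http(s) URL.
var linkPattern = regexp.MustCompile(`\[[^\]]*\]\((https?://[^)\s]+)\)`)

// extractLinks returns the URL of every inline http(s) link in a Markdown document.
func extractLinks(markdown string) []string {
	var links []string
	for _, m := range linkPattern.FindAllStringSubmatch(markdown, -1) {
		links = append(links, m[1])
	}
	return links
}

// brokenLinks returns the URLs that cannot be fetched or that answer with an
// HTTP status of 400 or above. A nil client means http.DefaultClient.
func brokenLinks(client *http.Client, urls []string) []string {
	if client == nil {
		client = http.DefaultClient
	}
	var broken []string
	for _, u := range urls {
		resp, err := client.Get(u)
		if err != nil {
			broken = append(broken, u)
			continue
		}
		resp.Body.Close()
		if resp.StatusCode >= 400 {
			broken = append(broken, u)
		}
	}
	return broken
}

func main() {
	doc := "See the [guide](https://example.com/docs/guide) for details."
	client := &http.Client{Timeout: 10 * time.Second}
	for _, u := range brokenLinks(client, extractLinks(doc)) {
		fmt.Println("broken link:", u)
	}
}
```

In a CI setting, a non-empty result would fail the check and block the merge.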
🔧 Bug Fixes & Stability
- gRPC Port Binding Retry Mechanism: Added a backoff retry mechanism to resolve intermittent java.io.IOException: Failed to bind to address 0.0.0.0/0.0.0.0:50080 errors when creating the Iceberg writer gRPC server. The fix addresses port binding conflicts that occur when the port is still occupied during Go dialTimeout operations, improving connection reliability and reducing startup failures.
- MySQL CDC Binlog Position Tracking: Enhanced state cursor handling for MySQL CDC sync to include both the start and end binlog positions, ensuring the cursor captures the full range of positions.
- MongoDB CDC for Sharded Clusters: Enhanced PBRT (Post-Batch Resume Token) handling to resolve resume token inconsistencies across shards. This update ensures that after a failover or shard migration, CDC resume tokens are correctly retrieved and applied for each shard's stream, preventing data gaps or duplicate event processing when resuming change streams.
- MongoDB _id Multiple Types Detection Fix: Fixed sync failures in collections where the _id field holds values of more than one data type. OLake now accurately detects each _id type and applies a unified handling approach to prevent type mismatch errors during synchronization.
- MongoDB bucketAuto Strategy Disk Usage Fix: Fixed MongoDB syncs failing on collections with string-type _id fields when sort operations exceed server memory limits. The fix enables allowDiskUse=true for the bucketAuto partitioning strategy, so MongoDB can spill sort operations to disk when memory is exhausted and large collections sync reliably.
- Oracle SCN Dependency Removal and Cursor Consistency Fix: Removed the SCN dependency from the Oracle driver to avoid reliance on short-lived SCNs and reduce database load. Cursor handling is now stable because the initial maximum cursor value is determined before chunking, preventing skipped records during incremental sync.
- Parquet Writer Default Fields Handling Fix: Fixed an issue where default fields were omitted in the Parquet writer when normalization was disabled, ensuring all required fields are included in output files.
- MySQL GlobalState Variable Shadowing and Server ID Range Fix: Fixed a variable shadowing issue in MySQLGlobalState that could cause inconsistent state handling during binlog syncs, and expanded server ID generation to use the full uint32 range (1000–4294967295) for better distribution and MySQL compatibility.
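For the MySQL binlog position tracking change, a binlog position is a file name plus a byte offset, and positions are ordered first by the file's numeric suffix and then by offset. The Go sketch below shows a cursor that records both ends of a synced range and checks whether a position falls inside it. `BinlogPos`, `CDCCursor`, and their methods are hypothetical shapes, not OLake's state format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BinlogPos identifies a point in the MySQL binary log.
type BinlogPos struct {
	File string // e.g. "mysql-bin.000042"
	Pos  uint32 // byte offset within File
}

// seq extracts the numeric suffix of a binlog file name (-1 if absent).
func seq(file string) int {
	i := strings.LastIndex(file, ".")
	n, err := strconv.Atoi(file[i+1:])
	if err != nil {
		return -1
	}
	return n
}

// Less reports whether a comes before b in the binlog.
func (a BinlogPos) Less(b BinlogPos) bool {
	if sa, sb := seq(a.File), seq(b.File); sa != sb {
		return sa < sb
	}
	return a.Pos < b.Pos
}

// CDCCursor records the range of positions covered by one sync.
type CDCCursor struct {
	Start BinlogPos
	End   BinlogPos
}

// Covers reports whether p lies within [Start, End].
func (c CDCCursor) Covers(p BinlogPos) bool {
	return !p.Less(c.Start) && !c.End.Less(p)
}

func main() {
	c := CDCCursor{
		Start: BinlogPos{File: "mysql-bin.000041", Pos: 1200},
		End:   BinlogPos{File: "mysql-bin.000042", Pos: 300},
	}
	fmt.Println(c.Covers(BinlogPos{File: "mysql-bin.000041", Pos: 9000}))
}
```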
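The sharded-cluster CDC fix is about keeping a separate post-batch resume token for each stream and resuming each stream from its own token. The Go sketch below shows that bookkeeping in isolation. Tokens are modeled as opaque strings (real ones are BSON documents), stream keys such as shard names are assumptions, and `ResumeState` is a hypothetical type rather than OLake's state.

```go
package main

import "fmt"

// ResumeState keeps the most recent committed post-batch resume token for
// each change stream, keyed by a stream identifier such as a shard name.
type ResumeState struct {
	tokens map[string]string
}

// NewResumeState returns an empty state.
func NewResumeState() *ResumeState {
	return &ResumeState{tokens: make(map[string]string)}
}

// Commit records a stream's post-batch resume token after that batch has
// been written to the destination. Empty tokens are ignored so a stream
// never loses its last known position.
func (s *ResumeState) Commit(stream, pbrt string) {
	if pbrt != "" {
		s.tokens[stream] = pbrt
	}
}

// ResumeToken returns the token to resume a stream from; ok is false if the
// stream has never committed a batch and must start from its initial point.
func (s *ResumeState) ResumeToken(stream string) (token string, ok bool) {
	token, ok = s.tokens[stream]
	return token, ok
}

func main() {
	s := NewResumeState()
	s.Commit("shard-rs0", "token-after-batch-17")
	s.Commit("shard-rs1", "token-after-batch-9")
	tok, ok := s.ResumeToken("shard-rs0")
	fmt.Println(tok, ok)
}
```

Committing only after a batch is persisted is what prevents gaps; resuming each stream from its own token is what prevents one shard's position from being applied to another.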
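The _id multiple-types fix first detects every type present in the _id field and then handles them uniformly. The Go sketch below illustrates both steps on decoded values: it collects the distinct Go types in a sample and, as one possible unified approach, converts each _id to a string. The string conversion is an assumption chosen for illustration, and `idTypes` and `normalizeID` are hypothetical; the release notes do not say which representation OLake uses.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// idTypes returns the distinct Go types observed among sampled _id values, sorted.
func idTypes(ids []any) []string {
	seen := map[string]bool{}
	for _, id := range ids {
		seen[fmt.Sprintf("%T", id)] = true
	}
	types := make([]string, 0, len(seen))
	for t := range seen {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

// normalizeID maps an _id of any supported type to one representation
// (a string here) so mixed-type collections compare and serialize consistently.
func normalizeID(id any) string {
	switch v := id.(type) {
	case string:
		return v
	case int32:
		return strconv.FormatInt(int64(v), 10)
	case int64:
		return strconv.FormatInt(v, 10)
	case float64:
		return strconv.FormatFloat(v, 'f', -1, 64)
	default:
		return fmt.Sprint(v)
	}
}

func main() {
	ids := []any{"a1", int32(7), int64(9)}
	fmt.Println(idTypes(ids), normalizeID(ids[1]))
}
```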
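The Oracle cursor consistency fix fixes the upper bound of an incremental sync before any chunk is read. The Go sketch below shows why that matters, using an in-memory table as a stand-in. Rows written while chunks are being read land above the captured bound, so they are picked up by the next sync rather than skipped. `Table`, `Row`, and `Sync` are hypothetical, cursor values are assumed positive, and `onChunk` exists only so a caller can simulate concurrent writes.

```go
package main

import "fmt"

// Row is a record with an incremental cursor column (e.g. a modification time).
type Row struct {
	ID     int
	Cursor int64
}

// Table is an in-memory stand-in for a source table.
type Table struct{ Rows []Row }

// MaxCursor returns the largest cursor value (0 for an empty table).
func (t *Table) MaxCursor() int64 {
	var m int64
	for _, r := range t.Rows {
		if r.Cursor > m {
			m = r.Cursor
		}
	}
	return m
}

// Between returns rows with lower < Cursor <= upper.
func (t *Table) Between(lower, upper int64) []Row {
	var out []Row
	for _, r := range t.Rows {
		if r.Cursor > lower && r.Cursor <= upper {
			out = append(out, r)
		}
	}
	return out
}

// Sync reads everything newer than last in cursor-range chunks, bounded by an
// upper value captured once before chunking. It returns the rows and the
// bound to store as the next sync's starting point.
func Sync(t *Table, last int64, chunks int, onChunk func()) ([]Row, int64) {
	upper := t.MaxCursor() // fixed before any chunk is read
	if upper <= last || chunks < 1 {
		return nil, last
	}
	step := (upper - last + int64(chunks) - 1) / int64(chunks)
	var out []Row
	for lo := last; lo < upper; lo += step {
		hi := lo + step
		if hi > upper {
			hi = upper
		}
		out = append(out, t.Between(lo, hi)...)
		if onChunk != nil {
			onChunk()
		}
	}
	return out, upper
}

func main() {
	tbl := &Table{Rows: []Row{{ID: 1, Cursor: 10}, {ID: 2, Cursor: 20}}}
	rows, next := Sync(tbl, 0, 2, nil)
	fmt.Println(len(rows), next)
}
```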
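The Parquet writer fix makes default fields present whether or not normalization is enabled. The Go sketch below expresses that rule as a column-selection function: default fields always come first, followed by either the flattened source columns or a single raw-record column. The default field names and the "data" column are hypothetical placeholders, not OLake's actual schema.

```go
package main

import "fmt"

// defaultFields lists metadata columns every output row should carry.
// The names are hypothetical placeholders.
var defaultFields = []string{"_olake_id", "_op_type", "_cdc_timestamp"}

// outputColumns returns the Parquet column set. With normalization on, source
// columns become their own columns; with it off, the record is assumed to be
// kept in a single "data" column. Default fields are included either way.
func outputColumns(normalize bool, sourceColumns []string) []string {
	cols := append([]string{}, defaultFields...)
	if normalize {
		return append(cols, sourceColumns...)
	}
	return append(cols, "data")
}

func main() {
	fmt.Println(outputColumns(false, []string{"id", "name"}))
}
```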