OLake Go (v0.7.0 - v0.7.3)
April 21, 2026 - May 15, 2026
🎯 What's New
Sources
- MySQL chunking optimisation
Replaced repeated database lookups during chunk discovery with mathematical range splitting (arithmetic progression for numeric primary keys and Unicode-encoded range splitting for string keys). This significantly reduces chunk generation time for large tables while ensuring correct collation-aware ordering.
- SSH tunnel support for DB2 and MSSQL
Added SSH tunnel configuration for the DB2 and MSSQL drivers. DB2 uses a local TCP proxy on localhost:0 (an OS-assigned port) whose connections are forwarded through the SSH client, because go_ibm_db has no Go-level dial hook. MSSQL routes connections through go-mssqldb's Connector.Dialer and HostDialer interfaces, with DNS resolution performed on the remote side of the tunnel.
- Schema filtering for PostgreSQL discovery
Added an optional schemas config field that restricts the discover operation to user-specified PostgreSQL schemas. When it is omitted, existing behaviour is preserved and all non-system schemas are discovered.
- MSSQL read replica support
Added an optional jdbc_url_params field to the MSSQL source so you can target Always On read replicas (for example, with read intent). CDC was also updated to use replica-safe paths that avoid primary-only SQL Server Agent/msdb operations and capture-instance management on secondaries.
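The MySQL chunking optimisation replaces per-chunk lookups with arithmetic. The Go sketch below illustrates both halves under stated assumptions:

- Numeric ranges are cut into half-open chunks whose starts form an arithmetic progression.
- String ranges are interpolated by treating the first few runes as digits of one large base-0x10F800 number (all Unicode scalar values, with surrogates skipped).

This is one possible reading of "Unicode-encoded range splitting," not OLake's code. The names splitNumeric and splitString, the four-rune width, and the digit encoding are all hypothetical. The sketch also orders strings by code point only and does not model MySQL collations.

```go
package main

import (
	"fmt"
	"math/big"
)

// splitNumeric cuts the inclusive key range [first, last] into at most n
// half-open chunks [lo, hi) whose starts form an arithmetic progression with
// step ceil((last-first+1)/n). It assumes last+1 does not overflow int64.
func splitNumeric(first, last int64, n int) [][2]int64 {
	if n < 1 || last < first {
		return nil
	}
	step := (last - first + int64(n)) / int64(n) // ceil((last-first+1)/n)
	var chunks [][2]int64
	for lo := first; lo <= last; lo += step {
		hi := lo + step
		if hi > last+1 {
			hi = last + 1
		}
		chunks = append(chunks, [2]int64{lo, hi})
	}
	return chunks
}

// Hypothetical parameters: interpolate on the first width runes, each treated
// as a digit in base 0x10F800 (Unicode scalar values with surrogates removed).
const (
	width = 4
	base  = 0x110000 - 0x800
)

func runeToDigit(r rune) int64 {
	if r >= 0xE000 {
		return int64(r) - 0x800 // close the surrogate gap
	}
	return int64(r)
}

func digitToRune(d int64) rune {
	if d >= 0xD800 {
		return rune(d + 0x800) // skip back over the surrogate gap
	}
	return rune(d)
}

// encode maps the first width runes of s to an integer, padding with zeros.
func encode(s string) *big.Int {
	rs := []rune(s)
	v, b := new(big.Int), big.NewInt(base)
	for i := 0; i < width; i++ {
		v.Mul(v, b)
		if i < len(rs) {
			v.Add(v, big.NewInt(runeToDigit(rs[i])))
		}
	}
	return v
}

// decode is the inverse of encode, trimming trailing zero padding.
func decode(v *big.Int) string {
	digits := make([]rune, width)
	rem, mod, b := new(big.Int).Set(v), new(big.Int), big.NewInt(base)
	for i := width - 1; i >= 0; i-- {
		rem.QuoRem(rem, b, mod)
		digits[i] = digitToRune(mod.Int64())
	}
	end := len(digits)
	for end > 0 && digits[end-1] == 0 {
		end--
	}
	return string(digits[:end])
}

// splitString returns n-1 interior boundaries evenly spaced between lo and hi.
func splitString(lo, hi string, n int) []string {
	a, b := encode(lo), encode(hi)
	diff := new(big.Int).Sub(b, a)
	var bounds []string
	for i := 1; i < n; i++ {
		off := new(big.Int).Mul(diff, big.NewInt(int64(i)))
		off.Quo(off, big.NewInt(int64(n)))
		bounds = append(bounds, decode(off.Add(off, a)))
	}
	return bounds
}

func main() {
	fmt.Println(splitNumeric(1, 10, 3))         // [[1 5] [5 9] [9 11]]
	fmt.Printf("%q\n", splitString("a", "c", 2)) // ["b"]
}
```

Neither function issues a query per chunk. Only the range endpoints need to be read from the database.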
Destinations
- Skip equality deletes for CDC inserts post-backfill
Equality deletes are now skipped for CDC inserts once the backfill/CDC overlap window is complete, reducing unnecessary write overhead. A new dedup_inserts flag in the olake_2pc Iceberg table property tracks this. Java sets it to true on backfill commit, and Go clears it to false after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers.
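The dedup_inserts lifecycle amounts to a small state machine. Here is a sketch under three assumptions:

- The flag is read into a plain struct. The real olake_2pc property layout is not shown.
- Updates and deletes still always carry an equality delete, since the notes only describe the insert path.
- All names are hypothetical.

```go
package main

import "fmt"

// tableState holds the one field this sketch reads from the olake_2pc
// table property; the real property format is not shown.
type tableState struct {
	DedupInserts bool
}

type cdcOp int

const (
	opInsert cdcOp = iota
	opUpdate
	opDelete
)

// needsEqualityDelete reports whether a CDC record should carry an equality
// delete. Inserts need one only while DedupInserts is true (the backfill/CDC
// overlap window); updates and deletes are assumed to always need one.
func needsEqualityDelete(s tableState, op cdcOp) bool {
	if op == opInsert {
		return s.DedupInserts
	}
	return true
}

// onBackfillCommit stands in for the Java writer setting the flag to true.
func onBackfillCommit(s *tableState) { s.DedupInserts = true }

// onCDCCommit stands in for Go clearing the flag once a CDC commit succeeds.
func onCDCCommit(s *tableState, succeeded bool) {
	if succeeded {
		s.DedupInserts = false
	}
}

func main() {
	var s tableState
	onBackfillCommit(&s)
	fmt.Println(needsEqualityDelete(s, opInsert)) // true: still in the overlap window
	onCDCCommit(&s, true)
	fmt.Println(needsEqualityDelete(s, opInsert)) // false: plain append
	fmt.Println(needsEqualityDelete(s, opUpdate)) // true
}
```

The flag is only cleared after a successful commit. A failed first CDC commit therefore leaves deduplication on, which costs extra writes but cannot reintroduce duplicates from the overlap window.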
🔧 Bug Fixes & Stability
- Upgrade pgx/v5 to v5.9.2 for security fixes
Upgraded github.com/jackc/pgx/v5 from v5.7.3 to v5.9.2 to remediate two security vulnerabilities: a critical memory-safety flaw (CVE-2026-33816) that could allow memory corruption, and a low-severity SQL injection advisory (GHSA-j88v-2chj-qfwx). No existing functionality is affected by this upgrade.
- Oracle chunk boundary query optimisation
Replaced N+1 sequential database round trips in splitViaTableIteration with a single NTILE-based query that fetches all chunk boundaries at once, falling back to the original loop when table statistics are unavailable.
- Iceberg positional delete file fix for CDC upserts
Compaction failed when multiple changes for the same _olake_id arrived in a single batch, because a single positional delete file referenced multiple data files. Fixed by creating one positional delete file per referenced data file.
- PostgreSQL primary key discovery fix via pg_catalog
The previous query against information_schema.key_column_usage also returned foreign key columns as primary keys. This caused wrong _olake_id hashes, missed equality deletes, and duplicate rows in Iceberg on CDC upserts. It has been replaced with a pg_catalog-based query that returns only true primary keys and works correctly for read-only roles on managed databases such as RDS, Supabase, and Render.
- MySQL CDC charset corruption fix for non-UTF8 columns
ENUM and string columns using non-UTF8 charsets (utf16, ucs2, latin1) were silently corrupted during CDC by blind []byte-to-string casts. Fixed by adding collation-aware decoding using TableMapEvent.CollationMap() and EnumSetCollationMap().
- MongoDB primary key pinning for deterministic deduplication
Previously, all indexed fields were treated as primary keys, so updates to non-unique indexed fields changed the _olake_id and broke Iceberg equality deletes, creating duplicate rows. The primary key is now pinned strictly to MongoDB's guaranteed-unique _id, ensuring stable hashes and correctly deduplicated upserts.
- DB2 driver download fix in integration tests
DB2 integration tests now reuse the already-installed clidriver by copying it into the workspace, so Docker containers find it locally instead of repeatedly downloading it from the flaky IBM CDN.
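The Oracle chunk boundary change swaps N+1 round trips for one NTILE query. The sketch below has two parts:

- boundaryQuery, a hypothetical query builder;
- ntileBounds, which mimics NTILE's bucketing on a sorted key slice so the expected boundaries can be checked without a database. Bucket sizes differ by at most one, with the larger buckets first.

Both names are illustrative, not splitViaTableIteration itself. The fallback to iteration when table statistics are missing is not modelled.

```go
package main

import "fmt"

// boundaryQuery builds one Oracle query that places each row in one of n
// NTILE buckets and returns every bucket's lowest and highest key.
// table and pk are assumed to be validated, already-quoted identifiers.
func boundaryQuery(table, pk string, n int) string {
	return fmt.Sprintf(`SELECT MIN(%[2]s) AS lo, MAX(%[2]s) AS hi
FROM (SELECT %[2]s, NTILE(%[3]d) OVER (ORDER BY %[2]s) AS bucket FROM %[1]s)
GROUP BY bucket
ORDER BY bucket`, table, pk, n)
}

// ntileBounds mirrors NTILE on an already-sorted slice: total rows are split
// into n buckets of size total/n, and the first total%n buckets get one extra
// row. It returns each bucket's [min, max] key.
func ntileBounds(sorted []int64, n int) [][2]int64 {
	total := len(sorted)
	if n < 1 || total == 0 {
		return nil
	}
	if n > total {
		n = total // NTILE never produces empty buckets
	}
	out := make([][2]int64, 0, n)
	start := 0
	for b := 0; b < n; b++ {
		size := total / n
		if b < total%n {
			size++
		}
		out = append(out, [2]int64{sorted[start], sorted[start+size-1]})
		start += size
	}
	return out
}

func main() {
	fmt.Println(boundaryQuery(`"SALES"."ORDERS"`, `"ORDER_ID"`, 4))
	keys := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(ntileBounds(keys, 3)) // [[1 4] [5 7] [8 10]]
}
```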
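The Iceberg positional delete fix comes down to grouping delete records by the data file they target before writing them. Below is a sketch with a hypothetical posDelete type, where each map entry corresponds to one positional delete file. Positions are sorted because the Iceberg spec requires rows in a positional delete file to be sorted by file_path, then pos.

```go
package main

import (
	"fmt"
	"sort"
)

// posDelete is a hypothetical positional delete record: the data file a
// deleted row lives in and that row's position within the file.
type posDelete struct {
	FilePath string
	Pos      int64
}

// groupByDataFile returns one entry per referenced data file; each entry
// would be written as its own positional delete file. Positions are sorted
// because Iceberg expects positional delete rows ordered by file_path, then pos.
func groupByDataFile(dels []posDelete) map[string][]int64 {
	out := make(map[string][]int64)
	for _, d := range dels {
		out[d.FilePath] = append(out[d.FilePath], d.Pos)
	}
	for _, ps := range out {
		sort.Slice(ps, func(i, j int) bool { return ps[i] < ps[j] })
	}
	return out
}

func main() {
	batch := []posDelete{
		{"data/a.parquet", 5},
		{"data/b.parquet", 1},
		{"data/a.parquet", 2},
	}
	fmt.Println(groupByDataFile(batch)) // map[data/a.parquet:[2 5] data/b.parquet:[1]]
}
```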
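The MySQL charset fix replaces a blind string(b) cast with decoding chosen per column. This is a minimal sketch. It assumes the collation ID from TableMapEvent.CollationMap() has already been resolved to a charset name; that lookup is not shown. decodeColumn is a hypothetical helper that covers only the three charsets named in the fix plus UTF-8.

Two details matter here:

- MySQL's utf16 and ucs2 are big-endian.
- MySQL's latin1 is really cp1252. The latin1 branch below is therefore an approximation that is exact only outside bytes 0x80 to 0x9F.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeColumn turns raw binlog bytes into a Go string using the column's
// character set instead of a blind string(b) cast. It assumes the collation
// ID has already been mapped to a charset name.
func decodeColumn(charset string, b []byte) (string, error) {
	switch charset {
	case "utf8", "utf8mb3", "utf8mb4":
		return string(b), nil // already UTF-8
	case "latin1":
		// Approximation: treat each byte as an ISO-8859-1 code point. MySQL's
		// latin1 is really cp1252, which differs for bytes 0x80-0x9F.
		rs := make([]rune, len(b))
		for i, c := range b {
			rs[i] = rune(c)
		}
		return string(rs), nil
	case "utf16", "ucs2":
		// MySQL stores both big-endian, two bytes per UTF-16 code unit.
		if len(b)%2 != 0 {
			return "", fmt.Errorf("odd byte length %d for %s", len(b), charset)
		}
		units := make([]uint16, len(b)/2)
		for i := range units {
			units[i] = uint16(b[2*i])<<8 | uint16(b[2*i+1])
		}
		return string(utf16.Decode(units)), nil
	default:
		return "", fmt.Errorf("unsupported charset %q", charset)
	}
}

func main() {
	raw := []byte{0x63, 0x61, 0x66, 0xE9} // "café" in latin1
	s, _ := decodeColumn("latin1", raw)
	fmt.Println(s)
	fmt.Println(string(raw) == s) // false: the blind cast yields invalid UTF-8
}
```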
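Pinning the MongoDB primary key to _id means the row identifier should be a function of _id alone. Below is a sketch that assumes a decoded document held as map[string]any. The SHA-256 hash and the "%T:%v" encoding are illustrative choices, not OLake's actual _olake_id scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// olakeID derives a row identifier from the document's _id alone, so updates
// to any other field (indexed or not) leave it unchanged. The hash and the
// "%T:%v" encoding are illustrative; the type prefix keeps 1 and "1" distinct.
func olakeID(doc map[string]any) (string, error) {
	id, ok := doc["_id"]
	if !ok {
		return "", fmt.Errorf("document has no _id")
	}
	sum := sha256.Sum256([]byte(fmt.Sprintf("%T:%v", id, id)))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	before := map[string]any{"_id": "65f0c1", "email": "a@example.com"}
	after := map[string]any{"_id": "65f0c1", "email": "b@example.com"}
	x, _ := olakeID(before)
	y, _ := olakeID(after)
	fmt.Println(x == y) // true
}
```

Because the hash never reads non-_id fields, an update to an indexed field yields the same identifier. The equality delete then matches the existing row instead of leaving a duplicate.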