OLake Go (v0.7.0 - v0.7.3)

April 21, 2026 – May 15, 2026

🎯 What's New

Sources

  1. MySQL chunking optimisation -
    Replaced repeated database lookups during chunk discovery with mathematical range splitting (arithmetic progression for numeric primary keys and Unicode-encoded range splitting for string keys), significantly reducing chunk generation time for large tables while ensuring correct collation-aware ordering.

  2. SSH tunnel support for DB2 and MSSQL -
    Added SSH tunnel configuration for the DB2 and MSSQL drivers. DB2 uses a local TCP proxy on localhost:0 forwarded through the SSH client (since go_ibm_db has no Go-level dial hook), while MSSQL routes connections via go-mssqldb's Connector.Dialer and HostDialer interfaces with remote-side DNS resolution.

  3. Schema filtering for PostgreSQL discovery -
    Added an optional schemas config field to restrict the discover operation to user-specified PostgreSQL schemas. When omitted, existing behaviour is preserved and all non-system schemas are discovered.

  4. MSSQL read replica support -
    Added optional jdbc_url_params to the MSSQL source so you can target Always On read replicas (for example with read-intent), and updated CDC to use replica-safe paths that avoid primary-only agent/msdb and capture-instance management on secondaries.
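
To illustrate the arithmetic-progression splitting described for MySQL numeric primary keys above, here is a minimal Go sketch. It is not OLake's implementation: `Chunk`, `splitNumericRange`, and the half-open range convention are hypothetical choices made for illustration, and the sketch assumes the minimum and maximum key are already known and sit comfortably inside int64.

```go
package main

// Chunk is a half-open primary-key range [Min, Max).
// Hypothetical type, used only for this illustration.
type Chunk struct {
	Min, Max int64
}

// splitNumericRange divides the inclusive key range [minPK, maxPK] into at
// most n contiguous chunks of near-equal width using arithmetic only, so no
// per-chunk database lookup is needed.
// Assumption: maxPK+1 and lo+step do not overflow int64.
func splitNumericRange(minPK, maxPK int64, n int) []Chunk {
	if n <= 0 || maxPK < minPK {
		return nil
	}
	span := maxPK - minPK + 1
	step := span / int64(n)
	if span%int64(n) != 0 {
		step++ // round up so that n steps always cover the whole span
	}
	chunks := make([]Chunk, 0, n)
	for lo := minPK; lo <= maxPK; lo += step {
		hi := lo + step
		if hi > maxPK+1 {
			hi = maxPK + 1 // clamp the final chunk to the end of the range
		}
		chunks = append(chunks, Chunk{Min: lo, Max: hi})
	}
	return chunks
}
```

For example, `splitNumericRange(1, 100, 4)` yields four ranges of width 25 starting at 1, 26, 51, and 76. Sparse keys would make such chunks uneven in row count, which is a trade-off of splitting by value rather than by row position.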

Destinations

  1. Skip equality deletes for CDC inserts post-backfill -
    Equality deletes are now skipped for CDC inserts once the backfill→CDC overlap window is complete, reducing unnecessary write overhead. A new dedup_inserts flag on the Iceberg olake_2pc table property tracks this — Java sets it to true on backfill commit, and Go clears it to false after the first successful CDC commit. This applies to both the Arrow and legacy gRPC writers.
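
The decision described above can be pictured as a small predicate. This is only a sketch under stated assumptions: the function name and the operation strings ("insert", "update", "delete") are hypothetical, and it assumes updates and deletes always need an equality delete regardless of the flag.

```go
package main

// shouldWriteEqualityDelete reports whether a CDC record needs an equality
// delete. dedupInserts mirrors the dedup_inserts flag: true during the
// backfill-to-CDC overlap window, false after the first successful CDC commit.
// The operation names are illustrative, not OLake's actual constants.
func shouldWriteEqualityDelete(op string, dedupInserts bool) bool {
	switch op {
	case "insert":
		// An insert can only collide with a backfilled row while the
		// overlap window is still open.
		return dedupInserts
	case "update", "delete":
		// Assumption: these always replace or remove an existing row.
		return true
	default:
		return false
	}
}
```

Once the flag is cleared, plain inserts take the cheaper append-only path.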

🔧 Bug Fixes & Stability

  1. Upgrade pgx/v5 to v5.9.2 for security fixes -
    Upgraded github.com/jackc/pgx/v5 from v5.7.3 to v5.9.2 to remediate two security vulnerabilities: a critical memory-safety flaw (CVE-2026-33816) that could allow memory corruption and a low-severity SQL injection advisory (GHSA-j88v-2chj-qfwx). No existing functionality is affected by this upgrade.

  2. Oracle chunk boundary query optimisation -
    Replaced N+1 sequential database round trips in splitViaTableIteration with a single NTILE-based query to fetch all chunk boundaries at once, with a fallback to the original loop when table stats are unavailable.

  3. Iceberg positional delete file fix for CDC upserts -
    Compaction was failing when multiple changes for the same _olake_id arrived in a single batch, caused by a positional delete file referencing multiple data files. Fixed by creating one positional delete file per data file reference.

  4. PostgreSQL primary key discovery fix via pg_catalog -
    information_schema.key_column_usage incorrectly included foreign key columns as primary keys, causing wrong _olake_id hashes, missed equality deletes, and duplicate rows in Iceberg on CDC upserts. Replaced with a pg_catalog-based query that returns only true primary keys and works correctly for read-only roles on managed databases like RDS, Supabase, and Render.

  5. MySQL CDC charset corruption fix for non-UTF8 columns -
    ENUM and string columns using non-UTF8 charsets (utf16, ucs2, latin1) were silently corrupted during CDC due to blind []byte → string casts. Fixed by adding collation-aware decoding using TableMapEvent.CollationMap() and EnumSetCollationMap().

  6. MongoDB primary key pinning for deterministic deduplication -
    Previously, all indexed fields were treated as primary keys, so updates to non-unique indexed fields changed the _olake_id and broke Iceberg equality deletes, creating duplicate rows. The primary key is now pinned strictly to MongoDB's guaranteed-unique _id, ensuring stable hashes and correct deduplicated upserts.

  7. DB2 driver download fix in integration tests -
    DB2 integration tests now reuse the already-installed clidriver by copying it into the workspace, so Docker containers find it locally instead of repeatedly hitting the flaky IBM CDN download path.
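
For the Iceberg positional delete fix (item 3 above), the core idea of "one positional delete file per data file reference" amounts to grouping delete positions by data file before writing. The sketch below is a hypothetical illustration, not the writer's actual code: `PositionDelete` and `groupByDataFile` are invented names, and the actual file writing is left out.

```go
package main

import "sort"

// PositionDelete marks one row (by position) in one data file as deleted.
// Hypothetical type for illustration.
type PositionDelete struct {
	DataFile string
	Pos      int64
}

// groupByDataFile buckets delete positions per data file so that each bucket
// can be written as its own positional delete file, instead of one delete
// file that references several data files.
func groupByDataFile(deletes []PositionDelete) map[string][]int64 {
	groups := make(map[string][]int64)
	for _, d := range deletes {
		groups[d.DataFile] = append(groups[d.DataFile], d.Pos)
	}
	// Keep positions ordered within each file.
	for _, positions := range groups {
		sort.Slice(positions, func(i, j int) bool {
			return positions[i] < positions[j]
		})
	}
	return groups
}
```

With this grouping, a batch containing several changes for the same _olake_id spread over two data files produces two delete files, each referencing exactly one data file.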

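For the MySQL charset fix (item 5 above), the difference between a blind cast and charset-aware decoding can be shown with the standard library alone. This sketch assumes the column payload is big-endian UTF-16 without a byte-order mark, which is how MySQL's utf16 charset is generally described; `decodeUTF16BE` is a hypothetical helper, and choosing the decoder from the column's collation is out of scope here.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16BE converts a big-endian UTF-16 payload into a Go (UTF-8)
// string. A blind string(raw) cast would instead keep the raw bytes,
// including the interleaved zero bytes, and corrupt the value.
func decodeUTF16BE(raw []byte) (string, error) {
	if len(raw)%2 != 0 {
		return "", fmt.Errorf("utf16 payload has odd length %d", len(raw))
	}
	units := make([]uint16, len(raw)/2)
	for i := range units {
		units[i] = binary.BigEndian.Uint16(raw[2*i:])
	}
	// utf16.Decode combines surrogate pairs into single runes.
	return string(utf16.Decode(units)), nil
}
```

Given the bytes `00 4F 00 4B`, the helper returns "OK", whereas `string(raw)` yields a four-byte string with embedded NUL characters.
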


💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don't hesitate to contact us if you need any help or further clarification!