# OLake (v0.3.14 - v0.3.16)

January 13, 2026 – February 09, 2026
## 🎯 What's New

### Sources
- **New source: MSSQL** - Introduced MSSQL as a new source connector for OLake.
- **New source: DB2 LUW** - Introduced DB2 LUW as a new source connector for OLake.
- **MySQL connector JDBC params and SSL support** - Added support for JDBC URL parameters and SSL/TLS configuration in the MySQL connector, enabling more flexible and secure connections.
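Extra URL parameters typically end up appended to the connection string as query parameters. A minimal Go sketch of that idea, using only the standard library; `buildDSN` and its config shape are hypothetical, not OLake's actual helper:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN appends user-supplied connection parameters (e.g. TLS settings)
// to a go-sql-driver/mysql-style DSN. Illustrative only; OLake's real
// config keys and DSN assembly may differ.
func buildDSN(user, host, db string, params map[string]string) string {
	v := url.Values{}
	for k, val := range params {
		v.Set(k, val)
	}
	// url.Values.Encode() sorts keys, so the DSN is deterministic.
	return fmt.Sprintf("%s@tcp(%s)/%s?%s", user, host, db, v.Encode())
}

func main() {
	dsn := buildDSN("olake", "db.example.com:3306", "orders",
		map[string]string{"tls": "true", "readTimeout": "30s"})
	fmt.Println(dsn)
}
```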
- **Kafka Avro Schema Registry support** - Added Schema Registry support for Kafka syncs with Avro data via the Schema Registry API (currently Confluent Schema Registry). Messages must use the Confluent wire format; otherwise the connector falls back to JSON.
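The Confluent wire format is a small, well-documented framing: a `0x00` magic byte, a 4-byte big-endian schema ID, then the Avro-encoded body. A sketch of how a consumer might split it (the function name is illustrative, not OLake's code):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseWireFormat splits a Confluent wire-format message into its schema
// registry ID and Avro payload. Messages that don't match the framing are
// rejected, which is where a JSON fallback would kick in.
func parseWireFormat(msg []byte) (schemaID uint32, payload []byte, err error) {
	// Framing: magic byte 0x00, 4-byte big-endian schema ID, Avro body.
	if len(msg) < 5 || msg[0] != 0x00 {
		return 0, nil, errors.New("not Confluent wire format; fall back to JSON")
	}
	return binary.BigEndian.Uint32(msg[1:5]), msg[5:], nil
}

func main() {
	msg := []byte{0x00, 0x00, 0x00, 0x00, 0x2A, 'h', 'i'}
	id, body, err := parseWireFormat(msg)
	fmt.Println(id, string(body), err) // 42 hi <nil>
}
```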
### Platform Features
- **OLake full refresh sync performance** - Improved OLake sync performance for full refresh operations.
- **Manual approval for integration tests** - Added a manual approval gate so integration tests only run when explicitly approved.
- **OLake version in telemetry** - Added the OLake version as a telemetry property so it is visible in the Mixpanel dashboard for filtering and analysis.
### Destinations
- **Dedup handled by the Iceberg Java engine** - Removed Go-side de-duplication logic, since the Iceberg Java writer already handles row-level changes via positional delete files.
- **Driver-specific CDC ordering metadata columns** - Added driver-specific CDC position columns (MySQL binlog file/position, PostgreSQL LSN, MongoDB resume token, MSSQL start LSN/sequence) so events can be reliably ordered even when `_cdc_timestamp` is identical within a transaction, and emitted them consistently in both normalized and non-normalized outputs.
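The ordering idea above can be sketched as a tie-break sort: compare timestamps first, then fall back to the driver-specific position. A minimal Go example using MySQL-style binlog coordinates; the struct and field names are illustrative, not OLake's actual column names:

```go
package main

import (
	"fmt"
	"sort"
)

// cdcEvent carries the commit timestamp plus driver-specific position
// metadata (MySQL shown: binlog file + offset within it).
type cdcEvent struct {
	Timestamp  int64  // _cdc_timestamp, identical for rows in one transaction
	BinlogFile string // e.g. "binlog.000002"
	BinlogPos  uint64 // byte offset inside the binlog file
	Op         string
}

// orderEvents sorts by timestamp first, then by binlog position, so
// same-transaction events keep their server-side order.
func orderEvents(events []cdcEvent) {
	sort.SliceStable(events, func(i, j int) bool {
		a, b := events[i], events[j]
		if a.Timestamp != b.Timestamp {
			return a.Timestamp < b.Timestamp
		}
		if a.BinlogFile != b.BinlogFile {
			return a.BinlogFile < b.BinlogFile
		}
		return a.BinlogPos < b.BinlogPos
	})
}

func main() {
	events := []cdcEvent{
		{100, "binlog.000002", 50, "update"},
		{100, "binlog.000002", 10, "insert"}, // same timestamp, earlier position
	}
	orderEvents(events)
	fmt.Println(events[0].Op, events[1].Op) // insert update
}
```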
## 🔧 Bug Fixes & Stability
- **Dynamic RPS benchmarking** - Performance tests now compare requests-per-second (RPS) against the average of the last 5 runs instead of a static benchmark, giving more reliable performance tracking.
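A rolling-baseline check like this boils down to a few lines. The sketch below assumes a regression threshold expressed as a fraction below the rolling average; the 10% tolerance and the `regressed` helper are hypothetical, not OLake's actual CI settings:

```go
package main

import "fmt"

// regressed reports whether the current RPS falls below the average of the
// previous runs by more than the allowed fraction.
func regressed(current float64, lastRuns []float64, tolerance float64) bool {
	if len(lastRuns) == 0 {
		return false // no baseline yet, nothing to compare against
	}
	var sum float64
	for _, r := range lastRuns {
		sum += r
	}
	avg := sum / float64(len(lastRuns))
	return current < avg*(1-tolerance)
}

func main() {
	last5 := []float64{980, 1010, 1000, 990, 1020} // rolling average = 1000
	fmt.Println(regressed(850, last5, 0.10))       // true: more than 10% below
	fmt.Println(regressed(950, last5, 0.10))       // false: within tolerance
}
```

Comparing against a rolling average instead of a fixed number absorbs gradual, expected drift (runner hardware, dependency updates) while still flagging sharp drops.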
- **Parquet schema field name normalization** - Schema field names are now normalized to lowercase before Parquet schema creation, so mixed-case definitions like `COL_INTEGER` and `col_integer` map deterministically to a single destination column. This prevents non-deterministic type overrides and intermittent Parquet write failures during schema evolution.
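The normalization step itself is a simple key rewrite. A hedged Go sketch: the map-of-name-to-type shape and the "last type wins" collision rule are simplifications for illustration, not the writer's real conflict handling:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeFields lowercases field names so mixed-case duplicates collapse
// into one destination column before the Parquet schema is built.
func normalizeFields(fields map[string]string) map[string]string {
	out := make(map[string]string, len(fields))
	for name, typ := range fields {
		out[strings.ToLower(name)] = typ
	}
	return out
}

func main() {
	fields := map[string]string{"COL_INTEGER": "int64", "col_integer": "int64"}
	norm := normalizeFields(fields)
	fmt.Println(len(norm), norm["col_integer"]) // 1 int64
}
```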
- **compare.go unit test coverage** - Added unit tests for `olake/utils/typeutils/compare.go`, covering comparisons across nil, signed/unsigned integers, floats, booleans, `time.Time`, custom time types, and fallback string comparisons, hardening the core comparison logic used in incremental sync, backfill, and binlog processing.
- **MongoDB connection extra parameters** - Arbitrary key-value connection options can now be passed via the MongoDB config and are appended to the MongoDB URI as additional parameters.
- **Iceberg dedup commit ordering fix** - The Iceberg writer now commits each writer completion's data files and equality delete files together in a single atomic commit. This preserves Iceberg's sequence-number semantics, so deletes apply to the intended data files, and prevents duplicate results when schema evolution coincides with concurrent updates.
- **Arrow Iceberg writer improvements** - Added support for `now()` in partition transformations, fixed int-to-long handling in the Arrow Iceberg writer, and added integration tests.
- **New writer instance on retry** - On retry, the writer now starts with a fresh instance and new artifacts instead of reusing the previous failed writer's state, preventing unexpected failures and keeping stale or partial data from being pushed after a retry.
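The essence of the fix is constructing the writer inside the retry loop rather than outside it, so no buffered state from a failed attempt survives. A minimal sketch; the `writer` type and retry policy here are stand-ins, not OLake's actual writer interface:

```go
package main

import (
	"errors"
	"fmt"
)

// writer is a stand-in for a destination writer with per-instance state.
type writer struct{ attempts int }

func (w *writer) write(failFirst bool) error {
	w.attempts++
	if failFirst && w.attempts == 1 {
		return errors.New("transient failure")
	}
	return nil
}

// syncWithRetry builds a brand-new writer on every attempt instead of
// reusing the failed one's buffered artifacts.
func syncWithRetry(maxAttempts int) (created int, err error) {
	for i := 0; i < maxAttempts; i++ {
		w := &writer{} // fresh instance and fresh artifacts per attempt
		created++
		if err = w.write(i == 0); err == nil {
			return created, nil
		}
	}
	return created, err
}

func main() {
	created, err := syncWithRetry(3)
	fmt.Println(created, err) // 2 <nil>: first attempt failed, a new writer succeeded
}
```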
- **Kafka flatten json.Number handling** - Updated `flatten.go` to treat `json.Number` as a supported primitive type, so numeric values decoded with `UseNumber()` pass through as numbers rather than JSON strings, preventing schema-detection type mismatches for Kafka JSON messages.
- **MySQL binlog FLOAT precision alignment** - FLOAT values read from the MySQL binlog now follow MySQL's float32 semantics: binlog float values are cast back to float32, preventing extra-precision float64 output that mismatches MySQL `SELECT` results.
- **IBM Db2 LUW integration tests** - Added integration tests for IBM Db2 LUW to improve end-to-end validation.
- **MSSQL CDC capture instance selection for schema evolution** - After a schema change, SQL Server CDC may create a new capture instance while streaming keeps reading from an older one. The fix: always pick the newest valid capture instance (based on `start_lsn` vs. `fromLSN`) and clamp reads to the new instance's `start_lsn` when switching.
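A sketch of the selection-and-clamp logic. For readability the LSNs are simplified to `uint64` (SQL Server actually uses 10-byte binary LSNs), and `pickInstance` and the struct are hypothetical names, not OLake's code:

```go
package main

import "fmt"

// captureInstance mirrors the relevant bits of cdc.change_tables.
type captureInstance struct {
	Name     string
	StartLSN uint64
}

// pickInstance chooses the newest capture instance by start_lsn and clamps
// the read position so streaming never reads from before the instance began.
func pickInstance(instances []captureInstance, fromLSN uint64) (captureInstance, uint64) {
	best := instances[0]
	for _, ci := range instances[1:] {
		if ci.StartLSN > best.StartLSN {
			best = ci
		}
	}
	// Clamp: positions before the new instance's start_lsn don't exist in
	// its change table, so advance fromLSN if it lags behind.
	if fromLSN < best.StartLSN {
		fromLSN = best.StartLSN
	}
	return best, fromLSN
}

func main() {
	instances := []captureInstance{
		{"dbo_orders_v1", 100},
		{"dbo_orders_v2", 250}, // created by CDC after an ALTER TABLE
	}
	ci, from := pickInstance(instances, 180)
	fmt.Println(ci.Name, from) // dbo_orders_v2 250
}
```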
- **MySQL CDC TIMESTAMP timezone alignment** - The effective MySQL timezone is now resolved (session, then global, then system, or an explicit `jdbc_url_params.time_zone` override) and applied to the CDC binlog syncer, so CDC writes TIMESTAMP values in the DB/server timezone.
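The fallback chain can be sketched as a simple cascade. Note that MySQL reports the literal string `SYSTEM` when `time_zone` defers to the host; the function name and parameter shape below are illustrative, not OLake's actual lookup code:

```go
package main

import "fmt"

// resolveTimeZone applies the precedence described above: an explicit
// jdbc_url_params.time_zone override, then session time_zone, then global
// time_zone, and finally the system timezone.
func resolveTimeZone(override, session, global, system string) string {
	if override != "" {
		return override
	}
	if session != "" && session != "SYSTEM" {
		return session
	}
	if global != "" && global != "SYSTEM" {
		return global
	}
	return system
}

func main() {
	fmt.Println(resolveTimeZone("", "SYSTEM", "SYSTEM", "UTC"))       // UTC
	fmt.Println(resolveTimeZone("Asia/Kolkata", "+05:30", "", "UTC")) // Asia/Kolkata
}
```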