OLake (v0.3.9 - v0.3.13)
December 29, 2025 - January 12, 2026
🎯 What's New
Sources
- New Source Introduction - Introduced S3 as a new source connector for OLake.
- S3 source driver support - Added an S3 source driver spec supporting JSON, CSV, and Parquet file formats, with schema inference for JSON and CSV files so types are handled consistently across sources.
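The notes do not say how the S3 source infers types for JSON and CSV. As a rough illustration only, the Go sketch below shows one common approach, assuming a hypothetical `inferColumnType` helper that picks the narrowest type (int64, then float64, then bool, otherwise string) that every non-empty value in a CSV column parses as. OLake's actual inference rules may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferColumnType is a hypothetical helper: it returns the narrowest type that
// fits every non-empty value in a CSV column ("int64", "float64", "bool"),
// falling back to "string". Empty cells are treated as nulls and skipped.
func inferColumnType(values []string) string {
	isInt, isFloat, isBool := true, true, true
	seen := false
	for _, v := range values {
		if v == "" {
			continue // null cell: gives no type evidence
		}
		seen = true
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			isFloat = false
		}
		if _, err := strconv.ParseBool(v); err != nil {
			isBool = false
		}
	}
	switch {
	case !seen:
		return "string" // no values at all: fall back to string
	case isInt:
		return "int64" // checked first so "1"/"0" columns stay integers
	case isFloat:
		return "float64"
	case isBool:
		return "bool"
	default:
		return "string"
	}
}

func main() {
	columns := map[string][]string{
		"id":     {"1", "2", "3"},
		"price":  {"9.99", "12", ""},
		"active": {"true", "false", "true"},
		"name":   {"alice", "bob", "carol"},
	}
	for name, vals := range columns {
		fmt.Printf("%s: %s\n", name, inferColumnType(vals))
	}
}
```

Checking int before float matters: every integer string also parses as a float, so reversing the order would widen integer columns to float64.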
Platform Features
- Spark + Iceberg playground setup - Added a Spark and Iceberg playground with a Jupyter notebook, a full Docker Compose setup, and configuration and docs, so users can quickly spin up an environment and run interactive data analysis.
🔧 Bug Fixes & Stability
- MySQL CDC timestamp precision update - Updated the `_cdc_timestamp` value for MySQL to store time with millisecond precision for more accurate change tracking.
- MySQL CDC ENUM update handling - Fixed MySQL CDC update events for ENUM columns by resolving the int64 ENUM index values read from the binlog to their actual ENUM strings before writing them to Parquet.
- Kafka topic schema discovery fix - Fixed schema discovery for Kafka topics so that the streams generated from them have the correct data types.
- Parquet file naming for proper sorting - Updated the Parquet file naming convention to use zero-padded date and time components so files sort correctly by timestamp. For example, the previous format produced `2026-1-3_8-27-56_01KE1FGTKPFDMN79ZN9P47KYY0.parquet`, and the current format produces `2026-01-03_08-26-26_01KE1FE28V82MCTY1M02DBM3G7.parquet`.
- Catalog default streams tracking - Added a `default_streams` property to the catalog type that records all streams discovered initially, providing a clear baseline list for stream selection and management.
- String time parsing fix - Fixed string time values being incorrectly converted to the epoch start time, added missing parsing cases in `reformat.go`, ensured consistent output for array values, and added version tracking in state files for backward compatibility.
- Kafka JSON integer/float conversion fix - Fixed incorrect type conversion of integer and float values in JSON messages from the Kafka source.
- PostgreSQL CDC replication slot validation - Fixed the CDC connection check to verify that the replication slot exists in the current database, preventing false positives when the named slot exists but belongs to a different database.
- CDC idle checkpoint evaluation fix - The idle checkpoint is now evaluated after every processed change rather than only during empty iterations, so CDC terminates faster and more predictably once it is fully caught up.
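For the ENUM fix: MySQL stores ENUM members as 1-based indexes (index 0 is reserved for the empty-string error value), which is why raw binlog rows carry integers rather than strings. A minimal Go sketch of the lookup follows, assuming the column's member list has already been read from table metadata; `resolveEnum` is a hypothetical name, not an OLake function.

```go
package main

import "fmt"

// resolveEnum (hypothetical) maps an ENUM index taken from a binlog row to its
// string value. members is the column's ENUM definition in declaration order,
// assumed to be loaded from table metadata beforehand.
func resolveEnum(index int64, members []string) (string, error) {
	if index == 0 {
		// MySQL uses index 0 for the special empty-string error value.
		return "", nil
	}
	if index < 0 || index > int64(len(members)) {
		return "", fmt.Errorf("enum index %d out of range 1..%d", index, len(members))
	}
	return members[index-1], nil // ENUM indexes are 1-based
}

func main() {
	members := []string{"pending", "shipped", "delivered"}
	for _, idx := range []int64{2, 0, 5} {
		v, err := resolveEnum(idx, members)
		fmt.Printf("index %d -> %q (err: %v)\n", idx, v, err)
	}
}
```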
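For the Kafka JSON fix: the notes do not state the root cause. One well-known pitfall in Go is that `encoding/json` decodes every number into `float64` when the target is `any`, which erases the integer/float distinction and loses precision above 2^53. The sketch below avoids that with `Decoder.UseNumber`; `decodeMessage` is a hypothetical helper and only converts top-level fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeMessage (hypothetical) parses a JSON object, keeping integers as int64
// and other numbers as float64 instead of json's default float64 for all.
// Only top-level fields are converted in this sketch.
func decodeMessage(raw string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // numbers arrive as json.Number, preserving their text
	var msg map[string]any
	if err := dec.Decode(&msg); err != nil {
		return nil, err
	}
	for k, v := range msg {
		if n, ok := v.(json.Number); ok {
			if i, err := n.Int64(); err == nil {
				msg[k] = i // integral literal fits in int64
			} else if f, err := n.Float64(); err == nil {
				msg[k] = f // fractional or exponent form
			}
		}
	}
	return msg, nil
}

func main() {
	raw := `{"id": 9007199254740993, "price": 19.5, "qty": 3}`

	var naive map[string]any
	_ = json.Unmarshal([]byte(raw), &naive)
	fmt.Printf("%T %v\n", naive["id"], naive["id"]) // float64; the last digit is lost

	msg, err := decodeMessage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %v\n", msg["id"], msg["id"]) // int64 with the exact value
	fmt.Printf("%T %v\n", msg["price"], msg["price"])
}
```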
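For the idle checkpoint change, a toy simulation may help. It assumes, as an interpretation the notes do not confirm, that the checkpoint is the log position captured when the sync starts. `runCDCOld` checks it only after an empty poll, so a busy stream keeps it running; `runCDC` checks after every change and stops as soon as it catches up. All names here are hypothetical.

```go
package main

import "fmt"

// change is a hypothetical CDC event carrying its log position (for example an LSN).
type change struct {
	pos uint64
}

// runCDCOld evaluates the checkpoint only when a poll returns no changes.
// It returns the number of changes processed before stopping.
func runCDCOld(polls [][]change, target uint64) int {
	processed := 0
	var last uint64
	for _, batch := range polls {
		if len(batch) == 0 {
			if last >= target {
				return processed
			}
			continue
		}
		for _, c := range batch {
			processed++
			last = c.pos
		}
	}
	return processed
}

// runCDC evaluates the checkpoint after every processed change.
func runCDC(polls [][]change, target uint64) int {
	processed := 0
	for _, batch := range polls {
		for _, c := range batch {
			processed++
			if c.pos >= target {
				return processed
			}
		}
	}
	return processed
}

func main() {
	// Target position 3 was recorded at sync start; changes 4..7 arrived later.
	polls := [][]change{{{1}, {2}}, {{3}, {4}, {5}}, {{6}, {7}}, {}}
	fmt.Println("old:", runCDCOld(polls, 3)) // old: 7
	fmt.Println("new:", runCDC(polls, 3))    // new: 3
}
```

In this sketch, changes past the target are simply left for the next sync run.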