OLake (v0.2.0 – v0.2.1)
August 15 – August 27, 2025
🎯 What's New
Sources
- Spec commands for Drivers and Destinations
The spec command outputs a machine-readable JSON schema describing the configuration options and requirements for each OLake driver (source or destination).
- Implemented Namespace feature: Database, Table and Column Name Normalization
Introduces Namespace Normalization, which standardizes database, table, and column identifiers into a consistent, engine-safe format. It also adds a --job-name flag whose value is incorporated during normalization to generate a unique database name per job, preventing collisions across environments and making downstream querying predictable.
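As a rough illustration of what identifier normalization with a job name can look like, the Go sketch below lowercases a name, replaces every character outside `[a-z0-9_]` with an underscore, and prefixes the database name with the job name. These rules and the helper names `normalizeIdentifier` and `databaseName` are hypothetical assumptions for illustration. OLake's actual normalization rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeIdentifier is a hypothetical normalizer, not OLake's actual
// rule set. It lowercases the name and replaces every rune that is not an
// ASCII letter, digit, or underscore with '_'. A leading digit (or an
// empty result) gets a '_' prefix so the name is a valid unquoted SQL
// identifier in most engines.
func normalizeIdentifier(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	out := b.String()
	if out == "" || (out[0] >= '0' && out[0] <= '9') {
		out = "_" + out
	}
	return out
}

// databaseName is a hypothetical helper that folds the job name into the
// destination database name, so two jobs syncing the same source database
// land in different namespaces.
func databaseName(jobName, sourceDB string) string {
	return normalizeIdentifier(jobName + "_" + sourceDB)
}

func main() {
	fmt.Println(normalizeIdentifier("Order-Items"))    // order_items
	fmt.Println(normalizeIdentifier("2024 Sales"))     // _2024_sales
	fmt.Println(databaseName("prod-sync", "Sales DB")) // prod_sync_sales_db
}
```

Combining the job name with the source name before normalizing means the combined name passes through the same sanitizing rules as any other identifier, so the result is engine-safe and still unique per job.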
Destinations
- Destination Refactor
Refactored the OLake destination writer components. This includes a major overhaul of the Java writer code, a refactor of the Go side of the writer pipeline, and an optimized protobuf record structure that improves serialization efficiency and resource usage.
🔧 Bug Fixes & Stability
- Environment not passed to the Iceberg JAR, breaking IRSA in Kubernetes
This fix ensures that critical AWS IRSA environment variables (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE) and related JVM options (JAVA_TOOL_OPTS) are correctly passed to the Iceberg writer's child process.
- MongoDB Integration Test
Updated the MongoDB integration tests to use a larger test runner (32 GB memory), enabling all drivers' tests to run reliably in parallel. This improved test stability and reduced flaky failures during multi-driver parallel execution.
- Type conversion in Iceberg data types
Added type hierarchy checks in Iceberg type conversions to ensure compatibility and correctness during schema evolution and data writes.
- Removed Segment dependency; telemetry events now sent directly to Mixpanel
Updated OLake to send telemetry events directly to Mixpanel, removing the previous dependency on Segment.
- Respect --state flag when writing OLake sync state to file
Fixed OLake so that specifying --state /path/to/file.json reads the sync state from and writes it to the specified file, instead of incorrectly writing it only to <config_folder>/state.json.
- gRPC auto-code dependency error
Resolved the gRPC auto-code dependency error by reverting the gRPC version on the Java side and regenerating the RPC stubs with protoc 21, restoring sync compatibility.
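The Iceberg type hierarchy checks above can be pictured as a widening rule: a column may only move to a type that can represent every value it already holds. The Go sketch below encodes the primitive promotions that the Apache Iceberg table spec permits for format versions 1 and 2 (int to long, float to double, and decimal precision widening at a fixed scale). The `IcebergType` struct and `canPromote` function are hypothetical, and OLake's own conversion logic may check additional cases.

```go
package main

import "fmt"

// IcebergType is a hypothetical, minimal description of a primitive
// Iceberg type. Precision and Scale are only meaningful for "decimal".
type IcebergType struct {
	Name      string
	Precision int
	Scale     int
}

// canPromote reports whether a column of type from may be evolved to
// type to, following the primitive promotions in the Iceberg spec
// (format v1/v2). Anything else is treated as incompatible.
func canPromote(from, to IcebergType) bool {
	if from == to {
		return true // no change needed
	}
	switch {
	case from.Name == "int" && to.Name == "long":
		return true
	case from.Name == "float" && to.Name == "double":
		return true
	case from.Name == "decimal" && to.Name == "decimal":
		// Precision may only widen; the scale must stay the same.
		return to.Scale == from.Scale && to.Precision > from.Precision
	}
	return false
}

func main() {
	intT := IcebergType{Name: "int"}
	longT := IcebergType{Name: "long"}
	dec92 := IcebergType{Name: "decimal", Precision: 9, Scale: 2}
	dec182 := IcebergType{Name: "decimal", Precision: 18, Scale: 2}

	fmt.Println(canPromote(intT, longT))   // true
	fmt.Println(canPromote(longT, intT))   // false: narrowing
	fmt.Println(canPromote(dec92, dec182)) // true
	fmt.Println(canPromote(dec182, dec92)) // false
}
```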