
OLake (v0.2.0 – v0.2.1)

August 15 – August 27, 2025

🎯 What's New

Sources

  1. Spec commands for Drivers and Destinations -
    The spec command outputs a machine-readable JSON schema describing the configuration options and requirements for each OLake driver (source or destination).

  2. Implemented Namespace feature: Database, Table and Column names with Normalization -
    Introduces namespace normalization, which standardizes database, table, and column identifiers to a consistent, engine‑safe format. A new --job-name flag is incorporated during normalization to generate a unique database name per job, which prevents collisions across environments and makes downstream querying predictable.
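
To make the idea of namespace normalization concrete, here is a minimal sketch in Python. It is not OLake's implementation: the function names `normalize_identifier` and `database_name`, the specific rules (lowercasing, replacing unsupported characters with underscores, prefixing identifiers that start with a digit), and the way the job name is joined to the database name are all assumptions chosen for illustration.

```python
import re


def normalize_identifier(name: str) -> str:
    """Return a lowercase identifier limited to [a-z0-9_] (assumed rule set)."""
    lowered = name.strip().lower()
    # Collapse every run of unsupported characters into a single underscore.
    cleaned = re.sub(r"[^a-z0-9_]+", "_", lowered).strip("_")
    if not cleaned:
        raise ValueError(f"identifier {name!r} has no usable characters")
    # Many engines reject identifiers that begin with a digit.
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def database_name(job_name: str, source_database: str) -> str:
    """Combine the job name and source database (hypothetical naming scheme)."""
    return f"{normalize_identifier(job_name)}_{normalize_identifier(source_database)}"


if __name__ == "__main__":
    print(normalize_identifier("Order Items"))
    print(database_name("Prod Sync", "Shop-DB"))
```

Under these assumed rules, two jobs that read the same source database end up with different destination database names because the job name is part of the result.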

Destinations

  1. Destination Refactor -
    Refactored the OLake destination writer components. This includes a major overhaul of the Java writer code, a refactor of the Go side of the writer pipeline, and an optimized protobuf record structure that improves serialization efficiency and resource usage.

🔧 Bug Fixes & Stability

  1. Environment not passed to Iceberg; JAR breaking IRSA in Kubernetes -
    This fix ensures that critical AWS IRSA environment variables (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE) and related JVM options (JAVA_TOOL_OPTS) are correctly passed to the Iceberg writer’s child process.

  2. MongoDB Integration Test -
    Updated the MongoDB integration tests to use a larger test runner (32 GB of memory), so that all drivers' tests can run in parallel reliably. This improved test stability and reduced flaky failures during multi-driver parallel execution.

  3. Type conversion in Iceberg data types -
    Added type hierarchy checks in Iceberg type conversions to ensure compatibility and correctness during schema evolution and data writes.

  4. Removed Segment dependency; telemetry events now sent directly to Mixpanel -
    Updated OLake to send telemetry events directly to Mixpanel, removing the previous dependency on Segment.

  5. Respect --state flag when writing OLake sync state to file -
    When --state /path/to/file.json is specified, the sync state is now both read from and written to that file path. Previously, OLake incorrectly wrote the state only to <config_folder>/state.json.

  6. gRPC auto code dependency error -
    Resolved the gRPC auto‑code dependency error by reverting the gRPC version on the Java side and regenerating the RPC stubs with protoc 21, which restores sync compatibility.
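
The --state fix in item 5 comes down to resolving the state path once and using that same path for both reading and writing. The Python sketch below illustrates that behavior under stated assumptions. It is not OLake's code: the helper names `resolve_state_path`, `load_state`, and `save_state` are hypothetical, and the fallback to `<config_folder>/state.json` when no flag is given is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_state_path(config_folder: str, state_flag: Optional[str] = None) -> Path:
    """Prefer the explicit --state path; otherwise fall back (assumed default)."""
    if state_flag:
        return Path(state_flag)
    return Path(config_folder) / "state.json"


def load_state(path: Path) -> dict:
    """Read the sync state, treating a missing file as an empty state."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def save_state(path: Path, state: dict) -> None:
    """Write the state back to the same path it was resolved to."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a crash cannot leave a half-written state.
    tmp_path = path.with_name(path.name + ".tmp")
    with tmp_path.open("w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    tmp_path.replace(path)
```

Because both helpers take the resolved path as an argument, the read and write locations cannot drift apart, which is the class of bug the fix addresses.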



💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!