OLake (v0.2.8 - v0.2.9)

October 11 – October 30, 2025

🎯 What's New

Sources

  1. Implement sync progress tracking for Oracle driver -
    Added sync progress tracking to the Oracle driver, enabling users to view real-time sync progress statistics and estimated time of completion.

Destinations

  1. Clear destination in Iceberg (CLI only) -
    Introduced a clear destination feature for Iceberg that enables selective stream cleanup. It deletes all data in the destination for the selected streams of a particular job. This is useful when stream configurations such as normalization, filters, or partitions are misconfigured and must be changed, which requires a fresh sync.

🔧 Bug Fixes & Stability

  1. Process timeout and database prefix configuration -
    Fixed process timeout handling and database prefix configuration for GitHub Actions runners. Added the --destination-database-prefix flag to both sync and discover commands to resolve database naming conflicts with GitHub runner restrictions that require the performance_ prefix. Additionally, fixed sync timeout behavior where containerized jobs continued running after timeout by implementing manual process termination, ensuring complete cleanup and preventing resource leaks in CI/CD environments.

  2. gRPC port binding race condition -
    Fixed gRPC port binding failures caused by race conditions between processes. When one Java process finished with a gRPC port and another quickly attempted to bind to it, the OS required up to several minutes to fully release the port, causing binding failures. This fix implements proper handling to avoid port conflicts during rapid process restarts, improving reliability in containerized environments.

  3. Duplicate table creation by threads -
    Previously, multiple threads attempted to create the same table simultaneously and intermittently failed with errors when the table already existed. The logic now catches and handles the “already exists” condition, preventing thread failures during parallel workloads.

  4. Azure ADLS Lakekeeper REST Catalog issue -
    Fixed an issue where OLake was unable to handle ADLSFileIO even when the ADLS endpoint was passed. Removed an unnecessary conditional check, allowing org.apache.iceberg.io.ResolvingFileIO to properly resolve and handle FileIO for GCS, AWS, and Azure cloud providers.

  5. Integration test Spark error fixed -
    Resolved an intermittent race condition in integration tests where Spark queries failed with "Cannot check and eventually update SQL schema" errors. This occurred when data wasn't properly updated in the catalog before query execution. Implemented retry logic with exponential backoff (5 retry attempts at 2-second intervals) to allow the catalog to sync before query validation, improving test reliability and reducing flaky failures.

  6. Add context while querying (sql/sqlx) -
    Added a context parameter to all database query methods in driver files. This ensures that query cancellation, deadlines, and timeouts are properly respected throughout the codebase, improving reliability and resource management during database operations.

  7. MySQL geospatial type -
    Fixed handling of MySQL geospatial data types during sync operations, ensuring data integrity and proper type conversion when transferring geospatial columns from MySQL sources.

  8. MySQL invalid date breaks sync -
    Fixed sync failures caused by invalid dates such as 0000-00-00 or invalid month/day values in MySQL databases. These issues are now handled by replacing invalid dates with the epoch start date (1970-01-01).

  9. Handle chunk generation for partitioned PostgreSQL tables using max page ID -
    Fixed chunk generation for PostgreSQL tables to ensure correct CTID range calculation when handling partitioned or large tables. Implemented a new query to compute maxPageID and partitionCount and added a fallback to relpages when maxPageID is invalid (≤ 0), improving sync reliability for complex table structures.

  10. Added flatten, resolver and catalog tests and CI -
    Expanded test coverage with comprehensive unit tests for flatten, resolver, and catalog file logic. Tests cover various scenarios and are integrated into the CI pipeline, improving code reliability.
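
As an illustration of the duplicate table creation fix (item 3 above), the sketch below shows one way to treat an "already exists" error as success when several threads race to create the same table. It is a minimal sketch, not OLake's implementation: `InMemoryCatalog`, `TableAlreadyExistsError`, and `ensure_table` are hypothetical names standing in for a real catalog client.

```python
import threading


class TableAlreadyExistsError(Exception):
    """Hypothetical error raised when the table is already present."""


class InMemoryCatalog:
    """Toy stand-in for a real catalog (assumption, not a real API)."""

    def __init__(self):
        self._tables = set()
        self._lock = threading.Lock()

    def create_table(self, name):
        with self._lock:
            if name in self._tables:
                raise TableAlreadyExistsError(name)
            self._tables.add(name)

    def table_names(self):
        with self._lock:
            return sorted(self._tables)


def ensure_table(catalog, name):
    """Create the table, treating 'already exists' as success.

    Returns True if this caller created it, False if another caller won the race.
    """
    try:
        catalog.create_table(name)
        return True
    except TableAlreadyExistsError:
        return False


def create_from_many_threads(catalog, name, workers=8):
    """Race several threads on the same table and collect their results."""
    results = []
    results_lock = threading.Lock()

    def work():
        created = ensure_table(catalog, name)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With this shape, exactly one worker reports that it created the table and the others continue without raising, which is the behavior the fix describes.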

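The retry logic described for the Spark integration test fix (item 5 above) could look roughly like the helper below. This is a sketch under assumptions: the note mentions both exponential backoff and 2-second intervals, so the growth `factor` is left configurable (`factor=1.0` gives fixed 2-second waits), and "5 attempts" is read as five calls in total. The function name and parameters are hypothetical.

```python
import time


def retry(operation, attempts=5, base_delay=2.0, factor=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    After the n-th failure (n starting at 0) it waits
    base_delay * factor**n seconds, except after the final attempt.
    The sleep function is injectable so the helper can be tested quickly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as err:  # a real caller would catch a narrower error
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * factor ** attempt)
    raise last_error
```

A test would wrap the Spark query in `operation` so that a catalog that has not yet caught up triggers another attempt instead of an immediate failure.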

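For the MySQL invalid date fix (item 8 above), the replacement rule can be sketched as follows. This is an illustrative assumption about the behavior, not OLake's code: `parse_mysql_date` is a hypothetical helper that only handles `YYYY-MM-DD` strings, and it also maps malformed input to the epoch start date, which the release note does not specify.

```python
from datetime import date

# Fallback value named in the release note.
EPOCH_START = date(1970, 1, 1)


def parse_mysql_date(value):
    """Parse a 'YYYY-MM-DD' string into a date.

    Returns EPOCH_START when the value is not a real calendar date,
    for example '0000-00-00' or a month/day that is out of range.
    """
    try:
        year, month, day = (int(part) for part in value.split("-"))
        return date(year, month, day)
    except (ValueError, AttributeError):
        # ValueError: bad shape, non-numeric parts, or out-of-range fields.
        # AttributeError: value is not a string (for example None).
        return EPOCH_START
```

Valid dates pass through unchanged, so only rows that would previously have broken the sync are rewritten.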

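The fallback in the PostgreSQL chunk generation fix (item 9 above) can be pictured with the sketch below. The release note does not say how OLake turns these values into CTID ranges, so everything here is an assumption for illustration: the function names, the `pages_per_chunk` parameter, and treating the chosen bound as the last page to cover.

```python
def page_upper_bound(max_page_id, relpages):
    """Prefer max_page_id; fall back to relpages when it is invalid (<= 0)."""
    return max_page_id if max_page_id > 0 else relpages


def ctid_page_chunks(max_page_id, relpages, pages_per_chunk):
    """Split pages 0..bound into consecutive (start, end) ranges.

    The end of each range is exclusive. The bound itself is included, which
    may cover one empty page when relpages is used; that only errs toward
    scanning slightly more, never less.
    """
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    end_exclusive = page_upper_bound(max_page_id, relpages) + 1
    return [
        (start, min(start + pages_per_chunk, end_exclusive))
        for start in range(0, end_exclusive, pages_per_chunk)
    ]
```

Each `(start, end)` pair would then correspond to one CTID range filter on the page component, with the ranges covering the table without gaps or overlap.
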
💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!