Summary
The tenth OLake community meetup marked a milestone as the community moved into double digits. Hosted by Akshay Kumar Sharma and the OLake team, the call covered new source integrations (S3, MSSQL, DB2), MOR to COW architecture improvements for query engine compatibility, Kubernetes and job execution enhancements, contributor spotlights from the Social Winter of Code (SWOC) program, and upcoming community events. The session was an open discussion about where OLake is headed next and how to bring in more contributors.
Chapters & Topics
New Sources Added
OLake expanded its source ecosystem with three production-ready integrations:
- S3 Source reads data directly from S3-compatible storage (AWS S3, MinIO, LocalStack) in CSV, JSON, and Parquet formats, with IAM-based authentication and glob patterns for file discovery.
- MSSQL Source provides native support for Microsoft SQL Server, so teams can ingest data from existing MSSQL deployments into Apache Iceberg.
- DB2 Source supports IBM DB2, enabling ingestion into Iceberg-backed lakehouse architectures.
Documentation has been added for all three sources so teams can plug them into existing architectures.
S3 Connector Architecture Deep Dive by Ankit Singhal — Contributor at OLake
Ankit Singhal, Contributor at OLake, presented a deep dive into the S3 connector architecture, covering how the connector reads data from S3-compatible storage, supports multiple formats (CSV, JSON, Parquet), and integrates with AWS S3, MinIO, and LocalStack. The session explored the design, file discovery with glob patterns, and IAM-based authentication for production use.
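To make the idea of glob-based file discovery concrete, here is a minimal sketch in Python. It is not OLake's connector code: the function `discover_files`, the extension-to-format mapping, and the sample object keys are all assumptions made for illustration, and the matching uses Python's standard `fnmatch` module rather than whatever matcher the connector actually uses.

```python
import fnmatch
from pathlib import PurePosixPath

# Formats named in the talk; the extension mapping itself is an assumption.
FORMAT_BY_EXTENSION = {".csv": "csv", ".json": "json", ".parquet": "parquet"}


def discover_files(keys, pattern):
    """Return (key, format) pairs for object keys that match a glob pattern.

    Note: fnmatch's '*' also matches '/', so 'sales/*' matches nested keys.
    """
    matched = []
    for key in sorted(keys):
        if not fnmatch.fnmatchcase(key, pattern):
            continue
        file_format = FORMAT_BY_EXTENSION.get(PurePosixPath(key).suffix.lower())
        if file_format is not None:  # skip markers and unsupported extensions
            matched.append((key, file_format))
    return matched


# Hypothetical object keys, as a bucket listing might return them.
example_keys = [
    "sales/2024/01/orders.csv",
    "sales/2024/02/orders.parquet",
    "sales/2024/02/_SUCCESS",
    "logs/app.json",
]
sales_files = discover_files(example_keys, "sales/*")
```

In this sketch the `_SUCCESS` marker is dropped because it has no supported extension, while the CSV and Parquet objects under `sales/` are kept along with their inferred formats.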
MOR → COW Architecture Improvements
OLake ingests CDC data using Merge-on-Read (MOR) with equality deletes. Several query engines, including Databricks and Snowflake, do not fully support equality deletes, which can lead to incorrect query results. To address this, OLake introduced a MOR to COW compaction script that periodically converts MOR tables into Copy-on-Write (COW) tables. The script:
- produces clean, query-ready Iceberg tables,
- uses WAP (Write-Audit-Publish) for atomic checkpointing,
- supports idempotent re-runs and automatic failure recovery, and
- preserves correctness without sacrificing ingestion performance.
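The difference between reading a MOR table and a compacted COW table can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the compaction script itself: the `Row` and `EqualityDelete` classes and all function names are hypothetical, and the only rule borrowed from Iceberg is that an equality delete applies to rows written in strictly earlier commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    value: str
    seq: int  # sequence number of the commit that wrote the row (starts at 1)


@dataclass(frozen=True)
class EqualityDelete:
    key: int
    seq: int  # sequence number of the commit that wrote the delete


def read_merge_on_read(rows, deletes):
    """Apply equality deletes at read time, as a fully compliant engine would."""
    newest_delete = {}
    for d in deletes:
        newest_delete[d.key] = max(d.seq, newest_delete.get(d.key, 0))
    # A delete removes only rows written in strictly earlier commits.
    return [r for r in rows if r.seq >= newest_delete.get(r.key, 0)]


def read_ignoring_deletes(rows, deletes):
    """What an engine returns if it skips equality delete files."""
    return list(rows)


def compact_to_cow(rows, deletes):
    """Rewrite the table so it holds only live rows and no delete files."""
    return read_merge_on_read(rows, deletes), []


# Hypothetical CDC history: insert ids 1 and 2, update id 1, then delete id 2.
mor_rows = [Row(1, "alice", 1), Row(2, "bob", 1), Row(1, "alice-v2", 2)]
mor_deletes = [EqualityDelete(1, 2), EqualityDelete(2, 3)]

cow_rows, cow_deletes = compact_to_cow(mor_rows, mor_deletes)
```

After compaction, an engine that ignores delete files reads the same rows as a compliant engine reading the MOR table, and running `compact_to_cow` again on the compacted output changes nothing, which is the idempotency property in miniature.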
Kubernetes & Job Execution Enhancements
Major improvements to job execution and scheduling were introduced: transition from Job Mapping to Job Profiles, zero-based mapping support, and full Kubernetes scheduling control using NodeSelector, Tolerations, and Affinity. The changes maintain backward compatibility with existing job mappings and provide better scalability, flexibility, and control in Kubernetes-based deployments.
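As a rough illustration of what scheduling control means at the pod level, the sketch below merges scheduling settings into a pod spec. The field names `nodeSelector`, `tolerations`, and `affinity` are standard Kubernetes PodSpec fields; everything else, including the `apply_scheduling` helper, the shape of the profile dictionary, the labels, and the image name, is a hypothetical stand-in and not OLake's Job Profile format.

```python
import copy
import json

SCHEDULING_FIELDS = ("nodeSelector", "tolerations", "affinity")


def apply_scheduling(pod_spec, profile):
    """Return a copy of pod_spec with the profile's scheduling fields set."""
    result = copy.deepcopy(pod_spec)
    for field in SCHEDULING_FIELDS:
        if field in profile:
            result[field] = copy.deepcopy(profile[field])
    return result


# Hypothetical profile: run sync jobs on dedicated, tainted nodes in one zone.
example_profile = {
    "nodeSelector": {"workload": "ingestion"},
    "tolerations": [
        {"key": "dedicated", "operator": "Equal",
         "value": "ingestion", "effect": "NoSchedule"}
    ],
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {"matchExpressions": [
                        {"key": "topology.kubernetes.io/zone",
                         "operator": "In", "values": ["zone-a"]}
                    ]}
                ]
            }
        }
    },
}

# Hypothetical base spec for a sync job pod.
base_pod_spec = {
    "restartPolicy": "Never",
    "containers": [{"name": "sync", "image": "example/sync:latest"}],
}
scheduled_spec = apply_scheduling(base_pod_spec, example_profile)
print(json.dumps(scheduled_spec, indent=2))
```

An empty profile leaves the base spec unchanged, which mirrors the backward-compatibility goal described above: jobs without scheduling settings keep running wherever the default scheduler places them.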
Community Highlights
The call focused on the people behind OLake: contributor spotlights and shoutouts, updates from the Social Winter of Code (SWOC) program, recognition of new contributors and their impact, and highlights from recent community blogs and company case studies.
Action Items
- Adopt the new S3, MSSQL, and DB2 sources using the added documentation.
- Use the MOR to COW compaction script where query engines do not support equality deletes.
- Explore Job Profiles and Kubernetes scheduling options (NodeSelector, Tolerations, Affinity) for deployments.
