Community Meetup

OLake 10th Community Meetup



Details

  • Date: January 28, 2026
  • Time: 04:30 PM - 05:30 PM IST
  • Duration: 1 hour

Summary

OLake Community Call | Engineers, Contributors & What's Next. With this call, the series reached double digits. We talked openly about where OLake is headed next, how we plan to bring in more contributors, and the new integrations we have been working on. We also shared updates on our documentation and recent blogs so you can stay in sync.

  • New Sources: S3 Source (CSV, JSON, Parquet; AWS S3, MinIO, LocalStack; IAM auth, glob patterns), MSSQL Source (native Microsoft SQL Server to Iceberg), DB2 Source (IBM DB2 to Iceberg). Documentation added for all.
  • S3 Connector Architecture Deep Dive by Ankit Singhal — Contributor at OLake.
  • MOR → COW: Compaction script to convert Merge-on-Read tables to Copy-on-Write for engines that don't support equality deletes (e.g. Databricks, Snowflake). WAP checkpointing, idempotent re-runs, failure recovery.
  • Kubernetes & Jobs: Transition to Job Profiles, zero-based mapping, full control via NodeSelector, Tolerations, Affinity. Backward compatible with existing job mappings.
  • Community: Contributor spotlights, SWOC updates, new contributor recognition, recent blogs and case studies.


Hosted By


Akshay Kumar Sharma

DevRel @ OLake

OLake DevRel and community advocate, passionate about open-source data engineering and lakehouse architectures.

Summary

The tenth OLake community meetup marked a milestone as the community moved into double digits. Hosted by Akshay Kumar Sharma and the OLake team, the call covered new source integrations (S3, MSSQL, DB2), MOR to COW architecture improvements for query engine compatibility, Kubernetes and job execution enhancements, contributor spotlights from the Social Winter of Code (SWOC) program, and upcoming community events. The session was an open discussion about where OLake is headed next and how to bring in more contributors.

Chapters & Topics

New Sources Added

OLake expanded its source ecosystem with three production-ready integrations. The S3 source reads data directly from S3-compatible storage (AWS S3, MinIO, LocalStack) in CSV, JSON, and Parquet formats, using IAM-based authentication and glob patterns for file discovery. The MSSQL source adds native support for Microsoft SQL Server, so teams can ingest data from existing MSSQL deployments into Apache Iceberg. The DB2 source adds support for IBM DB2, ingesting data into Iceberg-backed lakehouse architectures. Documentation has been added for all new sources so teams can plug them into their existing architectures.

S3 Connector Architecture Deep Dive by Ankit Singhal — Contributor at OLake

Ankit Singhal, a contributor at OLake, presented a deep dive into the S3 connector architecture: how the connector reads data from S3-compatible storage, supports multiple formats (CSV, JSON, Parquet), and integrates with AWS S3, MinIO, and LocalStack. The session also covered the connector's design, file discovery with glob patterns, and IAM-based authentication for production use.
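As a rough illustration of glob-based file discovery (a sketch, not the connector's actual code), the Python snippet below filters a list of object keys with a glob pattern, first narrowing by the literal prefix that an S3 listing call could use as its `Prefix`. The helper names and keys are hypothetical, and OLake's exact pattern semantics (for example, whether `*` crosses `/`) may differ from Python's `fnmatch`.

```python
import fnmatch
import re

# Characters that start a wildcard in fnmatch-style patterns.
_WILDCARD = re.compile(r"[*?\[]")


def literal_prefix(pattern: str) -> str:
    """Return the part of the pattern before the first wildcard.

    S3 lists objects by key prefix, so this literal part can be passed as the
    Prefix of a ListObjectsV2 call to narrow the listing before filtering.
    """
    match = _WILDCARD.search(pattern)
    return pattern if match is None else pattern[: match.start()]


def discover(keys: list[str], pattern: str) -> list[str]:
    """Keep only the object keys that match the glob pattern, sorted.

    fnmatchcase is case-sensitive, like S3 keys. Note that in fnmatch '*'
    also matches '/', so 'a/*.csv' matches 'a/b/c.csv' as well.
    """
    prefix = literal_prefix(pattern)
    return sorted(
        key for key in keys
        if key.startswith(prefix) and fnmatch.fnmatchcase(key, pattern)
    )


# Stand-in for a real bucket listing (hypothetical keys).
keys = [
    "sales/2026/01/orders.parquet",
    "sales/2026/01/orders.csv",
    "sales/2025/12/orders.parquet",
    "logs/app.json",
]
print(literal_prefix("sales/2026/*/*.parquet"))  # sales/2026/
print(discover(keys, "sales/2026/*/*.parquet"))  # ['sales/2026/01/orders.parquet']
```

Filtering by prefix first matters because listing an entire bucket can be slow and costly; only the wildcard part of the pattern has to be evaluated client-side.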

MOR → COW Architecture Improvements

OLake ingests CDC data using Merge-on-Read (MOR) with equality deletes. Many query engines (e.g., Databricks, Snowflake) do not fully support equality deletes; an engine that ignores them can return rows that were already deleted or superseded at the source, leading to incorrect query results. To address this, OLake introduced a MOR to COW compaction script that periodically rewrites MOR tables as Copy-on-Write (COW) tables, producing clean, query-ready Iceberg tables. The script uses WAP (Write-Audit-Publish) for atomic checkpointing, supports idempotent re-runs and automatic failure recovery, and ensures correctness without sacrificing ingestion performance.
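To make the equality-delete problem concrete, here is a toy Python model (an assumption-laden sketch, not OLake's compaction script). A CDC update is written as an equality delete plus a new row; a reader that applies the deletes sees only the latest version, a reader that ignores them also sees the stale row, and a COW-style rewrite materializes the live rows so that no delete files remain. It ignores real file I/O, snapshots, and the WAP publish step, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    seq: int          # data sequence number of the commit that wrote the file
    rows: list[dict]


@dataclass
class EqDeleteFile:
    seq: int
    keys: set         # values of the equality column ("id" here) to delete


def read_mor(data_files, delete_files, apply_deletes=True, key="id"):
    """Merge-on-read: filter data rows through the equality deletes at query time.

    Following the Iceberg rule, an equality delete only applies to data files
    with a strictly smaller sequence number, so the new row version written in
    the same commit as the delete survives.
    """
    out = []
    for data in data_files:
        for row in data.rows:
            deleted = apply_deletes and any(
                d.seq > data.seq and row[key] in d.keys for d in delete_files
            )
            if not deleted:
                out.append(row)
    return out


def compact_to_cow(data_files, delete_files, key="id"):
    """Copy-on-write: rewrite the live rows into one new data file, no deletes left."""
    live = read_mor(data_files, delete_files, apply_deletes=True, key=key)
    next_seq = max((f.seq for f in [*data_files, *delete_files]), default=0) + 1
    return [DataFile(seq=next_seq, rows=live)], []


# CDC history: insert id=1 and id=2, then update id=1 (delete old + insert new).
data = [
    DataFile(1, [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}]),
    DataFile(2, [{"id": 1, "v": "b"}]),
]
deletes = [EqDeleteFile(2, {1})]

print(read_mor(data, deletes))                       # [{'id': 2, 'v': 'x'}, {'id': 1, 'v': 'b'}]
print(read_mor(data, deletes, apply_deletes=False))  # stale {'id': 1, 'v': 'a'} reappears
cow_data, cow_deletes = compact_to_cow(data, deletes)
print(read_mor(cow_data, cow_deletes, apply_deletes=False))  # same rows as the first print
```

After the rewrite, an engine that never looks at delete files still returns the correct rows, which is the property the compaction step is meant to provide.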

Kubernetes & Job Execution Enhancements

Major improvements to job execution and scheduling were introduced: the transition from Job Mapping to Job Profiles, support for zero-based mapping, and full Kubernetes scheduling control using NodeSelector, Tolerations, and Affinity. The changes remain backward compatible with existing job mappings and give Kubernetes-based deployments better scalability, flexibility, and control.
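Job Profiles are described only at a high level in this call. As a sketch under stated assumptions, the Python snippet below maps a hypothetical profile name to the three standard Kubernetes PodSpec scheduling fields (`nodeSelector`, `tolerations`, `affinity`) and merges them into a base pod spec, falling back to an empty default profile for unknown names. The profile names, node labels, taint, and fallback behavior are illustrative assumptions, not OLake's configuration schema.

```python
import copy

# Hypothetical job profiles. The field names are the standard Kubernetes
# PodSpec scheduling fields; the labels, taint, and values are made up.
PROFILES = {
    "default": {},
    "high-memory": {
        "nodeSelector": {"workload": "olake-sync"},
        "tolerations": [
            {"key": "dedicated", "operator": "Equal", "value": "etl", "effect": "NoSchedule"},
        ],
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [
                            {"key": "node.kubernetes.io/instance-type",
                             "operator": "In", "values": ["r6i.2xlarge"]},
                        ]},
                    ],
                },
            },
        },
    },
}

SCHEDULING_FIELDS = ("nodeSelector", "tolerations", "affinity")


def pod_spec_for(profile_name: str, base_spec: dict) -> dict:
    """Return a copy of base_spec with the profile's scheduling fields applied.

    Unknown profile names fall back to the (empty) default profile, so the base
    spec is returned unchanged. The base spec itself is never mutated.
    """
    profile = PROFILES.get(profile_name, PROFILES["default"])
    spec = copy.deepcopy(base_spec)
    for field in SCHEDULING_FIELDS:
        if field in profile:
            spec[field] = copy.deepcopy(profile[field])
    return spec


base = {
    "restartPolicy": "Never",
    "containers": [{"name": "sync", "image": "example.com/sync-worker:latest"}],
}
spec = pod_spec_for("high-memory", base)
print(sorted(spec))  # ['affinity', 'containers', 'nodeSelector', 'restartPolicy', 'tolerations']
```

Keeping placement rules in a named profile rather than on each job lets many jobs share one set of scheduling constraints; how OLake's Job Profiles store and resolve these settings is not described in this summary.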

Community Highlights

The call focused on the people behind OLake: contributor spotlights and shoutouts, updates from the Social Winter of Code (SWOC) program, recognition of new contributors and their impact, and highlights from recent community blogs and company case studies.

Action Items

  • Teams can adopt the new S3, MSSQL, and DB2 sources using the added documentation.
  • Use the MOR to COW compaction script where query engines do not support equality deletes.
  • Explore Job Profiles and Kubernetes scheduling options (NodeSelector, Tolerations, Affinity) for deployments.

Ready to join our next OLake community meetup?

Secure your spot by registering below.