OLake 6th Community Meetup



Details

  • Date:

    April 28, 2025

  • Time:

    04:30 PM - 05:30 PM IST

  • Duration:

    1 hour

    Hosted By

    Priyansh Khodiyar

    DevRel and OLake Maintainer

    Shubham Satish Baldava

    Co-founder @ Datazip and OLake Maintainer

    Summary

    The sixth OLake community meetup (28 April 2025) centred on a real-world production story from PhysicsWallah and a deeper dive into OLake’s roadmap. Guest speaker Adish Jain walked the community through PhysicsWallah's migration from a Redshift warehouse to an Iceberg-based lakehouse, the pains the team faced with Debezium, and how OLake solved them with faster, resumable full loads, direct Iceberg ingestion, and automatic schema evolution. A live demo showed MongoDB-to-Iceberg ingestion running in Kubernetes. Shubham Baldava then unpacked OLake’s Golang + Java architecture, explained plans to shift the Iceberg writer to Go/Rust for lower memory use, previewed an upcoming UI, and announced mid-level SMT transformations arriving within the next two to three months.

    Chapters & Topics

    Introduction and Agenda

    Priyansh Khodiyar opened the sixth community meetup, introduced Adish Jain (long-time design partner), and outlined a two-part agenda: a production story from PhysicsWallah followed by OLake updates and roadmap.

    PhysicsWallah Data Infrastructure Journey

    Adish Jain described PhysicsWallah's move from a Redshift warehouse to an S3-backed lakehouse with bronze, silver, and gold layers. Their stack now combines open-source tools (AI/Spark/Iceberg) with in-house services such as Dataset API and Lagoon for orchestration.

    Challenges with Debezium

    Adish detailed 18 months of friction with Debezium: complex schema evolution, no direct data-lake writes, incremental-snapshot limitations, slow multi-billion-row full loads, lack of heterogeneous-array support, and no resume support for failed jobs.

    How OLake Addresses the Pain Points

    Key OLake features—configurable full loads, Kafka-free CDC, automatic schema evolution, direct source-to-Iceberg ingestion, resumable loads, and built-in Iceberg partitioning—were mapped to each Debezium pain point.

    OLake Demo (MongoDB → Iceberg)

    Using a staging MongoDB collection (~1.5 million rows), Adish demonstrated OLake’s ingestion in a Kubernetes cluster. The demo showed as-is replication, automatic schema evolution, and Iceberg table creation with partitioning.

    OLake Architecture Q&A

    Shubham Baldava explained the custom framework: Golang workers pull data; a lightweight Java gRPC service writes equality-delete files to Iceberg. The team plans to replace the Java layer with a Go or Rust writer for better performance and memory efficiency.

    Future Features and UI Sneak Peek

    Shubham previewed two upcoming transformation layers—SMT (Simple Message Transformations) during ingest and heavier, post-ingest transforms—plus a first look at OLake’s new UI (designs ready, repo private for now).

    Action Items

    • Shubham Baldava to launch the SMT (mid-level transformation) feature within the next 2–3 months.

    Ready to join our next OLake community meetup?

    Secure your spot by registering below.