
OLake 6th Community Meetup



Details

  • Date:

    April 28, 2025

  • Time:

    04:30 PM - 05:30 PM IST

  • Duration:

    1 hour




    Hosted By


    Priyansh Khodiyar

    Ex DevRel


    Shubham Satish Baldava

    CTO, OLake

    Shubham oversees OLake's engineering roadmap, focusing on CDC-first architectures and production-ready Iceberg deployments. He collaborates with customers to remove bottlenecks in data lake operations and accelerates modern data platform migrations.

    Summary

    The sixth OLake community meetup (28 April 2025) centred on a real-world production story from PhysicsWallah and a deeper dive into OLake’s roadmap. Guest speaker Adish Jain walked the community through PhysicsWallah's migration from a Redshift warehouse to an Iceberg-based lakehouse, the problems the team faced with Debezium, and how OLake solved them with faster, resumable full loads, direct Iceberg ingestion, and automatic schema evolution. A live demo showed MongoDB-to-Iceberg ingestion running in Kubernetes. Shubham Baldava then unpacked OLake’s Golang + Java architecture, explained plans to shift the Iceberg writer to Go/Rust for lower memory use, previewed an upcoming UI, and announced mid-level SMT transformations arriving within three months.

    Chapters & Topics

    Introduction and Agenda

    Priyansh Khodiyar opened the sixth community meetup, introduced Adish Jain (long-time design partner), and outlined a two-part agenda: a production story from PhysicsWallah followed by OLake updates and roadmap.

    PhysicsWallah Data Infrastructure Journey

    Adish Jain described PhysicsWallah's move from a Redshift warehouse to an S3-backed lakehouse with bronze, silver, and gold layers. Their stack now combines open-source tools (AI/Spark/Iceberg) with in-house services such as Dataset API and Lagoon for orchestration.

    Challenges with Debezium

    Adish detailed 18 months of friction with Debezium: complex schema evolution, no direct data-lake writes, incremental-snapshot limitations, slow multi-billion-row full loads, lack of heterogeneous-array support, and no resume support for failed jobs.

    How OLake Addresses the Pain Points

    Key OLake features—configurable full loads, Kafka-free CDC, automatic schema evolution, direct source-to-Iceberg ingestion, resumable loads, and built-in Iceberg partitioning—were mapped to each Debezium pain point.
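
To make "resumable loads" concrete, here is a minimal sketch of the general idea: copy rows in chunks and persist a checkpoint after each chunk, so a failed job restarts from the last saved offset instead of from zero. This is not OLake's implementation. The function name `resumable_full_load`, the JSON state file, and the `fail_at` crash switch are all hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


def resumable_full_load(rows, sink, state_path, chunk_size=3, fail_at=None):
    """Copy rows into sink in chunks, saving the next offset after each chunk.

    Hypothetical helper: state_path is a small JSON checkpoint file, and
    fail_at simulates a crash once the offset reaches that value.
    """
    offset = 0
    if os.path.exists(state_path):
        # A checkpoint exists, so resume where the previous run stopped.
        with open(state_path) as f:
            offset = json.load(f)["offset"]

    while offset < len(rows):
        if fail_at is not None and offset >= fail_at:
            raise RuntimeError("simulated crash")
        chunk = rows[offset:offset + chunk_size]
        sink.extend(chunk)                      # write the chunk
        offset += len(chunk)
        with open(state_path, "w") as f:        # then record progress
            json.dump({"offset": offset}, f)
    return offset


# Demo: the first attempt crashes after two chunks, the second one resumes.
rows = list(range(10))
sink = []
state_path = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    resumable_full_load(rows, sink, state_path, fail_at=6)
except RuntimeError:
    pass  # rows 0-5 are already written and checkpointed
resumable_full_load(rows, sink, state_path)
print(sink == rows)
```

In this toy version the write and the checkpoint are two separate steps, so a crash between them would replay one chunk. A production system would need idempotent writes or an atomic commit to avoid duplicates.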

    OLake Demo (MongoDB → Iceberg)

    Using a staging MongoDB collection (~1.5 million rows), Adish demonstrated OLake’s ingestion in a Kubernetes cluster. The demo showed as-is replication, automatic schema evolution, and Iceberg table creation with partitioning.
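
As a rough illustration of what "automatic schema evolution" can mean for a schemaless source such as MongoDB, the sketch below grows a column-to-type mapping as new documents arrive. The meetup notes do not describe OLake's actual rules, so everything here is an assumption: the function name `evolve_schema` is hypothetical, new fields become new columns, `int` widens to `float`, and any other type conflict falls back to `str`.

```python
def evolve_schema(schema, record):
    """Return a copy of schema extended to cover every field in record.

    Assumed rules (illustrative only): unseen fields are added, int and
    float merge to float, and any other type conflict becomes str.
    """
    evolved = dict(schema)
    for field, value in record.items():
        new_type = type(value).__name__
        old_type = evolved.get(field)
        if old_type is None:
            evolved[field] = new_type            # new column
        elif old_type != new_type:
            if {old_type, new_type} == {"int", "float"}:
                evolved[field] = "float"         # widen numeric type
            else:
                evolved[field] = "str"           # conflicting types
    return evolved


# Demo: the second document adds a field and changes a numeric type.
schema = {}
schema = evolve_schema(schema, {"_id": "a1", "score": 1})
schema = evolve_schema(schema, {"_id": "a2", "score": 2.5, "city": "Pune"})
print(schema)
```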

    OLake Architecture Q&A

    Shubham Baldava explained the custom framework: Golang workers pull data; a lightweight Java gRPC service writes equality-delete files to Iceberg. The team plans to replace the Java layer with a Go or Rust writer for better performance and memory efficiency.
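
For readers unfamiliar with equality deletes, the toy model below shows how they can express an upsert: each batch writes a delete entry for the row's key plus the new row, and a reader drops any row whose key matches a delete with a strictly higher sequence number (the rule the Iceberg table spec uses for equality deletes). How OLake's writer actually uses these files was not covered in detail, so treat the `ToyTable` class and its methods as hypothetical.

```python
class ToyTable:
    """In-memory stand-in for a table with equality deletes (illustrative)."""

    def __init__(self, key):
        self.key = key
        self.seq = 0        # sequence number of the latest commit
        self.data = []      # (sequence number, row)
        self.deletes = []   # (sequence number, key value)

    def upsert(self, rows):
        """Commit one batch: an equality delete and a data row per record."""
        self.seq += 1
        for row in rows:
            self.deletes.append((self.seq, row[self.key]))
            self.data.append((self.seq, row))

    def scan(self):
        """Return live rows: those not matched by any later equality delete."""
        live = []
        for data_seq, row in self.data:
            deleted = any(
                del_seq > data_seq and value == row[self.key]
                for del_seq, value in self.deletes
            )
            if not deleted:
                live.append(row)
        return live


# Demo: the second batch replaces the row with _id 1.
table = ToyTable(key="_id")
table.upsert([{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}])
table.upsert([{"_id": 1, "name": "a2"}])
print(table.scan())
```

The writer never has to look up where the old row lives. It only records the key, which is why this pattern suits CDC streams where updates arrive without file positions.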

    Future Features and UI Sneak Peek

    Shubham previewed two upcoming transformation layers—SMT (Simple Message Transformations) during ingest and heavier, post-ingest transforms—plus a first look at OLake’s new UI (designs ready, repo private for now).

    Action Items

    • Shubham Baldava to launch the SMT (mid-level transformation) feature within the next 2–3 months.
