OLake 6th Community Meetup

Summary

The sixth OLake community meetup (28 April 2025) centred on a real-world production story from PhysicsWallah and a deeper dive into OLake’s roadmap. Guest speaker Adish Jain walked the community through PhysicsWallah’s migration from a Redshift warehouse to an Iceberg-based lakehouse, the pains the team faced with Debezium, and how OLake solved them with faster, resumable full loads, direct Iceberg ingestion, and automatic schema evolution. A live demo showed MongoDB-to-Iceberg ingestion running in Kubernetes. Shubham Baldava then unpacked OLake’s Golang + Java architecture, explained plans to shift the Iceberg writer to Go or Rust for lower memory use, previewed an upcoming UI, and announced mid-level SMT transformations arriving within three months.

Chapters & Topics

Introduction and Agenda

Priyansh Khodiyar opened the sixth community meetup, introduced Adish Jain (a long-time design partner), and outlined a two-part agenda: a production story from PhysicsWallah, followed by OLake updates and the roadmap.

PhysicsWallah Data Infrastructure Journey

Adish Jain described PhysicsWallah’s move from a Redshift warehouse to an S3-backed lakehouse with bronze, silver, and gold layers. Their stack now combines open-source tools (AI/Spark/Iceberg) with in-house services such as Dataset API and Lagoon for orchestration.

Challenges with Debezium

Adish detailed 18 months of friction with Debezium: complex schema evolution, no direct data-lake writes, incremental-snapshot limitations, slow multi-billion-row full loads, lack of heterogeneous-array support, and no resume support for failed jobs.

How OLake Addresses the Pain Points

Key OLake features—configurable full loads, Kafka-free CDC, automatic schema evolution, direct source-to-Iceberg ingestion, resumable loads, and built-in Iceberg partitioning—were mapped to each Debezium pain point.
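To make the idea of a resumable full load concrete, here is a minimal Python sketch. It is not OLake's implementation: the names `plan_chunks`, `run_full_load`, and `copy_chunk` are hypothetical, and it assumes the source table has an integer key that can be split into ranges. The point is only that progress is recorded per chunk, so a failed job restarts at the failed chunk rather than from the beginning.

```python
from typing import Callable, List, Set, Tuple

Chunk = Tuple[int, int]  # half-open [start, end) range of primary-key values


def plan_chunks(min_id: int, max_id: int, chunk_size: int) -> List[Chunk]:
    """Split the inclusive key range [min_id, max_id] into half-open chunks."""
    return [
        (start, min(start + chunk_size, max_id + 1))
        for start in range(min_id, max_id + 1, chunk_size)
    ]


def run_full_load(
    chunks: List[Chunk],
    copy_chunk: Callable[[Chunk], int],
    done: Set[Chunk],
) -> int:
    """Copy every chunk not yet in `done`; return the rows copied in this run.

    A chunk is added to `done` only after `copy_chunk` returns, so a crash
    leaves earlier chunks recorded and a rerun resumes at the failed chunk.
    """
    rows = 0
    for chunk in chunks:
        if chunk in done:
            continue  # already finished in an earlier run
        rows += copy_chunk(chunk)  # may raise; `done` keeps earlier progress
        done.add(chunk)
    return rows
```

In a real pipeline the `done` set would live in durable storage (a state file or table) rather than in memory, and chunks could be copied in parallel; both are left out here to keep the sketch short.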

OLake Demo (MongoDB → Iceberg)

Using a staging MongoDB collection (~1.5 million rows), Adish demonstrated OLake’s ingestion in a Kubernetes cluster. The demo showed as-is replication, automatic schema evolution, and Iceberg table creation with partitioning.
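The automatic schema evolution shown in the demo can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the function names, the four type labels, and the widening order are invented, and Iceberg's own type-promotion rules are stricter than this toy order. The sketch only shows the general shape: unseen fields become new columns, and a field whose type changes is widened instead of failing the load.

```python
from typing import Any, Dict

# Hypothetical widening order for this sketch only.
_WIDENING = ["int", "double", "string"]


def type_name(value: Any) -> str:
    """Map a Python value to one of the sketch's type labels."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(old: str, new: str) -> str:
    """Pick a type that can hold both; fall back to string otherwise."""
    if old in _WIDENING and new in _WIDENING:
        return _WIDENING[max(_WIDENING.index(old), _WIDENING.index(new))]
    return "string"


def evolve_schema(schema: Dict[str, str], record: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `schema` extended to fit `record`."""
    evolved = dict(schema)
    for field, value in record.items():
        if value is None:
            continue  # a null carries no type information
        new = type_name(value)
        old = evolved.get(field)
        if old is None:
            evolved[field] = new  # unseen field: add a column
        elif old != new:
            evolved[field] = widen(old, new)  # changed type: widen
    return evolved
```

For example, starting from `{"_id": "string", "score": "int"}`, a document that adds a `city` field yields a third column, and a later document with `score: 7.5` widens `score` to `double`.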

OLake Architecture Q&A

Shubham Baldava explained the custom framework: Golang workers pull data; a lightweight Java gRPC service writes equality-delete files to Iceberg. The team plans to replace the Java layer with a Go or Rust writer for better performance and memory efficiency.
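For readers unfamiliar with equality deletes, the following Python sketch models how a reader resolves them. It is a simplified picture of the semantics in the Iceberg table specification, not OLake's writer code, and the function names are hypothetical. Each data row and each delete carries a sequence number; an equality delete removes rows that match on the key columns and have a strictly smaller sequence number, which lets an update be written as "delete the old key, append the new row" in one commit.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def key_of(row: Row, key_fields: Sequence[str]) -> Tuple[Any, ...]:
    """Extract the equality-key values of a row as a hashable tuple."""
    return tuple(row[f] for f in key_fields)


def read_with_equality_deletes(
    data: List[Tuple[int, Row]],
    deletes: List[Tuple[int, Row]],
    key_fields: Sequence[str],
) -> List[Row]:
    """Return the live rows after applying equality deletes.

    `data` and `deletes` are lists of (sequence_number, row) pairs.
    """
    # Only the newest delete per key matters for filtering.
    latest_delete: Dict[Tuple[Any, ...], int] = {}
    for seq, row in deletes:
        k = key_of(row, key_fields)
        latest_delete[k] = max(seq, latest_delete.get(k, seq))

    live = []
    for seq, row in data:
        deleted_at = latest_delete.get(key_of(row, key_fields))
        if deleted_at is not None and seq < deleted_at:
            continue  # row is older than a matching delete
        live.append(row)
    return live
```

Because the comparison is strict, a replacement row written with the same sequence number as its delete survives, while every older version of that key is filtered out.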

Future Features and UI Sneak Peek

Shubham previewed two upcoming transformation layers—SMT (Simple Message Transformations) during ingest and heavier, post-ingest transforms—plus a first look at OLake’s new UI (designs ready, repo private for now).

Action Items

Shubham Baldava to launch the SMT (mid-level transformation) feature within the next 2–3 months.

Hosted By

Priyansh Khodiyar

Shubham Satish Baldava
