Summary
The sixth OLake community meetup (28 April 2025) centred on a real-world production story from PhysicsWallah and a deeper dive into OLake’s roadmap. Guest speaker Adish Jain walked the community through PhysicsWallah migration from a Redshift warehouse to an Iceberg-based lakehouse, the pains they faced with Debezium, and how OLake solved them with faster, resumable full loads, direct Iceberg ingestion, and automatic schema evolution. A live demo showed MongoDB-to-Iceberg ingestion running in Kubernetes. Shubham Baldava then unpacked OLake’s Golang + Java architecture, explained plans to shift the Iceberg writer to Go/Rust for lower memory use, previewed an upcoming UI, and announced mid-level SMT transformations arriving within three months.
Chapters & Topics
Introduction and Agenda
Priyansh Khodiyar opened the sixth community meetup, introduced Adish Jain (long-time design partner), and outlined a two-part agenda: a production story from PhysicsWallah followed by OLake updates and roadmap.
PhysicsWallah Data Infrastructure Journey
Adish Jain described PhysicsWallah move from a Redshift warehouse to an S3-backed lakehouse with bronze, silver, and gold layers. Their stack now combines open-source tools (AI/Spark/Iceberg) with in-house services such as Dataset API and Lagoon for orchestration.
Challenges with Debezium
Adish detailed 18 months of friction with Debezium: complex schema evolution, no direct data-lake writes, incremental-snapshot limitations, slow multi-billion-row full loads, lack of heterogeneous-array support, and no resume support for failed jobs.
How OLake Addresses the Pain Points
Key OLake features—configurable full loads, Kafka-free CDC, automatic schema evolution, direct source-to-Iceberg ingestion, resumable loads, and built-in Iceberg partitioning—were mapped to each Debezium pain point.
OLake Demo (MongoDB → Iceberg)
Using a staging MongoDB collection (~1.5 million rows), Adish demonstrated OLake’s ingestion in a Kubernetes cluster. The demo showed as-is replication, automatic schema evolution, and Iceberg table creation with partitioning.
OLake Architecture Q&A
Shubham Baldava explained the custom framework: Golang workers pull data; a lightweight Java gRPC service writes equality-delete files to Iceberg. The team plans to replace the Java layer with a Go or Rust writer for better performance and memory efficiency.
Future Features and UI Sneak Peek
Shubham previewed two upcoming transformation layers—SMT (Simple Message Transformations) during ingest and heavier, post-ingest transforms—plus a first look at OLake’s new UI (designs ready, repo private for now).
Action Items
- Shubham Baldava to launch the SMT (mid-level transformation) feature within the next 2–3 months.