Summary
The community meetup focused on updates and developments within the OLake project, highlighting contributions from various members and upcoming features. Priyansh Khodiyar introduced Shubham Baldava, who shared insights from his experience in data engineering and discussed the significance of new features like the Parquet writer and MongoDB 2.0. Contributions from community members, such as DeepRip's work on the Parquet writer and Magic Potato's performance improvements, were acknowledged. Priyansh also emphasized the importance of community engagement in enhancing the project and outlined the roadmap for OLake, including the integration of the Apache Iceberg Writer and the ongoing development of a Postgres Writer. The session included a demonstration of OLake's CLI functionality and upcoming UI enhancements, with detailed explanations of syncing MongoDB data to S3 and managing schema discovery. Ankit Kumar provided updates on various open pull requests and discussed challenges faced with the Parquet Writer, while Shubham proposed enhancements to the MongoDB backfill strategy. The team encouraged community contributions to improve code quality and documentation, with plans for a follow-up meetup to showcase new developments. Overall, the focus remains on expanding functionality and making OLake more user-friendly for data engineers, with a commitment to sharing resources and fostering community interaction.
Chapters & Topics
Community Introductions and Feature Updates
Priyansh Khodiyar opened the meetup by welcoming attendees and introducing Shubham Baldava, the CTO, who discussed his experience in data engineering and the evolution of data lakehouses. Priyansh presented the agenda, which included updates on new features such as a Parquet writer and MongoDB connector. He also acknowledged contributions from community members Deepak, Magic Potato, and Kuldeep, who are working on various improvements.
OLake Demo and Configuration Overview
Priyansh Khodiyar demonstrated how to get started with OLake, emphasizing the importance of having Docker installed and providing a step-by-step guide for syncing MongoDB data to S3. He discussed the configuration files required, such as writer.json and config.json, and mentioned the creation of a state.json file to track sync progress.
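A state file of this kind typically lets a sync resume where it stopped instead of starting over. The sketch below is a generic illustration of that idea and not OLake's actual state.json format: the `cursor` field, the `orders` stream name, the `id` key on each record, and the helper functions are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return saved progress, or an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write
    does not corrupt the existing state file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)


def sync(records: list[dict], stream: str, state_path: Path) -> list[dict]:
    """Return only records newer than the saved cursor and advance the cursor.
    The 'cursor' layout is an assumption made for this example."""
    state = load_state(state_path)
    last = state.get(stream, {}).get("cursor", -1)
    new = [r for r in records if r["id"] > last]
    if new:
        state[stream] = {"cursor": max(r["id"] for r in new)}
        save_state(state_path, state)
    return new


# Hypothetical usage: the second run only picks up the record added since the first.
state_path = Path(tempfile.mkdtemp()) / "state.json"
records = [{"id": 1}, {"id": 2}]
first = sync(records, "orders", state_path)
second = sync(records + [{"id": 3}], "orders", state_path)
```

Keeping progress per stream means each collection can resume independently after an interrupted run.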
Updates on Apache Iceberg Writer and Documentation
Priyansh Khodiyar shared that the Apache Iceberg Writer is set to be merged by the end of February, aligning with industry trends as companies like Apple transition to Iceberg. Shubham Baldava praised the new documentation website created by Priyansh and provided updates on the Iceberg Writer's features, including schema evolution. He also mentioned that the Postgres Writer is in demand and expected to be published in two to three weeks, with MySQL following shortly after.
OLake Roadmap and Community Contributions
Priyansh Khodiyar provided an overview of the OLake roadmap and invited community contributions through documentation and issue resolution. He detailed how Parquet files are created, emphasizing the impact of normalization settings on data structure. Ankit Kumar then discussed ongoing pull requests and issues, highlighting specific features and improvements being worked on.
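As a rough illustration of the normalization point, the sketch below shows one way a nested MongoDB-style document could map to columns with and without normalization. It is an assumption made for illustration, not OLake's actual implementation: the function names `normalize_record` and `raw_record`, the dotted column naming, and the `data` column are hypothetical.

```python
import json
from typing import Any


def normalize_record(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into top-level columns with dotted names."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(normalize_record(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def raw_record(doc: dict[str, Any], id_field: str = "_id") -> dict[str, Any]:
    """Keep the identifier as a column and store the whole document as one JSON string."""
    return {id_field: doc.get(id_field), "data": json.dumps(doc, sort_keys=True)}


# Hypothetical sample document.
doc = {"_id": "a1", "user": {"name": "Ada", "address": {"city": "Pune"}}, "total": 42}
normalized = normalize_record(doc)  # one column per leaf field
raw = raw_record(doc)               # two columns: _id and data
```

With normalization each leaf field becomes its own column, which suits column-oriented queries; without it the file keeps a stable two-column shape and the nested structure has to be parsed at read time.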
Progress Updates on MySQL and Parquet Writers
Ankit Kumar discussed the progress of the MySQL source and the Iceberg Writer, which Sugam is handling. Sebin shared challenges he is facing with the Parquet Writer, particularly regarding the normalization process and errors linked to the MongoDB setup. Ankit suggested that the issue may be resolved by pulling the latest code from the master branch.
Feature Development and Data Source Integration
Shubham Baldava highlighted the importance of adding a MongoDB backfill strategy and a data filtering feature to optimize data synchronization. He also mentioned the upcoming UI and API work led by Swati, which is expected to be published in the next few weeks. Ankit Kumar emphasized the need for community contributions to various data sources, including S3 and Kafka.
Community Contributions and Documentation Updates
Ankit Kumar emphasized the need for community involvement in enhancing code readability through PRs. Priyansh Khodiyar shared updates on documentation improvements, such as the inclusion of sample data sets and tools for reporting discrepancies. He also proposed the next community meetup to discuss ongoing projects and gather feedback.
Action Items
- Create a new OLake bucket and configure writer.json and config.json files with necessary database connection details.
- Shubham Baldava will ensure that the new documentation website is maintained and updated for community use.
- Ankit Kumar will review the open pull requests and encourage community members to contribute to the review process.
- Ankit Kumar will provide documentation on how to run a Docker Compose for testing purposes.
- Swati will publish the first level of the UI for OLake in the next two to three weeks.
- Vikash, Priyansh, and Ankit will assist anyone interested in contributing to the Iceberg Writer and related issues.
- Priyansh Khodiyar will schedule the next community meetup, to be held no sooner than two weeks from this session.
- Set up the config path, destination path, and catalog path for data syncing after running the schema generation command.
Key Questions
- What is the expected timeline for the Iceberg Writer to be merged?
- How can community members report issues with the documentation?