Welcome to OLake

The fastest open-source tool for replicating databases to Apache Iceberg and data lakehouses. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Visit olake.io for full documentation and benchmarks.

Introduction to OLake

Welcome to OLake – the fastest open-source database-to-data-lakehouse pipeline, designed to bring your database data (Postgres, MySQL, MongoDB) into modern analytics ecosystems like Apache Iceberg. OLake was born out of the need to eliminate the toil of one-off ETL scripts, combat performance bottlenecks, and avoid vendor lock-in with a clean, high-performing solution.

GitHub Repository: https://github.com/datazip-inc/olake

Overview

OLake’s primary goal is simple: to provide the fastest data pipeline from your database to a data lakehouse—in this case, Apache Iceberg. With OLake you can:

  • Capture data efficiently: Start with a full snapshot of your database tables/collections, then transition seamlessly to near real-time Change Data Capture (CDC) using each database’s change-stream mechanism (WAL, binlogs, oplogs).
  • Achieve high throughput: Utilize parallelized chunking and integrated Destinations to handle large volumes of data—ensuring rapid full loads and lightning-fast incremental updates.
  • Maintain schema integrity: Detect and adapt to evolving document structures with built-in alerts for any schema changes.
  • Embrace openness: Store your data in open formats (e.g., Parquet and Apache Iceberg) to keep your analytics engine agnostic and avoid vendor lock-in.

1. OLake native features

  1. Open data formats – OLake writes raw Parquet and fully ACID snapshots in Apache Iceberg so your lakehouse stays engine-agnostic.
  2. Iceberg writer – The dedicated Iceberg Java Writer (migrating to Iceberg Go writer) produces exactly-once, rollback-ready Iceberg v2 tables.
  3. Smart partitioning – We support both Iceberg partitioning rules and AWS S3 path partitioning for Parquet, enabling fast scans and efficient querying.
  4. Parallelised chunking – OLake splits big tables into smaller chunks that are processed in parallel, slashing total sync time.
  5. Change Data Capture – We capture WAL for Postgres, binlogs for MySQL and oplogs for MongoDB in near real-time to keep the lake fresh without reloads.
  6. Schema evolution & datatype changes – Column adds, drops and type promotions are auto-detected and written per Iceberg v2 spec, so pipelines never break.
  7. Stateful, resumable syncs – If a job crashes (or is paused), OLake resumes from the last committed checkpoint—no manual fixes needed.
  8. Back-off & retries – OLake supports a configurable back-off retry count: if a sync fails, it is retried after a delay.
  9. Synchronization modes – OLake supports full, CDC (Change Data Capture) and strict CDC (tracks only new changes from the current position in the MongoDB change stream, without performing an initial backfill) synchronization modes. Incremental sync is a work in progress and will be released soon.
  10. Level-1 JSON flattening – Nested JSON fields are optionally expanded into top-level columns for easier SQL; see the sketch after this list.
  11. Airflow-first orchestration – Drop our Docker images into your DAGs (EC2 or Kubernetes) and drive syncs via Airflow; a sample DAG sketch follows this list.
  12. Developer playground – A one-click OLake + Iceberg sandbox lets you experiment locally with Trino, Spark, Flink or Snowflake readers.
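
To make Level-1 flattening concrete, here is a minimal, self-contained Python sketch of the idea; the record shape and the underscore column-naming convention are illustrative assumptions, not OLake's actual (Go) implementation.

```python
# Level-1 flattening: expand first-level nested objects into top-level
# columns; deeper nesting is left untouched at this level.
def flatten_level1(record: dict) -> dict:
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                # naming convention assumed for illustration: parent_child
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

doc = {"id": 1, "address": {"city": "Pune", "zip": "411001"}}
print(flatten_level1(doc))
# {'id': 1, 'address_city': 'Pune', 'address_zip': '411001'}
```

And for the Airflow-first orchestration item, a hedged sketch of a DAG that wraps a sync in a DockerOperator; the image tag, command arguments, and schedule are placeholders, so consult the connector docs for real invocations.

```python
# Sketch only: runs an OLake sync container on a schedule from Airflow.
# Requires apache-airflow-providers-docker (3.x shown); config/state volume
# mounts are omitted and the image/command values are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="olake_sync",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    sync = DockerOperator(
        task_id="run_sync",
        image="olakego/source-postgres:latest",  # placeholder image tag
        command="sync",                          # placeholder arguments
        auto_remove="success",
    )
```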

2. Source-level features

  1. PostgreSQL – Full loads and WAL-based CDC for RDS, Aurora, Supabase, etc. See the Postgres connector.

  2. MySQL – Full loads plus binlog CDC for MySQL RDS, Aurora and older community versions. See the MySQL connector.

  3. MongoDB – High-throughput oplog capture for sharded or replica-set clusters. See the MongoDB connector.

  4. Optimised chunking strategies (a Postgres sketch follows this list)

    • MongoDB – Split-Vector, Bucket-Auto & Timestamp; details in the blog What Makes OLake Fast.
    • MySQL – Range splits driven by LIMIT/OFFSET next-query logic.
    • Postgres – CTID ranges, batch-size column splits or next-query paging.
  5. Work-in-progress connectors

  6. Datatype mapping – We map each source database datatype to a corresponding Iceberg datatype, so your source schema remains largely unaffected by the conversion from source database to Iceberg; an illustrative mapping follows this list.
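
To make the Postgres CTID strategy concrete, here is a small Python sketch that plans disjoint CTID ranges from a table's page count (pg_class.relpages); the chunk size and SQL shape are illustrative assumptions, not OLake's actual planner.

```python
# Plan disjoint CTID ranges for parallel full-table scans in Postgres.
# A ctid is (block_number, tuple_index); '(N,0)' addresses the start of
# block N, so consecutive ranges cover the table without overlap.
def ctid_chunks(total_pages: int, pages_per_chunk: int = 1000):
    for start in range(0, total_pages, pages_per_chunk):
        end = start + pages_per_chunk
        yield f"WHERE ctid >= '({start},0)'::tid AND ctid < '({end},0)'::tid"

for clause in ctid_chunks(total_pages=3500):
    print(f"SELECT * FROM my_table {clause}")  # one query per parallel chunk
```

And to make the datatype-mapping idea concrete, an illustrative lookup table for common Postgres types using Iceberg's primitive type names; this table is an assumption for illustration, not OLake's actual mapping.

```python
# Illustrative Postgres -> Iceberg primitive type mapping (assumed, not
# OLake's real table); Iceberg type names follow the Iceberg spec.
POSTGRES_TO_ICEBERG = {
    "smallint": "int",
    "integer": "int",
    "bigint": "long",
    "real": "float",
    "double precision": "double",
    "numeric": "decimal(38, 18)",  # assumed default precision/scale
    "text": "string",
    "boolean": "boolean",
    "date": "date",
    "timestamp": "timestamp",
    "timestamptz": "timestamptz",
    "jsonb": "string",             # serialized as JSON text
}
```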

3. Destination-level features

  1. Apache Iceberg – Primary target format; writes to AWS S3, Azure and GCS. See the Iceberg Writer docs.

  2. Catalog options:

  3. Plain Parquet – Write partitioned Parquet to S3 or Google Cloud Storage (GCS) with the Parquet Writer.

  4. Query engines – Any Iceberg v2-aware tool (Trino, Spark, Flink, Snowflake, etc.) can query OLake outputs immediately; a minimal Spark example follows this list.
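
As a quick illustration, here is a minimal PySpark session reading an OLake-written Iceberg table; the catalog name, warehouse path, and table identifier are assumptions, and the iceberg-spark-runtime package must be on Spark's classpath.

```python
# Query an Iceberg table from Spark; any Iceberg v2-aware engine works
# similarly. Catalog, warehouse, and table names below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("read-olake-iceberg")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

spark.sql("SELECT COUNT(*) FROM lake.db.orders").show()
```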

4. Upcoming features

  1. Telemetry hooks – Segment IO & Mixpanel integration (PR #290).
  2. Universal MySQL CDC – MariaDB, Percona & TiDB support (PR #359).
  3. Incremental sync – Rolling out for MongoDB PR #268, then Postgres and MySQL.
  4. Roadmap tracker – See the live OLake roadmap.

Architectural Overview

OLake is designed as a modular, high-performance system with distinct components that each excel at their core responsibilities. The following diagram (see architecture cover image) provides a visual overview of how data flows through OLake:

Data Flow in OLake

  1. Initial Snapshot:

    • Executes a full read of database tables/collections by issuing queries.
    • Divides each table/collection into parallel chunks for rapid processing.
  2. Change Data Capture (CDC):

    • Sets up database change streams (WAL, binlogs, or oplogs) to capture near real-time updates.
    • Ensures any changes that occur during the snapshot are also captured.
  3. Parallel Processing:

    • Users can configure the number of parallel threads, balancing speed against the load on your database cluster; see the sketch after this list.
  4. Transformation & Normalization:

    • Flattens complex, semi-structured fields into relational streams.
    • Provides basic (Level 1) flattening now, with more advanced nested JSON support on the way.
  5. Integrated Writes:

    • Pushes transformed data directly to target destinations (e.g., local Parquet files, Iceberg tables on S3) without intermediary buffering.
    • This integration minimizes latency and avoids blocking reads.
  6. Monitoring & Alerts:

    • Continuously monitors schema changes and system metrics.
    • Raises alerts for any discrepancies or potential issues, ensuring early detection of data loss or transformation errors.
  7. Logs & Testing:

    • Provides detailed logging for transparency.
    • Supports unit, integration, and workflow testing, ensuring the reliability of both full loads and incremental syncs.
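
As a toy sketch of the parallel-processing knob from step 3, here is a Python thread pool whose size plays the role of OLake's thread count; the chunk-processing body is a placeholder, not OLake's actual worker logic.

```python
# More workers = faster syncs but more load on the source database.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk_id: int) -> int:
    # placeholder for "read one chunk from the source, write to the lake"
    return chunk_id

thread_count = 8  # analogous to a configurable thread-count setting
with ThreadPoolExecutor(max_workers=thread_count) as pool:
    results = list(pool.map(process_chunk, range(100)))
```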

Core Components

OLake is built around several key modules:

  • CLI & Commands:
    Offers commands like spec, check, discover, and sync for seamless pipeline orchestration. Configurable flags (such as --batch_size and --thread_count) allow you to optimize performance; a hypothetical invocation sketch follows this component list.

  • Framework (CDK):
    The robust foundation that powers OLake’s orchestration and modular design.

  • Connectors (Drivers):
    Each driver encapsulates the logic required to interact with a specific source system and manages:

    • Full Load: Efficiently partitioning and processing large collections.
    • CDC: Setting up and maintaining change streams to capture incremental changes.
    • Incremental Sync: Ensuring that only new or modified data is processed after the initial snapshot. (WIP, releasing soon)
    • Schema Discovery: Automatically identifying the schema of the source data.
    • Schema Evolution: Detecting and adapting to changes in the source schema.
    • Data Transformation: Flattening complex structures into a more manageable format. We currently support basic (Level 1) flattening, with plans for more advanced nested JSON handling in the future.
  • Destinations:

    • Tightly integrated with drivers, Destinations (e.g., Apache Iceberg) ensure that once data is extracted, it is immediately pushed to your chosen destination—whether that be local storage or cloud-based solutions.
    • AWS S3 partitioning is supported, allowing you to store data in a structured manner that aligns with your analytical needs.
    • Iceberg data partitioning is also supported, enabling efficient querying and data management.
  • Monitoring & Alerting:
    An integrated system to keep track of process status, performance metrics, and schema evolution.

  • SDK & Testing Setup:
    Provides an SDK for custom integrations and a comprehensive testing suite to ensure robust data synchronization.
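
As a hypothetical sketch of the CLI workflow described above, here is how the documented commands and flags might be driven from Python; the binary path and exact argument syntax are placeholders, so check the connector docs for real usage.

```python
# Hypothetical OLake CLI invocation: spec/check/discover/sync and the
# --batch_size / --thread_count flags come from the docs above; the binary
# path and any config-file arguments are placeholders.
import subprocess

BINARY = "./olake"  # placeholder path to a driver binary

for command in ("spec", "check", "discover"):
    subprocess.run([BINARY, command], check=True)

subprocess.run(
    [BINARY, "sync", "--batch_size", "10000", "--thread_count", "8"],
    check=True,
)
```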

OLake in a Lakehouse Ecosystem

By storing data in open formats like Parquet and Apache Iceberg, OLake offers:

  • Flexibility: Seamless integration with popular query engines such as Spark, Trino, Flink, and even Snowflake external tables.
  • Avoidance of Vendor Lock-In: Your data remains accessible and queryable regardless of the analytical tool you choose.
  • Real-Time Data Replication: Near real-time updates through each Database's change streams keep your lakehouse data fresh.

Performance Benchmarks*

  1. Postgres Connector to Apache Iceberg: (See Detailed Benchmark)

    1. Full load - Syncs at 46,262 RPS for 4 billion rows. (101x Airbyte, 11.6x Estuary, 3.1x Debezium (memiiso))
    2. CDC - Syncs at 36,982 RPS for 50 million changes. (63x Airbyte, 12x Estuary, 2.7x Debezium (memiiso), 1.4x Fivetran)
  2. MongoDB Connector to Apache Iceberg: (See Detailed Benchmark)

    1. Syncs 35,694 records/sec; 230 million rows in 46 minutes for a 664 GB dataset (20× Airbyte, 15× Embedded Debezium, 6× Fivetran)
  3. MySQL Connector to Apache Iceberg: (See Detailed Benchmark)

    1. Syncs 1,000,000 records/sec for a 10 GB dataset; ~209 minutes for 100+ GB.

*These are preliminary results; we'll publish fully reproducible benchmark scores soon.

Future Enhancements

OLake is continually evolving. Upcoming features include:

  • Enhanced Nested JSON Handling: Advanced flattening for deeper nested structures.
  • Simplified Deployment: A single self-contained binary for easier setup and maintenance.
  • Flexible Deployment Options: Support for Bring Your Own Cloud (BYOC), On-Prem, and multiple cloud platforms (GCP, Azure).
  • Enterprise-Grade Security & Consistency: Instant transactional consistency and robust security integrations.
  • Expanded Connector Support: Future connectors for Kafka, S3 and DynamoDB. Visit roadmap for more detailed info.
  • Unified UI & Server Management: A centralized interface for managing all OLake features; see the OLake UI GitHub repo.
  • Schema Evolution: Support for schema changes in the source database.

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: discuss future roadmaps, report bugs, get help debugging issues you're facing, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!