1. OLake native features
- Open data formats – OLake writes raw Parquet and fully ACID snapshots in Apache Iceberg so your lakehouse stays engine-agnostic.
- Iceberg writer – The dedicated Iceberg Java Writer (migrating to Iceberg Go writer) produces exactly-once, rollback-ready Iceberg v2 tables.
- Smart partitioning – We support both Iceberg partitioning rules and AWS S3 partitioning for Parquet for fast scans and efficient querying; see the partitioning example after this list.
- Parallelised chunking – OLake splits big tables into smaller chunks that are processed in parallel, slashing total sync time.
- Change Data Capture – We capture WALs for Postgres, binlogs for MySQL and oplogs for MongoDB in near real-time to keep the lake fresh without full reloads.
- Schema evolution & datatype changes – Column adds, drops and type promotions are auto-detected and written per the Iceberg v2 spec, so pipelines never break; a short example follows this list.
- Stateful, resumable syncs – If a job crashes (or is paused), OLake resumes from the last committed checkpoint—no manual fixes needed.
- Back-off & retries – If a sync fails, OLake retries it up to a configurable retry count, backing off between attempts; a minimal sketch of the idea follows this list.
- Synchronization modes – OLake supports full loads, CDC (Change Data Capture) and strict CDC, which tracks only new changes from the current position in the MongoDB change stream without performing an initial backfill. Incremental sync is a work in progress and will be released soon.
- Level-1 JSON flattening – Nested JSON fields are optionally expanded into top-level columns for easier SQL.
- Airflow-first orchestration – Drop our Docker images into your DAGs (on EC2 or Kubernetes) and drive syncs from Airflow; an example DAG sketch follows this list.
- Developer playground – A one-click OLake + Iceberg sandbox lets you experiment locally with Trino, Spark, Flink or Snowflake readers.
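
For context on the partitioning point above, here is a minimal, generic Iceberg example in PySpark rather than OLake-specific configuration; the `lake` catalog, database, table and columns are illustrative assumptions.

```python
# Generic Iceberg partitioning sketch (PySpark); the "lake" catalog is assumed
# to be an already-configured Iceberg catalog, and the table is illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-partitioning").getOrCreate()

# Hidden partitioning: Iceberg derives partition values from event_ts and
# customer via transforms, so filters on those columns prune data files
# without a separate partition column in the schema.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.db.orders (
        order_id  BIGINT,
        customer  STRING,
        quantity  INT,
        event_ts  TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_ts), bucket(16, customer))
""")
```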
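
Schema evolution in Iceberg v2 is metadata-only, which is why these changes do not break pipelines. The sketch below shows the equivalent operations as generic Spark SQL against the same illustrative `lake` catalog; it is not OLake internals.

```python
# Iceberg schema evolution via Spark SQL; metadata-only changes, so existing
# data files are not rewritten. Table and column names are illustrative and
# assume the Iceberg Spark runtime and catalog are configured.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-schema-evolution").getOrCreate()

# Column add: the new column is simply absent from old files and reads as NULL.
spark.sql("ALTER TABLE lake.db.orders ADD COLUMN discount DOUBLE")

# Type promotion: widening int -> bigint is one of the promotions the Iceberg
# spec permits, which is how a pipeline survives a source column outgrowing
# its original type.
spark.sql("ALTER TABLE lake.db.orders ALTER COLUMN quantity TYPE BIGINT")
```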
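
The back-off behaviour boils down to bounded retries with a growing wait between attempts. Below is a generic sketch of that pattern; the function and parameter names are made up for illustration and are not OLake's configuration options.

```python
# Generic exponential back-off sketch (not OLake internals): retry a failing
# sync a bounded number of times, waiting longer after each failure.
import time

def run_with_backoff(sync, max_retries: int = 5, base_delay_s: float = 30.0):
    for attempt in range(max_retries + 1):
        try:
            return sync()
        except Exception:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the failure
            time.sleep(base_delay_s * (2 ** attempt))
```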
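
For the Airflow point, a minimal DAG sketch is shown below. The Docker image name, command-line flags and config paths are assumptions made for illustration; check the OLake connector docs for the exact image and CLI syntax.

```python
# Minimal Airflow DAG sketch (Airflow 2.4+) that runs an OLake sync via
# DockerOperator. Image name, command flags and config paths are assumptions,
# not the documented OLake CLI; adapt them to your connector's docs.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

with DAG(
    dag_id="olake_postgres_to_iceberg",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    sync = DockerOperator(
        task_id="olake_sync",
        image="olakego/source-postgres:latest",   # assumed image name
        command=(
            "sync --config /mnt/config/source.json "          # assumed flags
            "--destination /mnt/config/destination.json"
        ),
        mounts=[Mount(source="/opt/olake/config",
                      target="/mnt/config", type="bind")],
    )
```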
2. Source-level features
- PostgreSQL – Full loads and WAL-based CDC for RDS, Aurora, Supabase, etc. See the Postgres connector.
- MySQL – Full loads plus binlog CDC for MySQL RDS, Aurora and older community versions. See the MySQL connector.
- MongoDB – High-throughput oplog capture for sharded or replica-set clusters. See the MongoDB connector.
- Optimised chunking strategies:
  - MongoDB – `Split-Vector`, `Bucket-Auto` & `Timestamp`; details in the blog What Makes OLake Fast.
  - MySQL – Range splits driven by `LIMIT`/`OFFSET` next-query logic.
  - Postgres – `CTID` ranges, `batch-size` column splits or `next-query` paging; a sketch of the CTID approach follows this list.
- Work-in-progress connectors:
  - S3 source – GitHub issue #86
  - Kafka – PR #339
- Datatype mapping – We map each source database datatype to a corresponding Iceberg datatype, so your source schema remains largely unaffected by the conversion to Iceberg; see the per-connector docs for the exact mappings.
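
To make the Postgres chunking strategy concrete, here is a conceptual sketch of CTID-range splitting. It is not OLake's actual code: the table, connection string and chunk size are placeholders, and TID range predicates of this form require PostgreSQL 14 or newer.

```python
# Conceptual sketch of CTID-range chunking for Postgres (not OLake's actual
# code). Each chunk covers a disjoint range of physical heap pages, so chunks
# can be read by parallel workers.
import psycopg2

PAGES_PER_CHUNK = 50_000  # heap pages (8 kB each) per chunk

conn = psycopg2.connect("dbname=shop user=replicator")

with conn.cursor() as cur:
    # relpages approximates how many heap pages the table occupies.
    cur.execute("SELECT relpages FROM pg_class WHERE relname = %s", ("orders",))
    (relpages,) = cur.fetchone()

# Half-open page ranges [start, end) covering the whole table.
chunks = [
    (start, start + PAGES_PER_CHUNK)
    for start in range(0, relpages + 1, PAGES_PER_CHUNK)
]

def read_chunk(start_page: int, end_page: int) -> list:
    """Read one CTID range; in a real pipeline each call runs in its own worker."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM orders WHERE ctid >= %s::tid AND ctid < %s::tid",
            (f"({start_page},0)", f"({end_page},0)"),
        )
        return cur.fetchall()

for start, end in chunks:
    rows = read_chunk(start, end)  # shown sequentially; run in parallel in practice
```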
3. Destination-level features
- Apache Iceberg – Primary target format; OLake can write to AWS S3, Azure and GCS. See the Iceberg Writer docs.
- Catalog options:
  - AWS Glue – Glue catalog guide
  - REST catalog – REST guide with sub-sections for Gravitino, Nessie, Polaris, Unity and LakeKeeper
  - Hive Metastore – Hive catalog guide
  - JDBC – JDBC catalog guide
  - S3 Table Bucket (`SigV4`) – S3 Tables Guide
  - Unity Catalog (Databricks) – Unity Catalog Guide
- Plain Parquet – Write partitioned Parquet to S3 or Google Cloud Storage (GCS) with the Parquet Writer.
- Query engines – Any Iceberg v2-aware tool (Trino, Spark, Flink, Snowflake, etc.) can query OLake outputs immediately; see the PySpark example below. See here for more.
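
As a concrete example of consuming OLake output, the snippet below configures Spark against a REST catalog and queries an Iceberg table. The catalog URI, warehouse path and table name are placeholders; the `spark.sql.catalog.*` keys are the standard Iceberg Spark options, and the same SQL works from Trino, Flink or Snowflake.

```python
# Reading an OLake-written Iceberg table from Spark through a REST catalog.
# The catalog URI, warehouse path and table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("query-olake-output")
    # Pick the Iceberg runtime artifact matching your Spark/Scala version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.olake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.olake.type", "rest")
    .config("spark.sql.catalog.olake.uri", "http://rest-catalog:8181")
    .config("spark.sql.catalog.olake.warehouse", "s3://my-lake/warehouse")
    .getOrCreate()
)

# The equivalent query runs unchanged from any other Iceberg v2-aware engine
# pointed at the same catalog.
spark.sql(
    "SELECT count(*) FROM olake.db.orders WHERE event_ts >= DATE '2024-01-01'"
).show()
```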
4. Upcoming features
- Telemetry hooks – Segment IO & Mixpanel integration (PR #290).
- Universal MySQL CDC – MariaDB, Percona & TiDB support (PR #359).
- Incremental sync – Rolling out for MongoDB (PR #268), then Postgres and MySQL.
- Roadmap tracker – See the live OLake roadmap.