Last updated: 11/20/2025

Benchmarks

Detailed benchmarks are provided per connector: Postgres, Oracle, MongoDB, and MySQL.

PostgreSQL to Apache Iceberg Connector Benchmark

Benchmark Environment

This benchmark uses the standard NYC Taxi Data trips and fhv_trips tables.
The original NYC Taxi Data repo ingests data into a local Postgres instance, so we modified it to ingest into a remote cloud Postgres instance (Azure Flexible DB).
Total rows: 4,008,587,913 across both tables.
The average row size is 144 bytes for trips and 121 bytes for fhv_trips.
OLake and Debezium were run on an Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory); the other platforms (Fivetran, Estuary, Airbyte) were used as cloud offerings.
Database instance: Azure Standard_D32ads_v5 (32 vCores, 128 GiB Memory, 51200 max IOPS)

(OLake vs. Popular Data-Movement Tools)

1. Speed Comparison – Full-Load Performance

Tool	Rows Synced	Throughput (rows / sec)	Relative to OLake
OLake (as of 14th Sept 2025)	4.01 B	319,562 RPS	–
Fivetran (as of 30th Apr 2025)	4.01 B	46,395 RPS	6.8 × slower
Debezium (memiiso) (as of 30th Apr 2025)	1.28 B	14,839 RPS	21.5 × slower
Estuary¹ (as of 30th Apr 2025)	0.34 B	3,982 RPS	80.2 × slower
Airbyte Cloud (as of 30th Apr 2025)	12.7 M	457 RPS	699.2 × slower

¹ Estuary ran the same 24-hour window but processed a ~10× smaller dataset, so its throughput looks even lower when normalized.
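The "Relative to OLake" column can be reproduced from the throughput figures in the table. The following is a minimal sketch, not part of the benchmark tooling: the function name `slowdown` is hypothetical, and it assumes the published ratios are truncated (not rounded) to one decimal place, which is the convention that matches the table.

```python
import math

# Full-load throughput in rows/sec, copied from the table above.
FULL_LOAD_RPS = {
    "OLake": 319_562,
    "Fivetran": 46_395,
    "Debezium (memiiso)": 14_839,
    "Estuary": 3_982,
    "Airbyte Cloud": 457,
}


def slowdown(baseline_rps: float, tool_rps: float) -> float:
    """How many times slower a tool is than the baseline.

    Truncated to one decimal place (an assumed convention that
    happens to match the published column).
    """
    return math.floor(baseline_rps / tool_rps * 10) / 10


baseline = FULL_LOAD_RPS["OLake"]
for tool, rps in FULL_LOAD_RPS.items():
    if tool != "OLake":
        print(f"{tool}: {slowdown(baseline, rps)}x slower")
```

Dividing the baseline throughput by each tool's throughput is all the column expresses; it does not account for the different row counts each tool managed to sync.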

Memory usage (OLake) - Standard D64ls v5 (64 vCPUs, 128 GiB Memory)

Memory Stats	Usage (GB)
Min	3.54
Max	58.85
Mean	44.10

OLake maintains this throughput with a mean memory footprint of 44.10 GB and a peak of 58.85 GB on a 128 GiB VM.

info

Every tool was given the same 24-hour window, but only OLake and Fivetran processed the entire dataset in that time. Airbyte's sync failed after 7.5 hours, so its throughput covers only the first part of the test.

Key takeaway: OLake now delivers up to 7x faster bulk-load performance than Fivetran, while outpacing every other open-source alternative by 21x to over 600x.

2. Speed Comparison – Change-Data-Capture (CDC)

Tool	CDC Window	Throughput (rows / sec)	Relative to OLake
OLake	20.1 min	41,390 RPS	–
Fivetran	31 min	26,910 RPS	1.5 × slower
Debezium (memiiso)	60 min	13,808 RPS	2.9 × slower
Estuary	4.5 h	3,085 RPS	13.4 × slower
Airbyte Cloud	23 h	585 RPS	70.7 × slower

info

Every tool was tested on the same 50 million changes. Only the CDC window, the time each tool took to sync those changes, differs: OLake processed them in 20.1 minutes.
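Because every tool synced the same 50 million changes, throughput follows approximately from the CDC window alone. The sketch below is illustrative: `rows_per_second` is a hypothetical helper, and the computed values land within a few percent of the published ones, presumably because the published windows are rounded.

```python
CDC_ROWS = 50_000_000  # changes applied in the CDC test

# CDC window per tool in minutes, and the published throughput (rows/sec).
CDC_RESULTS = {
    "OLake": (20.1, 41_390),
    "Fivetran": (31, 26_910),
    "Debezium (memiiso)": (60, 13_808),
    "Estuary": (4.5 * 60, 3_085),
    "Airbyte Cloud": (23 * 60, 585),
}


def rows_per_second(rows: int, window_minutes: float) -> float:
    """Average throughput over a sync window."""
    return rows / (window_minutes * 60)


for tool, (minutes, published) in CDC_RESULTS.items():
    computed = rows_per_second(CDC_ROWS, minutes)
    print(f"{tool}: computed {computed:,.0f} RPS, published {published:,} RPS")
```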

Key takeaway: For incremental workloads OLake leads the pack, moving 50 million PostgreSQL changes into Iceberg 53% faster than Fivetran and roughly 3-70× faster than the other connectors tested.

3. Cost Comparison (Vendor List Prices)

Tool	Scenario	Spend (USD)	Rows Synced
OLake	Full Load / CDC	Cost of a `Standard D64ls v5 (64 vCPUs, 128 GiB memory)` running for 3.5 hours < $ 10	4.01 B / 50 M
Fivetran	Full Load	$ 0 (free full sync)	4.01 B
Estuary	Full Load	$ 1,668	0.34 B
Airbyte Cloud	Full Load	$ 5,560	12.7 M
Fivetran	CDC	$ 2,375.80	50 M
Estuary	CDC	$ 17.63	50 M
Airbyte Cloud	CDC	$ 148.95	50 M

OLake is open-source and can be deployed on your own Kubernetes cluster or cloud VMs; you pay only for the compute and storage you provision.
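The 3.5-hour figure in the OLake row is consistent with the full-load throughput reported earlier. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the hourly rate it prints is only the ceiling implied by the "< $ 10" figure, not a quoted Azure price.

```python
TOTAL_ROWS = 4_008_587_913  # both tables, from the benchmark environment
OLAKE_FULL_LOAD_RPS = 319_562  # full-load throughput in rows/sec


def full_load_hours(total_rows: int, rows_per_sec: float) -> float:
    """Wall-clock hours needed to sync total_rows at a steady rate."""
    return total_rows / rows_per_sec / 3600


def implied_max_hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Highest hourly VM price that keeps the run under total_cost_usd."""
    return total_cost_usd / hours


hours = full_load_hours(TOTAL_ROWS, OLAKE_FULL_LOAD_RPS)
print(f"Full load: {hours:.2f} h")
print(f"Implied ceiling: ${implied_max_hourly_rate(10, 3.5):.2f}/h")
```

The same calculation can be applied to any self-hosted run: estimate the duration from row count and throughput, then multiply by the hourly price of the VM you provision.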

Dataset and Table Schemas

Please refer to this GitHub repository for the dataset we used to conduct these benchmarks.

note

We first performed a full-load sync of empty dummy tables. We then inserted the top 25 million records from each of trips and fhv_trips (50 million rows in total) into these tables and ran a CDC sync.

`trips` table

CREATE TABLE trips (
	id bigserial NOT NULL,
	cab_type_id int4 NULL,
	vendor_id int4 NULL,
	pickup_datetime timestamp NULL,
	dropoff_datetime timestamp NULL,
	store_and_fwd_flag bool NULL,
	rate_code_id int4 NULL,
	pickup_longitude numeric NULL,
	pickup_latitude numeric NULL,
	dropoff_longitude numeric NULL,
	dropoff_latitude numeric NULL,
	passenger_count int4 NULL,
	trip_distance numeric NULL,
	fare_amount numeric NULL,
	extra numeric NULL,
	mta_tax numeric NULL,
	tip_amount numeric NULL,
	tolls_amount numeric NULL,
	ehail_fee numeric NULL,
	improvement_surcharge numeric NULL,
	congestion_surcharge numeric NULL,
	airport_fee numeric NULL,
	total_amount numeric NULL,
	payment_type int4 NULL,
	trip_type int4 NULL,
	pickup_nyct2010_gid int4 NULL,
	dropoff_nyct2010_gid int4 NULL,
	pickup_location_id int4 NULL,
	dropoff_location_id int4 NULL,
	CONSTRAINT trips_pkey PRIMARY KEY (id)
);

`fhv_trips` table

CREATE TABLE fhv_trips (
	id bigserial NOT NULL,
	hvfhs_license_num text NULL,
	dispatching_base_num text NULL,
	originating_base_num text NULL,
	request_datetime timestamp NULL,
	on_scene_datetime timestamp NULL,
	pickup_datetime timestamp NULL,
	dropoff_datetime timestamp NULL,
	pickup_location_id int4 NULL,
	dropoff_location_id int4 NULL,
	trip_miles numeric NULL,
	trip_time numeric NULL,
	base_passenger_fare numeric NULL,
	tolls numeric NULL,
	black_car_fund numeric NULL,
	sales_tax numeric NULL,
	congestion_surcharge numeric NULL,
	airport_fee numeric NULL,
	tips numeric NULL,
	driver_pay numeric NULL,
	shared_request bool NULL,
	shared_match bool NULL,
	access_a_ride bool NULL,
	wav_request bool NULL,
	wav_match bool NULL,
	legacy_shared_ride int4 NULL,
	affiliated_base_num text NULL,
	CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);

note

We used AWS Glue as the Iceberg catalog and AWS S3 as the storage layer on the destination side for these benchmarks.

Bottom line: If you need to land terabytes of PostgreSQL data into Apache Iceberg quickly—and keep it continually up-to-date—OLake delivers enterprise-grade speed without the enterprise-grade bill.

Oracle → Apache Iceberg Connector Benchmark

Oracle data powers your business—so migrations should be fast and seamless. OLake helps you move massive Oracle datasets to Apache Iceberg at high speed, with predictable performance and no vendor lock-in.

Benchmark Environment

This benchmark uses standard NYC Taxi Data trips and fhv_trips tables.
Since the original repo supports only PostgreSQL, we first ingested the NYC Taxi Data into the cloud PostgreSQL database (Azure Flexible DB), and then transferred the tables from there to our Oracle database.
Total rows: 4,008,587,913 across both tables.
The average row size is 144 bytes for trips and 121 bytes for fhv_trips.
OLake was run on Azure Standard D64ls v5 VM (64 vCPUs, 128 GiB memory)
Database instance: AWS RDS db.r6i.4xlarge (8 vCPUs, 32 GiB Memory)

(OLake's Performance with Oracle Database)

1. Speed Test – Full-Load Performance

Tool	Rows Synced	Throughput (rows / sec)
OLake	4.01 B	261,793 RPS

Memory usage (OLake) - Standard D64ls v5 (64 vCPUs, 128 GiB Memory)

Memory Stats	Usage (GB)
Min	4.78
Max	70.88
Mean	52.60

OLake sustains high ingest rates from Oracle with a mean memory footprint of 52.60 GB and a peak of 70.88 GB on a 128 GiB VM.

2. Cost at a Glance

Tool	Scenario	Spend (USD)	Rows Synced
OLake	Full Load	Cost of a `Standard D64ls v5 (64 vCPUs, 128 GiB memory)` running for 4.25 hours < $ 12	4.01 B

Dataset and Table Schemas

Please refer to this GitHub repository for the dataset we used to conduct these benchmarks.

note

For Oracle, we performed a full-load sync of the trips and fhv_trips tables into Apache Iceberg.

`trips` table

CREATE TABLE trips (
	id bigserial NOT NULL,
	cab_type_id int4 NULL,
	vendor_id int4 NULL,
	pickup_datetime timestamp NULL,
	dropoff_datetime timestamp NULL,
	store_and_fwd_flag bool NULL,
	rate_code_id int4 NULL,
	pickup_longitude numeric NULL,
	pickup_latitude numeric NULL,
	dropoff_longitude numeric NULL,
	dropoff_latitude numeric NULL,
	passenger_count int4 NULL,
	trip_distance numeric NULL,
	fare_amount numeric NULL,
	extra numeric NULL,
	mta_tax numeric NULL,
	tip_amount numeric NULL,
	tolls_amount numeric NULL,
	ehail_fee numeric NULL,
	improvement_surcharge numeric NULL,
	congestion_surcharge numeric NULL,
	airport_fee numeric NULL,
	total_amount numeric NULL,
	payment_type int4 NULL,
	trip_type int4 NULL,
	pickup_nyct2010_gid int4 NULL,
	dropoff_nyct2010_gid int4 NULL,
	pickup_location_id int4 NULL,
	dropoff_location_id int4 NULL,
	CONSTRAINT trips_pkey PRIMARY KEY (id)
);

`fhv_trips` table

CREATE TABLE fhv_trips (
	id bigserial NOT NULL,
	hvfhs_license_num text NULL,
	dispatching_base_num text NULL,
	originating_base_num text NULL,
	request_datetime timestamp NULL,
	on_scene_datetime timestamp NULL,
	pickup_datetime timestamp NULL,
	dropoff_datetime timestamp NULL,
	pickup_location_id int4 NULL,
	dropoff_location_id int4 NULL,
	trip_miles numeric NULL,
	trip_time numeric NULL,
	base_passenger_fare numeric NULL,
	tolls numeric NULL,
	black_car_fund numeric NULL,
	sales_tax numeric NULL,
	congestion_surcharge numeric NULL,
	airport_fee numeric NULL,
	tips numeric NULL,
	driver_pay numeric NULL,
	shared_request bool NULL,
	shared_match bool NULL,
	access_a_ride bool NULL,
	wav_request bool NULL,
	wav_match bool NULL,
	legacy_shared_ride int4 NULL,
	affiliated_base_num text NULL,
	CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);

note

We used AWS Glue as the Iceberg catalog and AWS S3 as the storage layer on the destination side for these benchmarks.

Bottom line: If you need to land terabytes of Oracle data into Apache Iceberg quickly—OLake delivers enterprise-grade speed without the enterprise-grade bill.

MongoDB Benchmarks

In the fast-paced world of data management, every second counts. When it comes to syncing massive datasets from MongoDB into a data warehouse or even a lakehouse, you need a tool that is not just reliable but also blazing fast and cost-effective.

This is where OLake comes into the picture.

Speed Comparison: Full Load Performance

For a collection of 230 million rows (664.81 GB) from Twitter data*, here's how OLake compares to leading competitors:

Tool	Full Load Time	Performance
OLake	46 mins	–
Fivetran	4 hours 39 mins (279 mins)	6x slower
Airbyte	16 hours (960 mins)	20x slower
Debezium (Embedded)	11.65 hours (699 mins)	15x slower

OLake is up to 20x faster than competitors like Airbyte, significantly reducing the time and resources required for full data syncs.
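The "Performance" column follows from the load times in the table. The sketch below is a rough check under two assumptions: the helper names are hypothetical, and the published factors are taken to be whole-number truncations of the time ratio. It also derives the average row rate implied by OLake's 46-minute run, a figure the table does not state directly.

```python
# Full-load durations in minutes, copied from the table above.
FULL_LOAD_MINUTES = {
    "OLake": 46,
    "Fivetran": 279,
    "Debezium (Embedded)": 699,
    "Airbyte": 960,
}
COLLECTION_ROWS = 230_000_000


def times_slower(tool_minutes: int, baseline_minutes: int) -> int:
    """Whole-number slowdown factor (truncated, assumed convention)."""
    return tool_minutes // baseline_minutes


def implied_rows_per_second(rows: int, minutes: float) -> float:
    """Average rows/sec implied by a load of `rows` taking `minutes`."""
    return rows / (minutes * 60)


baseline = FULL_LOAD_MINUTES["OLake"]
for tool, minutes in FULL_LOAD_MINUTES.items():
    if tool != "OLake":
        print(f"{tool}: {times_slower(minutes, baseline)}x slower")
print(f"OLake: ~{implied_rows_per_second(COLLECTION_ROWS, baseline):,.0f} rows/sec")
```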

No more waiting for hours or even days for your data to be loaded into your warehouse. 46 minutes is all you need to process 230 million rows with OLake.

note

We used Debezium server version v2.6.2 for carrying out these benchmarks.

CDC Sync Performance

A CDC test with about 1 million rows (2.88 GB, 999,450 records) across 10 collections produced the following results:

Tool	CDC Sync Time	Records per Second (r/s)	Performance
OLake	28.3 sec	35,694 r/s	–
Fivetran	3 min 10 sec	5,260 r/s	6.7x slower
Airbyte	12 min 44 sec	1,308 r/s	27.3x slower
Debezium (Embedded)	12 min 44 sec	1,308 r/s	27.3x slower

OLake processes 1 million records in just 28.3 seconds, achieving 35,694 records per second (r/s), which is 6.7x faster than Fivetran and 27.3x faster than Airbyte and Debezium (Embedded).

Cost Comparison (Considering 230M first full load & 50M CDC rows per month as of 30th Sep)

When it comes to pricing, OLake is not just faster; it's cost-efficient. Here's the breakdown based on a typical use case involving a 230 million-row first full sync and 50 million CDC rows per month:

Tool	First Full Sync Cost	CDC Sync Cost (Monthly)	Total Monthly Cost	Info	Factor
OLake	10-50 USD	250 USD	300 USD	Heavier instance required only for 1-2 hours	–
Fivetran	Free	6000 USD	6000 USD	15 min sync frequency, pricing for 50M rows & standard plan	20x costlier
Airbyte	6000 USD	1408 USD	7400 USD	First Load - 1.15 TB data synced	24.6x costlier
Debezium MSK connect + AWS MSK serverless	-	-	100 USD + 800 USD = 900 USD	1.2 TB total data (CDC & first full sync)	3x costlier

OLake offers a total cost of just 300 USD per month, compared to 6000 USD for Fivetran and a staggering 7400 USD for Airbyte.

That's 20x more cost-effective than Fivetran and 24x cheaper than Airbyte.
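The cost factors can be checked against the "Total Monthly Cost" column. This is an illustrative sketch only: `cost_factor` is a hypothetical helper, and it assumes the published factors are truncated to one decimal place, which matches the 24.6x figure for Airbyte.

```python
import math

# Total monthly cost in USD, copied from the table above.
MONTHLY_COST_USD = {
    "OLake": 300,
    "Fivetran": 6000,
    "Airbyte": 7400,
    "Debezium MSK connect + AWS MSK serverless": 900,
}


def cost_factor(tool_cost: float, baseline_cost: float) -> float:
    """How many times costlier a tool is than the baseline,
    truncated to one decimal place (assumed convention)."""
    return math.floor(tool_cost / baseline_cost * 10) / 10


baseline = MONTHLY_COST_USD["OLake"]
for tool, cost in MONTHLY_COST_USD.items():
    if tool != "OLake":
        print(f"{tool}: {cost_factor(cost, baseline)}x costlier")
```

Note that OLake's 300 USD total uses the upper end of the 10-50 USD full-sync range plus the 250 USD monthly CDC cost.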

Why Choose OLake?

Speed: OLake is up to 20x faster than competitors for full data syncs and 27.3x faster for CDC syncs.
Stability: No failed syncs, no downtime. OLake delivers a reliable experience even for the largest datasets.
Cost-Effective: At 300 USD per month, OLake is 20x cheaper than Fivetran, 24x cheaper than Airbyte, and 3x cheaper than a Debezium MSK connect + AWS MSK serverless setup, without sacrificing performance.

Testing Infrastructure

The impressive performance metrics of OLake were achieved using a robust infrastructure setup, which included:

Virtual Machine: Standard_D64as_v5
CPU: 64 vCPUs
Memory: 256 GiB RAM
Storage: 250 GB of shared storage

MongoDB Setup:

3 Nodes running in a replica set configuration:
- 1 Primary Node (Master) that handles all write operations.
- 2 Secondary Nodes (Replicas) that replicate data from the primary node.

*Twitter dataset - Archive.org (This JSON dataset has 4 levels of complex nesting).

MySQL → Apache Iceberg Connector Benchmark

Benchmark Environment

This benchmark uses standard NYC Taxi Data trips and fhv_trips tables.
Since the original repo supports only PostgreSQL, we first ingested the NYC Taxi Data into the cloud PostgreSQL database (Azure Flexible DB), and then transferred the tables from there to our MySQL database.
Total rows: 4,001,991,536 across both tables.
The average row size is 144 bytes for trips and 121 bytes for fhv_trips.
OLake & Debezium were run on AWS EC2 c6i.16xlarge (64 vCPUs, 128 GiB memory)
Database instance: Azure Standard D32as v6 (32 vCPUs, 128 GiB Memory)

(OLake vs. Popular Data-Movement Tool)

1. Speed Comparison – Full-Load Performance

Tool	Rows Synced	Throughput (rows / sec)	Relative to OLake
OLake (as of 14th Nov 2025)	4.0 B	338,005 RPS	–
Fivetran (as of 14th Nov 2025)	4.0 B	119,106 RPS	2.83 × slower

Memory usage (OLake) - c6i.16xlarge (64 vCPUs, 128 GiB memory)

Memory Stats	Usage (GB)
Min	3.24
Max	75.1
Mean	48.95

OLake maintains this throughput with a mean memory footprint of 48.95 GB and a peak of 75.1 GB on a 128 GiB instance.

2. Speed Comparison – Change-Data-Capture (CDC)

Tool	CDC Window	Throughput (rows / sec)	Relative to OLake
OLake	16.06 min	51,867 RPS	–
Fivetran	29.86 min	27,901 RPS	1.85 × slower

Key takeaway: For incremental workloads OLake leads the pack, moving 50 million MySQL changes into Iceberg 85.9% faster than Fivetran.
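The "× slower" column and the "% faster" figure describe the same throughput ratio in two ways. A minimal sketch of the conversion, with `percent_faster` as a hypothetical helper name:

```python
def percent_faster(fast_rps: float, slow_rps: float) -> float:
    """Percentage by which fast_rps exceeds slow_rps."""
    return (fast_rps / slow_rps - 1) * 100


# MySQL CDC throughput (rows/sec) from the table above.
olake_rps, fivetran_rps = 51_867, 27_901

ratio = olake_rps / fivetran_rps
print(f"Throughput ratio: {ratio:.3f}")
print(f"OLake is {percent_faster(olake_rps, fivetran_rps):.1f}% faster")
```

A ratio of about 1.859 therefore corresponds to both "1.85 × slower" (truncated) and "85.9% faster".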

3. Cost Comparison (Vendor List Prices)

Tool	Scenario	Spend (USD)	Rows Synced
OLake	Full Load / CDC	Cost of a `c6i.16xlarge (64 vCPUs, 128 GiB memory)` running for 3.3 hours < $ 11	4.0 B / 50 M
Fivetran	Full Load	$ 0 (free full sync)	4.0 B
Fivetran	CDC	$ 2,375.80	50 M

OLake is open-source and can be deployed on your own Kubernetes cluster or cloud VMs; you pay only for the compute and storage you provision.

Dataset and Table Schemas

Please refer to this GitHub repository for the dataset we used to conduct these benchmarks.

note

For MySQL, we performed a full-load and CDC sync of the trips and fhv_trips tables into Apache Iceberg.

`trips` table

CREATE TABLE trips (
    id BIGINT NOT NULL AUTO_INCREMENT,
    cab_type_id INT NULL,
    vendor_id INT NULL,
    pickup_datetime DATETIME NULL,
    dropoff_datetime DATETIME NULL,
    store_and_fwd_flag TINYINT(1) NULL,
    rate_code_id INT NULL,
    pickup_longitude DECIMAL(10,2) NULL,
    pickup_latitude DECIMAL(10,2) NULL,
    dropoff_longitude DECIMAL(10,2) NULL,
    dropoff_latitude DECIMAL(10,2) NULL,
    passenger_count INT NULL,
    trip_distance DECIMAL(10,2) NULL,
    fare_amount DECIMAL(10,2) NULL,
    extra DECIMAL(10,2) NULL,
    mta_tax DECIMAL(10,2) NULL,
    tip_amount DECIMAL(10,2) NULL,
    tolls_amount DECIMAL(10,2) NULL,
    ehail_fee DECIMAL(10,2) NULL,
    improvement_surcharge DECIMAL(10,2) NULL,
    congestion_surcharge DECIMAL(10,2) NULL,
    airport_fee DECIMAL(10,2) NULL,
    total_amount DECIMAL(10,2) NULL,
    payment_type INT NULL,
    trip_type INT NULL,
    pickup_nyct2010_gid INT NULL,
    dropoff_nyct2010_gid INT NULL,
    pickup_location_id INT NULL,
    dropoff_location_id INT NULL,
    PRIMARY KEY (id)
);

`fhv_trips` table

CREATE TABLE fhv_trips (
    id BIGINT NOT NULL AUTO_INCREMENT,
    hvfhs_license_num TEXT NULL,
    dispatching_base_num TEXT NULL,
    originating_base_num TEXT NULL,
    request_datetime DATETIME NULL,
    on_scene_datetime DATETIME NULL,
    pickup_datetime DATETIME NULL,
    dropoff_datetime DATETIME NULL,
    pickup_location_id INT NULL,
    dropoff_location_id INT NULL,
    trip_miles DECIMAL(10,2) NULL,
    trip_time DECIMAL(10,2) NULL,
    base_passenger_fare DECIMAL(10,2) NULL,
    tolls DECIMAL(10,2) NULL,
    black_car_fund DECIMAL(10,2) NULL,
    sales_tax DECIMAL(10,2) NULL,
    congestion_surcharge DECIMAL(10,2) NULL,
    airport_fee DECIMAL(10,2) NULL,
    tips DECIMAL(10,2) NULL,
    driver_pay DECIMAL(10,2) NULL,
    shared_request TINYINT(1) NULL,
    shared_match TINYINT(1) NULL,
    access_a_ride TINYINT(1) NULL,
    wav_request TINYINT(1) NULL,
    wav_match TINYINT(1) NULL,
    legacy_shared_ride INT NULL,
    affiliated_base_num TEXT NULL,
    PRIMARY KEY (id)
);

note

We used AWS Glue as the Iceberg catalog and AWS S3 as the storage layer on the destination side for these benchmarks.

Bottom line: If you need to land terabytes of MySQL data into Apache Iceberg quickly—OLake delivers enterprise-grade speed without the enterprise-grade bill.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
