
Benchmarks


PostgreSQL → Apache Iceberg Connector Benchmark

(OLake vs. Popular Data-Movement Tools)

1. Speed Comparison – Full-Load Performance

| Tool | Rows Synced | Throughput (rows/sec) | Relative to OLake |
| --- | --- | --- | --- |
| OLake | 4.01 B | 46,262 RPS | – |
| Fivetran | 4.01 B | 46,395 RPS | Parity (≤1 % faster) |
| Debezium (memiiso) | 1.28 B | 14,839 RPS | 3.1× slower |
| Estuary | 0.34 B | 3,982 RPS | 11.6× slower¹ |
| Airbyte Cloud | 12.7 M | 457 RPS | 101× slower |

¹ Estuary ran for the same 24-hour window but synced roughly 12× fewer rows (0.34 B vs. 4.01 B), so its throughput looks even lower when normalized.

Info: All tools were given the same 24-hour window. OLake and Fivetran synced the full 4.01 B-row dataset in that time, while Debezium and Estuary ran for the whole window and synced the row counts shown in the table. Airbyte's sync failed after about 7.5 hours, so its throughput covers only that first part of the test.

Key takeaway: OLake matches Fivetran's top-tier bulk-load speed while outpacing Debezium, Estuary and Airbyte Cloud by roughly 3× to 100×.
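The throughput and Relative to OLake columns follow from simple arithmetic: throughput is rows synced divided by elapsed seconds, and the relative figure is OLake's throughput divided by each tool's. The minimal Python sketch below recomputes both using only the published table values; `implied_hours` and `slowdown_vs` are hypothetical helper names for illustration, not part of OLake or any other tool.

```python
# Full-load figures copied from the table above: (rows synced, throughput in rows/sec).
FULL_LOAD = {
    "OLake": (4.01e9, 46_262),
    "Fivetran": (4.01e9, 46_395),
    "Debezium (memiiso)": (1.28e9, 14_839),
    "Estuary": (0.34e9, 3_982),
    "Airbyte Cloud": (12.7e6, 457),
}


def implied_hours(rows: float, rps: float) -> float:
    """Hours needed to sync `rows` at a constant `rps` (hypothetical helper)."""
    return rows / rps / 3600


def slowdown_vs(baseline_rps: float, rps: float) -> float:
    """How many times slower a tool is than the baseline (hypothetical helper)."""
    return baseline_rps / rps


olake_rps = FULL_LOAD["OLake"][1]
for tool, (rows, rps) in FULL_LOAD.items():
    print(f"{tool:<20} {implied_hours(rows, rps):5.1f} h  {slowdown_vs(olake_rps, rps):7.2f}x")
```

Dividing rows by throughput gives roughly 24 hours for every tool except Airbyte Cloud, which comes out at about 7.7 hours, in line with its sync failing partway through the window.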

2. Speed Comparison – Change-Data-Capture (CDC)

| Tool | CDC Window | Throughput (rows/sec) | Relative to OLake |
| --- | --- | --- | --- |
| OLake | 22.5 min | 36,982 RPS | – |
| Fivetran | 31 min | 26,910 RPS | 1.4× slower |
| Debezium (memiiso) | 60 min | 13,808 RPS | 2.7× slower |
| Estuary | 4.5 h | 3,085 RPS | 12× slower |
| Airbyte Cloud | 23 h | 585 RPS | 63× slower |

Info: Every tool synced the same 50 million changes. The CDC Window column is the time each tool took to process them, from 22.5 minutes for OLake to 23 hours for Airbyte Cloud.

Key takeaway: For incremental workloads OLake leads the pack, moving 50 million PostgreSQL changes into Iceberg about 1.4× faster than Fivetran and 2.7× to 63× faster than Debezium, Estuary and Airbyte Cloud.
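Since every tool applied the same 50 million changes, CDC throughput is simply 50 million divided by the CDC window in seconds. This minimal Python sketch compares that derived figure with the reported throughput and recomputes the Relative to OLake column from the table values; `rps_from_window` is a hypothetical helper name.

```python
# CDC figures copied from the table above: (window in minutes, reported rows/sec).
CDC_ROWS = 50_000_000  # every tool replayed the same 50 million changes
CDC = {
    "OLake": (22.5, 36_982),
    "Fivetran": (31, 26_910),
    "Debezium (memiiso)": (60, 13_808),
    "Estuary": (4.5 * 60, 3_085),
    "Airbyte Cloud": (23 * 60, 585),
}


def rps_from_window(rows: int, minutes: float) -> float:
    """Average rows/sec when `rows` changes are applied over `minutes` (hypothetical helper)."""
    return rows / (minutes * 60)


olake_rps = CDC["OLake"][1]
for tool, (minutes, reported) in CDC.items():
    derived = rps_from_window(CDC_ROWS, minutes)
    print(f"{tool:<20} derived {derived:8.0f}  reported {reported:6d}  "
          f"OLake/tool {olake_rps / reported:5.1f}x")
```

The derived and reported throughputs agree to within about 3 %. The largest gap is Airbyte Cloud's, likely because its window is rounded to 23 hours in the table.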

3. Cost Comparison (Vendor List Prices)

| Tool | Scenario | Spend (USD) | Rows Synced |
| --- | --- | --- | --- |
| OLake | Full Load / CDC | < $75 (cost of a Standard D64ls v5 VM with 64 vCPUs and 128 GiB memory running for 24 hours) | 4.01 B / 50 M |
| Fivetran | Full Load | $0 (free full sync) | 4.01 B |
| Estuary | Full Load | $1,668 | 0.34 B |
| Airbyte Cloud | Full Load | $5,560 | 12.7 M |
| Fivetran | CDC | $2,375.80 | 50 M |
| Estuary | CDC | $17.63 | 50 M |
| Airbyte Cloud | CDC | $148.95 | 50 M |

  • OLake is open-source and can be deployed on your own Kubernetes cluster or cloud VMs; you pay only for the compute and storage you provision.
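The tools synced very different row counts in the full-load run, so raw spend is hard to compare directly. The minimal Python sketch below normalizes the table to dollars per million rows synced. It assumes OLake's "< $75" VM bill can be read as an upper bound on its full-load spend, it does not model VM pricing itself, and `usd_per_million_rows` is a hypothetical helper name.

```python
# Spend (USD, vendor list prices) and rows synced, copied from the cost table above.
COSTS = {
    ("Fivetran", "Full Load"): (0.00, 4.01e9),
    ("Estuary", "Full Load"): (1_668.00, 0.34e9),
    ("Airbyte Cloud", "Full Load"): (5_560.00, 12.7e6),
    ("Fivetran", "CDC"): (2_375.80, 50e6),
    ("Estuary", "CDC"): (17.63, 50e6),
    ("Airbyte Cloud", "CDC"): (148.95, 50e6),
    # Assumption: OLake's "< $75" VM bill is treated as an upper bound on full-load spend.
    ("OLake (upper bound)", "Full Load"): (75.00, 4.01e9),
}


def usd_per_million_rows(spend_usd: float, rows: float) -> float:
    """Normalize spend to dollars per one million rows synced (hypothetical helper)."""
    return spend_usd / (rows / 1_000_000)


for (tool, scenario), (spend, rows) in COSTS.items():
    print(f"{tool:<20} {scenario:<9} ${usd_per_million_rows(spend, rows):10.4f} per 1M rows")
```

On this basis, Estuary's full load works out to about $4.91 per million rows and Airbyte Cloud's to about $438, while OLake's upper bound stays under $0.02.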

Dataset and Table Schemas

Please refer to this GitHub repository for the dataset we used to conduct these benchmarks.

Note: For the CDC test, we first ran a full-load sync of empty dummy tables. We then inserted the top 25 million records from each of trips and fhv_trips (50 million changes in total) into these tables and ran a CDC sync.

trips table

CREATE TABLE trips (
    id bigserial NOT NULL,
    cab_type_id int4 NULL,
    vendor_id int4 NULL,
    pickup_datetime timestamp NULL,
    dropoff_datetime timestamp NULL,
    store_and_fwd_flag bool NULL,
    rate_code_id int4 NULL,
    pickup_longitude numeric NULL,
    pickup_latitude numeric NULL,
    dropoff_longitude numeric NULL,
    dropoff_latitude numeric NULL,
    passenger_count int4 NULL,
    trip_distance numeric NULL,
    fare_amount numeric NULL,
    extra numeric NULL,
    mta_tax numeric NULL,
    tip_amount numeric NULL,
    tolls_amount numeric NULL,
    ehail_fee numeric NULL,
    improvement_surcharge numeric NULL,
    congestion_surcharge numeric NULL,
    airport_fee numeric NULL,
    total_amount numeric NULL,
    payment_type int4 NULL,
    trip_type int4 NULL,
    pickup_nyct2010_gid int4 NULL,
    dropoff_nyct2010_gid int4 NULL,
    pickup_location_id int4 NULL,
    dropoff_location_id int4 NULL,
    CONSTRAINT trips_pkey PRIMARY KEY (id)
);

fhv_trips table

CREATE TABLE fhv_trips (
    id bigserial NOT NULL,
    hvfhs_license_num text NULL,
    dispatching_base_num text NULL,
    originating_base_num text NULL,
    request_datetime timestamp NULL,
    on_scene_datetime timestamp NULL,
    pickup_datetime timestamp NULL,
    dropoff_datetime timestamp NULL,
    pickup_location_id int4 NULL,
    dropoff_location_id int4 NULL,
    trip_miles numeric NULL,
    trip_time numeric NULL,
    base_passenger_fare numeric NULL,
    tolls numeric NULL,
    black_car_fund numeric NULL,
    sales_tax numeric NULL,
    congestion_surcharge numeric NULL,
    airport_fee numeric NULL,
    tips numeric NULL,
    driver_pay numeric NULL,
    shared_request bool NULL,
    shared_match bool NULL,
    access_a_ride bool NULL,
    wav_request bool NULL,
    wav_match bool NULL,
    legacy_shared_ride int4 NULL,
    affiliated_base_num text NULL,
    CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);

Note: For these benchmarks, we used AWS Glue as the Iceberg catalog and AWS S3 as the storage layer on the destination side.

Bottom line: If you need to land terabytes of PostgreSQL data in Apache Iceberg quickly and keep it continuously up to date, OLake delivers enterprise-grade speed without the enterprise-grade bill.



💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don't hesitate to contact us if you need any help or further clarification!