Benchmarks

PostgreSQL → Apache Iceberg Connector Benchmark

(OLake vs. Popular Data-Movement Tools)

1. Speed Comparison – Full-Load Performance

| Tool | Rows Synced | Throughput (rows/sec) | Relative to OLake |
| --- | --- | --- | --- |
| OLake | 4.01 B | 46,262 RPS | baseline |
| Fivetran | 4.01 B | 46,395 RPS | Parity (≤ 1 % faster) |
| Debezium (memiiso) | 1.28 B | 14,839 RPS | 3.1× slower |
| Estuary | 0.34 B | 3,982 RPS | 11.6× slower¹ |
| Airbyte Cloud | 12.7 M | 457 RPS | 101× slower |

¹ Estuary ran for the same 24-hour window but synced roughly a tenth of the rows (0.34 B vs. 4.01 B), which is why its throughput is so much lower.

info
All tools were given the same 24-hour window, and the Rows Synced column shows how much each one moved in that time. Airbyte's sync failed after 7.5 hours, so its throughput covers only that first part of the test.

Key takeaway: OLake sustains the same top-tier bulk-load speed as Fivetran while outpacing the other tools tested by 3× to 100×.
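
The "Relative to OLake" column can be reproduced from the throughput column alone. A minimal Python sketch using only the figures in the full-load table; the dictionary and function names are illustrative and not part of any tool:

```python
# Full-load throughput in rows per second, copied from the full-load table.
FULL_LOAD_RPS = {
    "OLake": 46_262,
    "Fivetran": 46_395,
    "Debezium (memiiso)": 14_839,
    "Estuary": 3_982,
    "Airbyte Cloud": 457,
}


def slowdown_vs(baseline: str, throughputs: dict[str, float]) -> dict[str, float]:
    """Return how many times slower each tool is than the baseline.

    A value below 1 means the tool is faster than the baseline.
    """
    base = throughputs[baseline]
    return {
        tool: base / rps
        for tool, rps in throughputs.items()
        if tool != baseline
    }


if __name__ == "__main__":
    for tool, factor in slowdown_vs("OLake", FULL_LOAD_RPS).items():
        print(f"{tool}: {factor:.1f}x")
```

Fivetran comes out slightly below 1, which is the "parity" entry in the table.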

2. Speed Comparison – Change-Data-Capture (CDC)

| Tool | CDC Window | Throughput (rows/sec) | Relative to OLake |
| --- | --- | --- | --- |
| OLake | 22.5 min | 36,982 RPS | baseline |
| Fivetran | 31 min | 26,910 RPS | 1.4× slower |
| Debezium (memiiso) | 60 min | 13,808 RPS | 2.7× slower |
| Estuary | 4.5 h | 3,085 RPS | 12× slower |
| Airbyte Cloud | 23 h | 585 RPS | 63× slower |

info

Every tool was tested on the same 50 million changes. The CDC Window column shows how long each tool took to sync them; OLake finished in 22.5 minutes.

Key takeaway: For incremental workloads OLake leads the pack, moving 50 million PostgreSQL changes into Iceberg about 1.4× faster than Fivetran and 2.7× to 63× faster than the other tools tested.
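
Because every tool processed the same 50 million changes, the CDC throughput follows approximately from the CDC window. The sketch below assumes throughput is a plain average over the whole window; the published windows are rounded, so the estimates land close to the published figures rather than exactly on them. All names are illustrative.

```python
CDC_ROWS = 50_000_000  # changes synced by every tool in the CDC test

# tool -> (CDC window in minutes, published throughput in rows/sec),
# both copied from the CDC table.
CDC_RESULTS = {
    "OLake": (22.5, 36_982),
    "Fivetran": (31, 26_910),
    "Debezium (memiiso)": (60, 13_808),
    "Estuary": (4.5 * 60, 3_085),
    "Airbyte Cloud": (23 * 60, 585),
}


def rows_per_second(rows: int, window_minutes: float) -> float:
    """Average throughput over the whole window."""
    return rows / (window_minutes * 60)


if __name__ == "__main__":
    for tool, (minutes, published) in CDC_RESULTS.items():
        estimate = rows_per_second(CDC_ROWS, minutes)
        drift = (estimate - published) / published * 100
        print(f"{tool}: ~{estimate:,.0f} rows/sec ({drift:+.1f}% vs. published)")
```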

3. Cost Comparison (Vendor List Prices)

| Tool | Scenario | Spend (USD) | Rows Synced |
| --- | --- | --- | --- |
| OLake | Full Load / CDC | < $75 (cost of a Standard D64ls v5 VM, 64 vCPUs, 128 GiB memory, running for 24 hours) | 4.01 B / 50 M |
| Fivetran | Full Load | $0 (free full sync) | 4.01 B |
| Estuary | Full Load | $1,668 | 0.34 B |
| Airbyte Cloud | Full Load | $5,560 | 12.7 M |
| Fivetran | CDC | $2,375.80 | 50 M |
| Estuary | CDC | $17.63 | 50 M |
| Airbyte Cloud | CDC | $148.95 | 50 M |

  • OLake is open-source and can be deployed on your own Kubernetes cluster or cloud VMs; you pay only for the compute and storage you provision.
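
Spend and row counts differ widely between runs, so a per-row rate makes the paid runs easier to compare. A minimal sketch that divides each list-price spend from the cost table by the rows synced. The names are illustrative, and the OLake figure is only an upper bound: it assumes the "< $75" VM-day covers the 4.01 B-row full load, which is our reading of the table rather than a published number.

```python
# (tool, scenario) -> (list-price spend in USD, rows synced), from the cost table.
PAID_RUNS = {
    ("Estuary", "Full Load"): (1_668.00, 340_000_000),
    ("Airbyte Cloud", "Full Load"): (5_560.00, 12_700_000),
    ("Fivetran", "CDC"): (2_375.80, 50_000_000),
    ("Estuary", "CDC"): (17.63, 50_000_000),
    ("Airbyte Cloud", "CDC"): (148.95, 50_000_000),
}

# Assumption: the "< $75" VM cost is attributed to the full load alone.
OLAKE_FULL_LOAD_UPPER_BOUND = (75.00, 4_010_000_000)


def usd_per_million_rows(spend_usd: float, rows: int) -> float:
    """Spend normalized to one million synced rows."""
    return spend_usd / (rows / 1_000_000)


if __name__ == "__main__":
    for (tool, scenario), (spend, rows) in PAID_RUNS.items():
        rate = usd_per_million_rows(spend, rows)
        print(f"{tool} ({scenario}): ${rate:,.2f} per million rows")
    bound = usd_per_million_rows(*OLAKE_FULL_LOAD_UPPER_BOUND)
    print(f"OLake (Full Load): under ${bound:.3f} per million rows")
```

Fivetran's full load is left out because the table lists it as free.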

Footnotes

  1. Airbyte: Details of the issues we faced with Airbyte during the test are available here.

Dataset and Table Schemas

Please refer to this GitHub repository for the dataset we used to conduct these benchmarks.

note

For the CDC test, we first performed a full-load sync of empty dummy tables. Afterwards, we inserted the top 25 million records from each of trips and fhv_trips (50 million rows in total) into these tables and ran a CDC sync.

trips table

CREATE TABLE trips (
    id bigserial NOT NULL,
    cab_type_id int4 NULL,
    vendor_id int4 NULL,
    pickup_datetime timestamp NULL,
    dropoff_datetime timestamp NULL,
    store_and_fwd_flag bool NULL,
    rate_code_id int4 NULL,
    pickup_longitude numeric NULL,
    pickup_latitude numeric NULL,
    dropoff_longitude numeric NULL,
    dropoff_latitude numeric NULL,
    passenger_count int4 NULL,
    trip_distance numeric NULL,
    fare_amount numeric NULL,
    extra numeric NULL,
    mta_tax numeric NULL,
    tip_amount numeric NULL,
    tolls_amount numeric NULL,
    ehail_fee numeric NULL,
    improvement_surcharge numeric NULL,
    congestion_surcharge numeric NULL,
    airport_fee numeric NULL,
    total_amount numeric NULL,
    payment_type int4 NULL,
    trip_type int4 NULL,
    pickup_nyct2010_gid int4 NULL,
    dropoff_nyct2010_gid int4 NULL,
    pickup_location_id int4 NULL,
    dropoff_location_id int4 NULL,
    CONSTRAINT trips_pkey PRIMARY KEY (id)
);
  • Column count: 29
  • Type mix: 1 × bigserial, 10 × int4, 2 × timestamp, 1 × bool, 15 × numeric

fhv_trips table

CREATE TABLE fhv_trips (
    id bigserial NOT NULL,
    hvfhs_license_num text NULL,
    dispatching_base_num text NULL,
    originating_base_num text NULL,
    request_datetime timestamp NULL,
    on_scene_datetime timestamp NULL,
    pickup_datetime timestamp NULL,
    dropoff_datetime timestamp NULL,
    pickup_location_id int4 NULL,
    dropoff_location_id int4 NULL,
    trip_miles numeric NULL,
    trip_time numeric NULL,
    base_passenger_fare numeric NULL,
    tolls numeric NULL,
    black_car_fund numeric NULL,
    sales_tax numeric NULL,
    congestion_surcharge numeric NULL,
    airport_fee numeric NULL,
    tips numeric NULL,
    driver_pay numeric NULL,
    shared_request bool NULL,
    shared_match bool NULL,
    access_a_ride bool NULL,
    wav_request bool NULL,
    wav_match bool NULL,
    legacy_shared_ride int4 NULL,
    affiliated_base_num text NULL,
    CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);
  • Column count: 27
  • Type mix: 1 × bigserial, 4 × text, 4 × timestamp, 3 × int4, 10 × numeric, 5 × bool
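
The column count and type mix for fhv_trips can be checked mechanically. A minimal sketch, assuming the DDL keeps one column per line as in the statement above; `column_types` is a hypothetical helper written for this layout only and is not a general SQL parser.

```python
from collections import Counter

FHV_TRIPS_DDL = """
CREATE TABLE fhv_trips (
    id bigserial NOT NULL,
    hvfhs_license_num text NULL,
    dispatching_base_num text NULL,
    originating_base_num text NULL,
    request_datetime timestamp NULL,
    on_scene_datetime timestamp NULL,
    pickup_datetime timestamp NULL,
    dropoff_datetime timestamp NULL,
    pickup_location_id int4 NULL,
    dropoff_location_id int4 NULL,
    trip_miles numeric NULL,
    trip_time numeric NULL,
    base_passenger_fare numeric NULL,
    tolls numeric NULL,
    black_car_fund numeric NULL,
    sales_tax numeric NULL,
    congestion_surcharge numeric NULL,
    airport_fee numeric NULL,
    tips numeric NULL,
    driver_pay numeric NULL,
    shared_request bool NULL,
    shared_match bool NULL,
    access_a_ride bool NULL,
    wav_request bool NULL,
    wav_match bool NULL,
    legacy_shared_ride int4 NULL,
    affiliated_base_num text NULL,
    CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);
"""


def column_types(ddl: str) -> Counter:
    """Count column types in a one-column-per-line CREATE TABLE statement."""
    counts = Counter()
    for line in ddl.splitlines():
        parts = line.strip().rstrip(",").split()
        # Skip blank lines, the CREATE TABLE header, constraints, and ");".
        if len(parts) < 2 or parts[0].upper() in {"CREATE", "CONSTRAINT"}:
            continue
        counts[parts[1]] += 1  # second token is the column's type
    return counts


if __name__ == "__main__":
    counts = column_types(FHV_TRIPS_DDL)
    print("columns:", sum(counts.values()))
    print(dict(counts))
```

The same helper works on the trips statement, since it follows the same layout.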

Average row size & storage footprint

| Sync Mode | Table | Rows | Raw CSV size | Size ÷ rows |
| --- | --- | --- | --- | --- |
| Full Load | trips + fhv_trips | ≈ 3.96 B | ≈ 585 GB uncompressed | ≈ 158.62 bytes/row |
| CDC | trips + fhv_trips | ≈ 50 M | ≈ 6.8 GB uncompressed | ≈ 60.13 bytes/row |

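
The Size ÷ rows column is a straight division of raw CSV size by row count. A minimal sketch for the full-load row; it assumes the "GB" in the table is binary (GiB, 1024³ bytes), which is an assumption on our part but is the reading that reproduces the ≈ 158.62 figure.

```python
GIB = 1024 ** 3  # bytes in one gibibyte (assumed meaning of "GB" in the table)


def bytes_per_row(size_bytes: float, rows: float) -> float:
    """Average uncompressed CSV bytes per row."""
    return size_bytes / rows


# Full-load row of the table: about 585 GB of raw CSV over about 3.96 B rows.
full_load = bytes_per_row(585 * GIB, 3.96e9)

if __name__ == "__main__":
    print(f"{full_load:.2f} bytes/row")
```

With decimal gigabytes (10⁹ bytes) the same division gives a smaller value, so the unit matters when comparing against your own data.
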
note

We used AWS Glue as the Iceberg catalog and AWS S3 as the storage layer on the destination side for these benchmarks.

What These Numbers Mean for You

  • Peak throughput without lock-in: OLake matches or beats proprietary SaaS speeds while letting you keep data and infrastructure in your own account.
  • Superior CDC latency: Faster change propagation means fresher downstream analytics and near-real-time feature generation for ML.
  • Predictable TCO: Because OLake is self-hosted, you scale resources up or down to hit your desired SLA at the lowest cloud cost—no opaque credit systems.
  • Resource profile: In these tests OLake used 57.6 GB of RAM on a single Standard D64ls v5 VM (64 vCPUs, 128 GiB memory) for both the full-load and CDC runs; treat that as a starting point and scale sizing with your workload.

Bottom line: If you need to land terabytes of PostgreSQL data into Apache Iceberg quickly—and keep it continually up-to-date—OLake delivers enterprise-grade speed without the enterprise-grade bill.

