OLake vs AWS DMS Benchmark
In today's data-driven world, organizations are migrating massive datasets from traditional databases to cloud storage. But here's the million-dollar question: which tool can move billions of records efficiently without breaking the bank?
The following benchmark evaluates performance, environment configuration, and cost considerations for migrating PostgreSQL data to Parquet (AWS S3) using AWS Database Migration Service (DMS) and OLake.
Workload
- Dataset: This benchmark uses two large tables from NYC Taxi Data, `trips2` and `fhv_trips`. For details on how the data is generated, check here: How the data is generated!
- Total rows: 4,008,587,913 rows across both tables (a sanity-check query is sketched below)
- Modes evaluated: Full Refresh and CDC workload (insert-only, 50 million rows)
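For reference, a query along these lines (a hypothetical sanity check, assuming both tables live in the `public` schema) can confirm the combined row count once the data has been generated:

```sql
-- Hypothetical sanity check: combined row count across both benchmark tables
-- (expected total: 4,008,587,913).
SELECT
    (SELECT count(*) FROM public.trips2)    AS trips2_rows,
    (SELECT count(*) FROM public.fhv_trips) AS fhv_trips_rows,
    (SELECT count(*) FROM public.trips2)
  + (SELECT count(*) FROM public.fhv_trips) AS total_rows;
```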
Environment
- Source database: Azure PostgreSQL (32 vCores, 128 GB RAM)
- Destination: Amazon S3 (Parquet)
- Migration compute: c6i.16xlarge (64 vCPU, 128 GB RAM)
- Reference compute price: ~$3.07/hour
The same dataset and compute were used for both tools.
Full Refresh Results
The full-load test evaluates the time and throughput required to transfer the complete dataset from the source PostgreSQL database to Parquet files in S3, establishing a baseline for bulk data movement under sustained load conditions.
Tool | Rows Processed | Total Time | Avg Throughput (rows/sec) |
---|---|---|---|
OLake | 4,008,587,913 | 1 h 59 m | 558,765 |
AWS DMS | 4,008,587,913 | 9 h 8 m | 122,000 |
Key observations:
- OLake completed the full refresh roughly 4.6× faster than AWS DMS.
- OLake eliminates the need for manual partition-boundary scripting, while DMS relies on manually generated boundaries to parallelize the PostgreSQL full load (a sketch of such a mapping follows below).
Parallelism used:
- OLake ran with 32 threads.
- DMS ran with 40 parallel tasks.
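To give a sense of what that boundary generation involves, here is a minimal sketch of a DMS table-mapping rule that uses ranged parallel load on the `id` primary key. The boundary values and rule names are illustrative placeholders, not the ones used in this benchmark; a real mapping would need a rule per table and many more ranges:

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-trips2",
      "object-locator": { "schema-name": "public", "table-name": "trips2" },
      "rule-action": "include"
    },
    {
      "rule-type": "table-settings",
      "rule-id": "2",
      "rule-name": "parallel-load-trips2",
      "object-locator": { "schema-name": "public", "table-name": "trips2" },
      "parallel-load": {
        "type": "ranges",
        "columns": ["id"],
        "boundaries": [["500000000"], ["1000000000"], ["1500000000"]]
      }
    }
  ]
}
```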
CDC Results (Insert-only; 50M rows)
The CDC test evaluates the sustained ingestion of incremental changes from PostgreSQL into Parquet, validating plugin configuration, replication slots, and overall throughput. Each of the tables `trips2` and `fhv_trips` ingests 25 million records.
Tool | Rows Processed | Total Time | Avg Throughput (rows/sec) |
---|---|---|---|
AWS DMS | 50,000,000 | 22 m 41 s | 36,738 |
OLake | 50,000,000 | 16 m 24 s | 50,812 |
Key observations:
- OLake performed the CDC workload approximately 1.38× faster than AWS DMS.
- OLake requires minimal configuration: selecting the Full Refresh + CDC sync mode is sufficient. DMS, by contrast, requires enabling pglogical via Azure extensions, performing a detailed pglogical setup, and specifying the starting LSN for the replication task (a sketch of that setup follows this list).
Parallelism used:
- OLake ran with 32 threads.
- DMS ran with 40 parallel tasks.
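For context, the DMS path involves roughly the following kind of pglogical setup on the source. This is a minimal sketch with placeholder connection details, not the exact commands used in this benchmark; on Azure PostgreSQL, `pglogical` must first be allow-listed via the `azure.extensions` / `shared_preload_libraries` server parameters before the extension can be created:

```sql
-- Minimal pglogical setup sketch on the source PostgreSQL server (placeholders, not benchmark values).
CREATE EXTENSION IF NOT EXISTS pglogical;

-- Register this server as a pglogical provider node.
SELECT pglogical.create_node(
    node_name := 'benchmark_provider',
    dsn       := 'host=<server> port=5432 dbname=<database> user=<replication_user>'
);

-- Add the benchmark tables to the default replication set that the DMS task consumes.
SELECT pglogical.replication_set_add_table('default', 'public.trips2');
SELECT pglogical.replication_set_add_table('default', 'public.fhv_trips');
```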
Resource Utilization
We're focusing on resource utilization for the Full Refresh process, as it's significantly more resource-intensive than CDC.
The memory utilization shown corresponds to a transfer of roughly 4 billion records. As data volumes grow, memory efficiency becomes increasingly important, and this is an area where OLake excels.
Memory Stats | OLake | DMS |
---|---|---|
Min | 2.01 GB | 20 GB |
Max | 46.72 GB | 40 GB |
Mean | 25.70 GB | 30 GB |
OLake's memory usage ranged from 2.01 GB to 46.72 GB, averaging 25.70 GB throughout the transfer. DMS ranged from 20 GB to 40 GB, averaging 30 GB.
Cost Comparison (Compute Only)
Compute costs scale linearly with runtime for a given instance class, so the tool that finishes faster costs proportionally less. At the reference rate of ~$3.07/hour, DMS's 9 h 08 m run works out to roughly 9.13 × $3.07 ≈ $28, while OLake's 1 h 59 m run works out to roughly 1.98 × $3.07 ≈ $6.1.
Here is the cost comparison for the Full Refresh process.
Scenario | Instance | Runtime | Approx. Cost |
---|---|---|---|
DMS Full Refresh | c6i.16xlarge | 9h 08m | ~$28.03 |
OLake Full Refresh | c6i.16xlarge | 1h 59m | ~$6.08 |
Important:
OLake's compute cost is roughly 4.61× lower than DMS's for the same full refresh. When you're moving terabytes monthly, those savings add up fast!
Scaling Impact:
Let's take a closer look at how costs evolve over time for both tools.
Assume the same dataset is migrated once daily for the durations below; each figure is simply the per-run cost multiplied by the number of runs.
Tool | 1 Month Cost | 2 Months Cost | 6 Months Cost |
---|---|---|---|
DMS | ~$840 | ~$1,681 | ~$5,045 |
OLake | ~$182 | ~$364 | ~$1,094 |
The cost trend over time makes it clear which tool is more cost-efficient. Note that typical workflows involve more than one sync per day, so real-world costs would be higher than the baseline shown here.
OLake is open source and incurs no licensing fees; costs depend solely on the user's infrastructure and storage consumption.
Dataset and Table Schemas
The OLake benchmarks page provides NYC Taxi table schemas designed to support both bulk transfer and CDC scenarios at scale, ensuring comparability across different tools.
The dataset and reproducible setup are available in the GitHub repository: NYC Taxi Data Benchmark.
The repository includes scripts and instructions to generate the NYC trips tables used in this benchmark, enabling users to replicate the setup and results.
`trips2` table
Schema for the `trips2` table used in this benchmark:
```sql
CREATE TABLE public.trips2 (
id BIGSERIAL NOT NULL,
cab_type_id INT NULL,
vendor_id INT NULL,
pickup_datetime TIMESTAMP NULL,
dropoff_datetime TIMESTAMP NULL,
store_and_fwd_flag BOOLEAN NULL,
rate_code_id INT NULL,
pickup_longitude NUMERIC NULL,
pickup_latitude NUMERIC NULL,
dropoff_longitude NUMERIC NULL,
dropoff_latitude NUMERIC NULL,
passenger_count INT NULL,
trip_distance NUMERIC NULL,
fare_amount NUMERIC NULL,
extra NUMERIC NULL,
mta_tax NUMERIC NULL,
tip_amount NUMERIC NULL,
tolls_amount NUMERIC NULL,
ehail_fee NUMERIC NULL,
improvement_surcharge NUMERIC NULL,
congestion_surcharge NUMERIC NULL,
airport_fee NUMERIC NULL,
total_amount NUMERIC NULL,
payment_type INT NULL,
trip_type INT NULL,
pickup_nyct2010_gid INT NULL,
dropoff_nyct2010_gid INT NULL,
pickup_location_id INT NULL,
dropoff_location_id INT NULL,
CONSTRAINT trips2_pkey PRIMARY KEY (id)
);
```
`fhv_trips` table
Schema for the `fhv_trips` table used in this benchmark:
```sql
CREATE TABLE fhv_trips (
id bigserial NOT NULL,
hvfhs_license_num text NULL,
dispatching_base_num text NULL,
originating_base_num text NULL,
request_datetime timestamp NULL,
on_scene_datetime timestamp NULL,
pickup_datetime timestamp NULL,
dropoff_datetime timestamp NULL,
pickup_location_id int4 NULL,
dropoff_location_id int4 NULL,
trip_miles numeric NULL,
trip_time numeric NULL,
base_passenger_fare numeric NULL,
tolls numeric NULL,
black_car_fund numeric NULL,
sales_tax numeric NULL,
congestion_surcharge numeric NULL,
airport_fee numeric NULL,
tips numeric NULL,
driver_pay numeric NULL,
shared_request bool NULL,
shared_match bool NULL,
access_a_ride bool NULL,
wav_request bool NULL,
wav_match bool NULL,
legacy_shared_ride int4 NULL,
affiliated_base_num text NULL,
CONSTRAINT fhv_trips_pkey PRIMARY KEY (id)
);
```
Implementation Considerations
- DMS PostgreSQL full loads may require manual task mapping with partition boundaries to achieve higher parallelism.
- DMS CDC for Azure PostgreSQL required enabling `pglogical` via `azure.extensions` and configuring replication sets with an aligned start LSN.
- OLake parallelization and schema handling are automatic; no manual boundary generation was required in this benchmark.