Last updated: 6/27/2025

Postgres Source

Postgres Source enables data synchronization from Postgres to your desired destination.

info

OLake UI is live (beta)! You can now use the UI to configure your Postgres source, discover streams, and sync data. See OLake UI for how to set it up with Docker Compose and run it locally.

Use OLake UI for Postgres
Use OLake CLI for Postgres

Create a Postgres Source in OLake UI

Follow the steps below to get started with the Postgres Source using the OLake UI (assuming the OLake UI is running locally on localhost:8000):

Navigate to the Sources tab.
Click on + Create Source.
Select Postgres as the source type from Connector type.
Fill in the required connection details in the form. For a description of each field, refer to the Postgres Source Configuration section on the right side of the UI.
Click on Create ->
OLake will test the source connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

This creates a Postgres source in OLake. You can now use this source in your Jobs Pipeline to sync data from Postgres to Apache Iceberg or AWS S3.

Edit Postgres Source in OLake UI

To edit an existing Postgres source in OLake UI, follow these steps:

Navigate to the Sources Tab.
Locate the Postgres source you want to edit in the Active Sources or Inactive Sources tab, or use the search bar.
Click on the Edit button for the source from the Actions tab (3 dots).
Update the connection details in the form as needed and click on Save Changes.

caution

Editing a source can break the pipelines that use it.

You will see a notification saying "Due to the editing, the jobs are going to get affected".

Editing this source will affect the following jobs that are associated with this source and as a result will fail immediately. Do you still want to edit the source?

OLake will test the updated source connection once you confirm in the Source Editing Caution Modal. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

Jobs Associated with Postgres Source

On the Source Edit page, you can see the list of jobs associated with this source, along with the status of each job (running, failed, or completed). You can also pause a job from the same screen.

Delete Postgres Source in OLake UI

To delete an existing Postgres source in OLake UI, follow these steps:

Navigate to the Sources Tab.
Locate the Postgres source you want to delete in the Active Sources or Inactive Sources tab, or use the search bar.
Click on the Delete button for the source from the Actions tab (3 dots).

A confirmation dialog will appear asking you to confirm the deletion.
Click on Delete to confirm the deletion.

This will remove the Postgres source from OLake.

note

You can also delete a source from the Source Edit page by clicking on the Delete button at the bottom of the page.

To sync data TLDR:

Create a source.json with your Postgres connection details.
Create a destination.json with your Writer (Apache Iceberg / AWS S3 / Azure ADLS / Google Cloud Storage) connection details.
Run discover to generate a streams.json of available streams.
Run sync to replicate data to your specified destination.

Features

Resumable Full Load: Supports restarting full loads without reprocessing all data.
Fast Full Load via Chunking:
• CTID-based chunking
• Single split-column chunking
• Evenly distributed chunks (for int/float PKs)
• Uneven chunking support
WAL2JSON-based CDC Sync for incremental updates.

caution

Currently, the pgoutput plugin is not supported for the replication slot; only WAL2JSON is supported.

New streams are automatically synced when added.
LSN mismatch handling triggers a full resync.
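
To make the chunking ideas above more concrete, here is a minimal sketch that splits an integer primary-key range into evenly sized chunks and, separately, splits a table's heap pages into CTID ranges. It is an illustration only and not OLake's implementation: the function names, the `id` column, and the chunk sizes are assumptions made for the example. The page count could come from an estimate such as `pg_class.relpages`.

```shell
# Illustrative sketch only; not OLake's actual chunking code.

# Print one WHERE clause per chunk for an integer key in [min, max],
# split into at most `chunks` ranges of near-equal size.
even_pk_chunks() {
  local min="$1" max="$2" chunks="$3"
  # Ceiling of (max - min + 1) / chunks
  local size=$(( (max - min + chunks) / chunks ))
  local start="$min" end
  while [ "$start" -le "$max" ]; do
    end=$(( start + size - 1 ))
    if [ "$end" -gt "$max" ]; then end="$max"; fi
    echo "WHERE id >= $start AND id <= $end"   # `id` is a hypothetical key column
    start=$(( end + 1 ))
  done
}

# Print one WHERE clause per chunk of `pages_per_chunk` heap pages,
# using tuple-ID (ctid) bounds of the form '(page,offset)'.
ctid_chunks() {
  local total_pages="$1" pages_per_chunk="$2"
  local start=0 end
  while [ "$start" -lt "$total_pages" ]; do
    end=$(( start + pages_per_chunk ))
    echo "WHERE ctid >= '($start,0)'::tid AND ctid < '($end,0)'::tid"
    start="$end"
  done
}
```

Each printed clause can be appended to a `SELECT` on the table, so that chunks can be read independently and a failed chunk can be retried without rereading the others.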

Postgres Driver

The Postgres Driver enables data synchronization from Postgres to your desired destination.

Below is an overview of the supported modes and writers for Postgres data replication, along with tables summarizing the details.

Supported Modes

Our replication process supports various modes to fit different data ingestion needs.

The Full Refresh mode retrieves the entire dataset from Postgres and is ideal for initial data loads or when a complete dataset copy is required.

In contrast, CDC (Change Data Capture) continuously tracks and synchronizes incremental changes in real time, making sure that your destination remains updated with minimal latency.

The Incremental mode is currently under development (WIP).

Mode	Description
Full Refresh	Fetches the complete dataset from Postgres.
CDC (Change Data Capture)	Tracks and syncs incremental changes from Postgres in real time.
Strict CDC (Change Data Capture)	Tracks only new changes from the current position in the Postgres WAL, without performing an initial backfill.
Incremental	Not yet supported (work in progress).
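
Since the CDC modes read changes through a WAL2JSON replication slot, the database usually needs `wal_level` set to `logical` and a logical slot that uses the `wal2json` output plugin. The sketch below only prints the SQL for checking and creating such a slot. The function name and the default slot name `olake_slot` are hypothetical placeholders, and it assumes the wal2json plugin is already installed on the server.

```shell
# Sketch: emit the SQL for preparing a wal2json logical replication slot.
# The default slot name is a hypothetical placeholder; pass your own.
wal2json_slot_sql() {
  local slot="${1:-olake_slot}"
  cat <<SQL
-- Logical decoding requires this to report 'logical'.
SHOW wal_level;
-- Check whether the slot already exists.
SELECT slot_name, plugin FROM pg_replication_slots WHERE slot_name = '$slot';
-- Create the slot with the wal2json output plugin.
SELECT * FROM pg_create_logical_replication_slot('$slot', 'wal2json');
SQL
}

# Example (assumes psql is installed and the PG* connection variables are set):
#   wal2json_slot_sql my_slot | psql
```

The slot name used here would then have to match whatever replication slot name is configured for the source.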

Supported Destinations

Destination	Supported	Docs	Comments
Apache Iceberg	Yes	Link
AWS S3	Yes	Link	Supports both plain-Parquet and Iceberg format writes; requires `aws_access_key` / IAM role.
Azure	Yes	Link
Google Cloud Storage	Yes	Link (Iceberg) Link (Parquet)	Any S3 protocol compliant object store can work with OLake
Local Filesystem	Yes	Link

Setup and Configuration

To run the Postgres Driver, configure the following files with your specific credentials and settings:

source.json: Postgres connection details.
streams.json: List of streams and fields to sync (generated using the Discover command).
destination.json: Configuration for the destination where the data will be written.

Place these files in your project directory before running the commands.
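
A small pre-flight check can catch a missing file before a command is run. The helper below is a hypothetical sketch, not part of OLake: it takes the config directory and the file names to look for, and uses the file names that appear in the commands on this page.

```shell
# Sketch: verify that the given config files exist in a directory.
# Usage: check_olake_config <dir> <file>...
check_olake_config() {
  local dir="$1" missing=0 f
  shift
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_olake_config "$HOME/PATH_TO_OLAKE_DIRECTORY" source.json                      # before discover
#   check_olake_config "$HOME/PATH_TO_OLAKE_DIRECTORY" source.json streams.json destination.json   # before sync
```

Only source.json is needed before Discover, because streams.json is produced by that command.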

Source File

Add your Postgres credentials to the source.json file in the format shown here.

Commands

Discover Command

The Discover command generates the JSON content for the streams.json file, which defines the schema of the streams to be synced.

Usage

To run the Discover command, use the following syntax:

OLake Docker (macOS / Linux)

docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-postgres:latest \
  discover \
  --config /mnt/config/source.json

OLake Docker (CMD)

docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-postgres:latest ^
  discover ^
  --config /mnt/config/source.json

OLake Docker (PowerShell)

docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-postgres:latest `
  discover `
  --config /mnt/config/source.json

Locally run OLake (macOS / Linux)

OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/postgres/config" && \
./build.sh driver-postgres discover \
  --config "$OLAKE_BASE_PATH/source.json"

Locally run OLake (CMD)

set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"
./build.sh driver-postgres discover ^
  --config "%OLAKE_BASE_PATH%\source.json"

Locally run OLake (PowerShell)

$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"; `
./build.sh driver-postgres discover `
  --config "$OLAKE_BASE_PATH\source.json"

Streams File

After executing the Discover command, a streams.json file is created. Read more about Streams File here.

Writer File

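Read about the Writer File configuration in the documentation linked for each destination under Supported Destinations.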

Sync Command

The Sync command fetches data from Postgres and ingests it into the destination.

OLake Docker (macOS / Linux)

docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-postgres:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json

OLake Docker (CMD)

docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-postgres:latest ^
  sync ^
  --config /mnt/config/source.json ^
  --catalog /mnt/config/streams.json ^
  --destination /mnt/config/destination.json

OLake Docker (PowerShell)

docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-postgres:latest `
  sync `
  --config /mnt/config/source.json `
  --catalog /mnt/config/streams.json `
  --destination /mnt/config/destination.json

Locally run OLake (macOS / Linux)

OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/postgres/config" && \
./build.sh driver-postgres sync \
  --config "$OLAKE_BASE_PATH/source.json" \
  --catalog "$OLAKE_BASE_PATH/streams.json" \
  --destination "$OLAKE_BASE_PATH/destination.json"

Locally run OLake (CMD)

set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"
./build.sh driver-postgres sync ^
  --config "%OLAKE_BASE_PATH%\source.json" ^
  --catalog "%OLAKE_BASE_PATH%\streams.json" ^
  --destination "%OLAKE_BASE_PATH%\destination.json"

Locally run OLake (PowerShell)

$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"; `
./build.sh driver-postgres sync `
  --config "$OLAKE_BASE_PATH\source.json" `
  --catalog "$OLAKE_BASE_PATH\streams.json" `
  --destination "$OLAKE_BASE_PATH\destination.json"

To run sync with state:

OLake Docker (macOS / Linux)

docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-postgres:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json \
  --state /mnt/config/state.json

OLake Docker (CMD)

docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-postgres:latest ^
  sync ^
  --config /mnt/config/source.json ^
  --catalog /mnt/config/streams.json ^
  --destination /mnt/config/destination.json ^
  --state /mnt/config/state.json

OLake Docker (PowerShell)

docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-postgres:latest `
  sync `
  --config /mnt/config/source.json `
  --catalog /mnt/config/streams.json `
  --destination /mnt/config/destination.json `
  --state /mnt/config/state.json

Locally run OLake (macOS / Linux)

OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/postgres/config" && \
./build.sh driver-postgres sync \
  --config "$OLAKE_BASE_PATH/source.json" \
  --catalog "$OLAKE_BASE_PATH/streams.json" \
  --destination "$OLAKE_BASE_PATH/destination.json" \
  --state "$OLAKE_BASE_PATH/state.json"

Locally run OLake (CMD)

set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"
./build.sh driver-postgres sync ^
  --config "%OLAKE_BASE_PATH%\source.json" ^
  --catalog "%OLAKE_BASE_PATH%\streams.json" ^
  --destination "%OLAKE_BASE_PATH%\destination.json" ^
  --state "%OLAKE_BASE_PATH%\state.json"

Locally run OLake (PowerShell)

$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\postgres\config"; `
./build.sh driver-postgres sync `
  --config "$OLAKE_BASE_PATH\source.json" `
  --catalog "$OLAKE_BASE_PATH\streams.json" `
  --destination "$OLAKE_BASE_PATH\destination.json" `
  --state "$OLAKE_BASE_PATH\state.json"

Read more about the state file and its configuration here.
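
The two sync variants above differ only in the `--state` flag, so a wrapper can choose between them. The following is a minimal sketch with a hypothetical helper name. It assumes the state file is named state.json and sits in the same mounted config directory as the other files, as in the Docker commands above.

```shell
# Sketch: build the sync arguments, adding --state only when a state file
# already exists in the host config directory. The /mnt/config paths are the
# container-side paths from the Docker commands above.
olake_sync_args() {
  local host_dir="$1"
  local args="sync --config /mnt/config/source.json"
  args="$args --catalog /mnt/config/streams.json"
  args="$args --destination /mnt/config/destination.json"
  if [ -f "$host_dir/state.json" ]; then
    args="$args --state /mnt/config/state.json"
  fi
  echo "$args"
}

# Example (runs Docker, so it is left commented out):
#   dir="$HOME/PATH_TO_OLAKE_DIRECTORY"
#   docker run --pull=always -v "$dir:/mnt/config" \
#     olakego/source-postgres:latest $(olake_sync_args "$dir")
```

With this approach the first run starts without state, and later runs pick up the state file once it exists.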

PostgreSQL to Iceberg Data Type Mapping

When syncing data from PostgreSQL to Iceberg, OLake handles data type conversions to ensure compatibility. Below is a table that outlines how PostgreSQL data types are mapped to Iceberg data types:

PostgreSQL Data Types	Iceberg Data Type
`int`, `int2`, `int4`, `smallint`, `integer`, `serial`, `serial2`, `serial4`	`int`
`bigint`, `bigserial`, `int8`, `serial8`	`bigint`
`float`, `float4`, `real`, `numeric`	`float`
`double precision`, `float8`	`double`
`boolean`	`boolean`
`date`, `timestamp without time zone`, `timestamp with time zone`	`timestamp`
`box`, `bpchar`, `char`, `character`, `character varying`, `json`, `jsonb`, `jsonpath`, `name`, `numrange`, `path`, `text`, `tid`, `tsquery`, `tsrange`, `tstzrange`, `uuid`, `varbit`, `varchar`, `xml`, `inet`, `bit`, `int2vector`, `daterange`, `time with time zone`, `time without time zone`	`string`
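
The mapping table can also be expressed as a lookup, which may be handy for checking a schema before a sync. This is a sketch that mirrors the table only; the function name is hypothetical, and the behaviour for types not listed in the table (an error return) is an assumption of the example, not documented OLake behaviour.

```shell
# Sketch: look up the Iceberg type for a PostgreSQL type, per the table above.
iceberg_type() {
  case "$1" in
    int|int2|int4|smallint|integer|serial|serial2|serial4) echo "int" ;;
    bigint|bigserial|int8|serial8) echo "bigint" ;;
    float|float4|real|numeric) echo "float" ;;
    "double precision"|float8) echo "double" ;;
    boolean) echo "boolean" ;;
    date|"timestamp without time zone"|"timestamp with time zone") echo "timestamp" ;;
    box|bpchar|char|character|"character varying"|json|jsonb|jsonpath|\
    name|numrange|path|text|tid|tsquery|tsrange|tstzrange|uuid|varbit|\
    varchar|xml|inet|bit|int2vector|daterange|\
    "time with time zone"|"time without time zone") echo "string" ;;
    *)
      # Assumption: types outside the table are reported as unmapped.
      echo "unmapped: $1" >&2
      return 1 ;;
  esac
}

# Example:
#   iceberg_type "double precision"
```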

Changelog

Version	Date	Pull Request	Subject
v0.0.3	14.04.2025	https://github.com/datazip-inc/olake/pull/203
v0.0.4	31.04.2025	https://github.com/datazip-inc/olake/pull/250

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!

Join our growing community

GitHub

Slack

Twitter

LinkedIn

YouTube
