Postgres Overview

TL;DR, to sync data:

  1. Create a config.json with your Postgres connection details.
  2. Create a writer.json with your Writer (Apache Iceberg or AWS S3) connection details.
  3. Run discover to generate a catalog.json of available streams.
  4. Run sync to replicate data to your specified destination.

Features

  • Resumable Full Load: Supports restarting full loads without reprocessing all data.
  • Fast Full Load via Chunking:
      • CTID-based chunking
      • Single split-column chunking
      • Evenly distributed chunks (for int/float PKs)
      • Uneven chunking support
  • WAL2JSON-based CDC Sync for incremental updates.
Caution: The pgoutput plugin is not currently supported for replication slots; only wal2json is supported.

  • New streams are automatically synced when added.
  • LSN mismatch handling triggers a full resync.
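
To make the "evenly distributed chunks" feature above more concrete, here is a minimal sketch of splitting an integer primary-key range into equal-width chunks. It is illustrative only and is not OLake's actual implementation; the helper name `pk_chunks` and the half-open `[start, end)` convention are assumptions made for this example.

```shell
# Illustrative sketch only (not OLake's code): split the integer key range
# [min, max] into at most N equal-width, half-open chunks [start, end).
# pk_chunks is a hypothetical helper. The body runs in a subshell "( ... )"
# so its variables do not leak into the caller's environment.
pk_chunks() (
  min=$1; max=$2; n=$3
  total=$(( max - min + 1 ))            # number of key values to cover
  size=$(( (total + n - 1) / n ))       # ceiling division gives the chunk width
  start=$min
  while [ "$start" -le "$max" ]; do
    end=$(( start + size ))
    # Clamp the final chunk so it stops just past max.
    if [ "$end" -gt $(( max + 1 )) ]; then end=$(( max + 1 )); fi
    echo "$start $end"
    start=$end
  done
)

# Example: pk_chunks 1 100 4 prints four "start end" pairs.
```

Each printed pair could back one range query of the form `WHERE id >= start AND id < end`, so chunks can be loaded independently and a failed chunk retried without redoing the others.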

Postgres Driver

The Postgres Driver enables data synchronization from Postgres to your desired destination.

Below is an overview of the supported modes and writers for Postgres data replication, along with tables summarizing the details.

Supported Modes

Our replication process supports various modes to fit different data ingestion needs.

The Full Refresh mode retrieves the entire dataset from Postgres and is ideal for initial data loads or when a complete dataset copy is required.

In contrast, CDC (Change Data Capture) continuously tracks and synchronizes incremental changes in real time, making sure that your destination remains updated with minimal latency.

The Incremental mode is currently under development (WIP).

| Mode | Description |
|---|---|
| Full Refresh | Fetches the complete dataset from Postgres. |
| CDC (Change Data Capture) | Tracks and syncs incremental changes from Postgres in real time. |
| Incremental | Not yet supported (WIP). |
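
CDC mode relies on a logical replication slot that uses the wal2json plugin. The sketch below shows one way to create and inspect such a slot with standard Postgres functions. The slot name `olake_slot`, the `PG_URL` variable, and the helper names are placeholders for illustration, and the sketch assumes that wal2json is installed on the server and that `wal_level` is set to `logical`.

```shell
# Hypothetical helpers that only build SQL strings; nothing here is OLake-specific.

# SQL to create a logical replication slot that uses the wal2json plugin.
create_slot_sql() {
  printf "SELECT pg_create_logical_replication_slot('%s', 'wal2json');\n" "$1"
}

# SQL to inspect an existing slot and confirm which plugin it uses.
slot_info_sql() {
  printf "SELECT slot_name, plugin, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = '%s';\n" "$1"
}

# Example usage (PG_URL is a placeholder connection string):
#   psql "$PG_URL" -c "SHOW wal_level;"
#   psql "$PG_URL" -c "$(create_slot_sql olake_slot)"
#   psql "$PG_URL" -c "$(slot_info_sql olake_slot)"
```

If the `plugin` column reports anything other than `wal2json` (for example `pgoutput`), the slot does not match what this driver supports.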

Supported Writers

OLake replicates data to multiple destinations to cater to a variety of deployment scenarios.

Whether you're storing data locally for quick access or using cloud storage services like S3 for scalability, our system is designed to integrate seamlessly.

Apache Iceberg is also supported as a destination, to facilitate advanced analytics and data lake management.

| Destination | Supported | Docs |
|---|---|---|
| Local Filesystem | Yes | Link |
| S3 | Yes | Link |
| Iceberg | Yes | Link |

Setup and Configuration

To run the Postgres Driver, configure the following files with your specific credentials and settings:

  • config.json: Postgres connection details.
  • catalog.json: List of streams and fields to sync (generated using the Discover command).
  • writer.json: Configuration for the destination where the data will be written.

Place these files in your project directory before running the commands.
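
As a convenience, a small pre-flight check can confirm that the expected files are present before you run a command. This is a sketch rather than part of OLake; the helper name `check_olake_files` is hypothetical, and the directory is whatever path you mount at /mnt/config.

```shell
# Hypothetical helper: check_olake_files DIR FILE...
# Prints each missing file to stderr and exits non-zero if any are missing.
# The body runs in a subshell "( ... )", so "exit" only ends the helper.
check_olake_files() (
  dir=$1; shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example usage:
#   before discover: check_olake_files "$HOME/PATH_TO_OLAKE_DIRECTORY" config.json
#   before sync:     check_olake_files "$HOME/PATH_TO_OLAKE_DIRECTORY" config.json catalog.json writer.json
```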

Config File

Add your Postgres credentials to the config.json file in the format shown here.

Commands

Discover Command

The Discover command generates the JSON content for the catalog.json file, which defines the schema of the streams to be synced.

Usage

To run the Discover command, use the following syntax:

docker run --pull=always  \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
discover \
--config /mnt/config/config.json

Catalog File

After executing the Discover command, a catalog.json file is created. Read more about Catalog File here.

Writer File

Read about the available writer configurations:

  1. Apache Iceberg writer config
  2. S3 writer config
  3. Local writer config

Sync Command

The Sync command fetches data from Postgres and ingests it into the destination.

docker run --pull=always  \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json

To run sync with state:

docker run --pull=always  \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json \
--state /mnt/config/state.json

Read more about the state file and its configuration here.
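
The two sync invocations above differ only in the `--state` flag, so a small wrapper can choose between them. The sketch below assumes you want to pass `--state` only when a state.json file already exists in the mounted directory; that policy and the helper name `olake_sync_cmd` are assumptions for illustration, not documented OLake behavior. It uses only the flags shown above.

```shell
# Hypothetical helper: olake_sync_cmd DIR
# Prints the sync command for DIR, appending --state only when DIR/state.json
# exists. The output is for display; paths containing spaces are not re-quoted.
olake_sync_cmd() (
  dir=$1
  # Build the argument list in the positional parameters.
  set -- docker run --pull=always \
    -v "$dir:/mnt/config" \
    olakego/source-postgres:latest \
    sync \
    --config /mnt/config/config.json \
    --catalog /mnt/config/catalog.json \
    --destination /mnt/config/writer.json
  if [ -f "$dir/state.json" ]; then
    set -- "$@" --state /mnt/config/state.json
  fi
  printf '%s\n' "$*"
)

# Example usage:
#   olake_sync_cmd "$HOME/PATH_TO_OLAKE_DIRECTORY"
# To execute instead of print, replace the printf line with: "$@"
```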

Changelog

| Version | Date | Pull Request | Subject |

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: We discuss the roadmap and open bugs there, and help people debug the issues they are facing.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!