Last updated:5/2/2025|... min read

Setup Postgres Replica Set via Docker Compose (local)

This guide explains how to spawn a Postgres replica set using Docker Compose. It also covers instructions for ingesting sample data, and verifying the setup.

note

This compose file is not for production use. It is meant for local development and testing purposes only. It spawns a single Postgres instance with a replica set configuration. For production use, consider using a more robust setup with multiple nodes and proper replication configurations.

Navigate to ./drivers/postgres/config (if building locally) OR just create a directory (say OLAKE_DIRECTORY) anywhere in your system if you want to use Dockerzied OLake and create these files:

docker-compose.yml
source.json
destination.json -> Refer the additional-references for details on writer file config.

discover-sync

1. Starting `docker-compose.yml`

This compose file does the following:

Primary Postgres Service: Runs a Postgres container with the wal2json plugin installed for logical replication.
Data Loader Service: Loads sample data into the Postgres database and creates a logical replication slot.
Network: Sets up a custom network for the Postgres services.
Volume: Uses a named volume to persist Postgres data across container restarts.

docker-compose.yml
version: "3.9"  # Specify the Compose file format version.

services:
  # Primary Postgres service acting as the main database server.
  primary_postgres:
    container_name: primary_postgres               # Set an explicit name for the container.
    image: postgres:15                             # Use the official Postgres version 15 image.
    hostname: primary_postgres                     # Define the hostname within the network.
    ports:
      - "5431:5432"                               # Map port 5432 inside the container to 5431 on the host.
    environment:
      POSTGRES_USER: main                          # Set the default Postgres username.
      POSTGRES_PASSWORD: password                  # Set the password for the Postgres user.
      POSTGRES_DB: main                            # Create a default database named "main".
    # Custom command to install the wal2json plugin for logical decoding and then start Postgres.
    command: >
      bash -c "apt-get update && apt-get install -y postgresql-15-wal2json && exec docker-entrypoint.sh postgres -c wal_level=logical -c max_wal_senders=10 -c max_replication_slots=10"
    # Explanation of the command:
      # 1. Update the package list.
      # 2. Install the postgresql-15-wal2json package, which is used to convert WAL (Write-Ahead Logging)
      #    into JSON format for logical replication.
      # 3. Use exec to run the default docker-entrypoint script provided by the Postgres image.
      # 4. Start Postgres with additional configuration parameters:
      #    - wal_level=logical: Enable logical replication.
      #    - max_wal_senders=10: Allow up to 10 concurrent WAL sender processes.
      #    - max_replication_slots=10: Allow up to 10 replication slots for logical decoding.
    volumes:
      - pg-data:/var/lib/postgresql/data          # Use a named volume (pg-data) to persist Postgres data.
    networks:
      - pg-cluster                                 # Connect the container to the custom network "pg-cluster".
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "main", "-d", "main"]  # Healthcheck command to check if Postgres is ready.
      interval: 10s                                # Run the healthcheck every 10 seconds.
      timeout: 5s                                  # Set a timeout of 5 seconds for the healthcheck command.
      retries: 10                                  # Retry the healthcheck up to 10 times before declaring unhealthy.

  # Data loader service to perform operations against the Postgres database.
  data-loader:
    image: postgres:15                             # Use the same version of Postgres to ensure compatible client tools.
    container_name: sample_data_loader             # Set a custom name for clarity.
    environment:
      PGUSER: main                                 # Set the default Postgres user for the client.
      PGPASSWORD: password                         # Set the password for the Postgres user.
      PGDATABASE: main                             # Connect to the "main" database.
    depends_on:
      primary_postgres:
        condition: service_healthy               # Wait until the primary_postgres service passes its health check.
    entrypoint: >
      bash -c " 
        echo \"Waiting for Postgres to be ready...\"; 
        # Poll until Postgres is reachable.
        until pg_isready -h primary_postgres -p 5432 -U main -d main; do 
          echo \"Waiting...\"; 
          sleep 2; 
        done; 
        echo \"Creating test table sample_data...\"; 
        # Execute a SQL command to create a table if it doesn't exist.
        psql -h primary_postgres -U main -d main -c \"CREATE TABLE IF NOT EXISTS sample_data (id SERIAL PRIMARY KEY, str_col TEXT, num_col INT);\"; 
        echo \"Inserting one test row...\"; 
        # Insert a sample row into the table.
        psql -h primary_postgres -U main -d main -c \"INSERT INTO sample_data (str_col, num_col) VALUES ('Hello world', 123);\"; 
        echo \"Creating logical replication slot...\"; 
        # Create a logical replication slot using the wal2json output plugin.
        psql -h primary_postgres -U main -d main -c \"SELECT * FROM pg_create_logical_replication_slot('postgres_slot', 'wal2json');\"; 
        echo \"Done. Data and replication slot should now exist.\" 
      "
    # Explanation of the entrypoint:
    # - Wait until Postgres is ready to accept connections using pg_isready.
    # - Create a table named "sample_data" with three columns: id, str_col, and num_col.
    # - Insert a sample row into the table.
    # - Create a logical replication slot named "postgres_slot" using the wal2json plugin.
    restart: "no"                                  # Do not automatically restart the container after completion.
    networks:
      - pg-cluster                                # Connect to the same custom network "pg-cluster".

# Define a custom network for Postgres services.
networks:
  pg-cluster:

# Define a volume to persist Postgres data.
volumes:
  pg-data:

Using Dockerized OLake
Using Local build OLake

2. Starting Postgres

To start the Postgres container, run the following command from your project directory:

docker compose up -d

3. OLake Integration

Update your source configuration file (source.json) to connect to Postgres as follows:

source.json
{
    "host": "host.docker.internal",
    "port": 5431,
    "database": "main",
    "username": "main",
    "password": "password",
    "jdbc_url_params": {},
    "ssl": {
        "mode": "disable"
    },
    "update_method": {
        "replication_slot": "postgres_slot",
        "intial_wait_time": 10
    },
    "reader_batch_size": 100000,
    "default_mode": "cdc",
    "max_threads": 5
}

1. Copy the content from the above `docker-compose.yml` file to your local directory.

info

Clone the OLake repository.

GitHub repository:

git clone git@github.com:datazip-inc/olake.git

2. Starting Postgres

To start the Postgres container, run the following command from your project directory:

docker compose -f ./drivers/postgres/config/docker-compose.yml up -d

3. OLake Integration

Update your source configuration file (source.json) to connect to Postgres as follows:

source.json
{
    "host": "localhost",
    "port": 5431,
    "database": "main",
    "username": "main",
    "password": "password",
    "jdbc_url_params": {},
    "ssl": {
        "mode": "disable"
    },
    "update_method": {
        "replication_slot": "postgres_slot",
        "intial_wait_time": 10
    },
    "reader_batch_size": 100000,
    "default_mode": "cdc",
    "max_threads": 5
}

info

Now that you are setup with a local database setup, head over to the getting started guide to start syncing data with OLake.

4. Perform DDL and DML operations to test OLake (optional)

You can perform some basic DDL and DML operations to test your OLake setup. Here are some commands you can run in the Postgres container.

To check the status of the container, perform:
```
docker ps
```
This will list all running containers. Look for the container with the name primary_postgres.
Container Logs:
Check container logs for Postgres using:
```
docker logs primary_postgres 
```
To bash into the Postgres container, perform:
```
docker exec -it primary_postgres /bin/bash
```
To connect to the Postgres database, perform:
```
psql -h localhost -U main
```

Enter the password when prompted, in this case, its password and then you can use any of the below commands to interract.

Some handy PostgreSQL commands

Click to expand

Below are additional PostgreSQL commands—in the familiar psql style—for both Data Definition Language (DDL) and Data Manipulation Language (DML) operations. These examples use your configuration with the database main and, for instance, the table sample_data. You can run these commands in your psql shell.

-- List all databases on the PostgreSQL server
\l;

-- Switch to the "main" database
\c main;

-- List all tables in the "main" database
\dt;

-- Describe the schema of the "sample_data" table
\d sample_data;

-- List all users and their roles
\du;

-- Show help on all available psql meta-commands
\?;

-- Quit the psql interactive terminal
\q;

-- Toggle expanded display mode for query results (useful for wide outputs)
\x;

-- Display current connection information for the session
\conninfo;

-- Toggle query execution timing display (show how long each query takes)
\timing on;      -- Enable timing
-- \timing off;  -- Disable timing

-- Execute a shell command from within psql (e.g., list files in the current directory)
\! ls;

-- Export query results from the "sample_data" table to a CSV file named output.csv
\copy (SELECT * FROM sample_data) TO 'output.csv' CSV HEADER;

DDL (Data Definition Language) Commands

-- Create a new table named sample_data_new with example columns.
CREATE TABLE sample_data_new (
    id SERIAL PRIMARY KEY,         -- Auto-incrementing primary key.
    description TEXT,              -- A text column for descriptions.
    created_at TIMESTAMP DEFAULT NOW()  -- Automatically record the creation time.
);

-- Alter the existing sample_data table by adding a new column named updated_at.
ALTER TABLE sample_data ADD COLUMN updated_at TIMESTAMP;

-- Create an index on the sample_data table to speed up queries on the str_col column.
CREATE INDEX idx_sample_data_str ON sample_data (str_col);

-- Drop the table sample_data_new if it is no longer needed.
DROP TABLE IF EXISTS sample_data_new;

-- List all indexes on tables matching sample_data* using the meta-command.
\di sample_data*

DML (Data Manipulation Language) Commands

-- Insert a new row into the sample_data table.
INSERT INTO sample_data (str_col, num_col) VALUES ('New entry', 456);

-- Update the num_col value for the row in sample_data with id = 1.
UPDATE sample_data SET num_col = 789 WHERE id = 1;

-- Delete the row from sample_data where id equals 1.
DELETE FROM sample_data WHERE id = 1;

-- Retrieve all data from sample_data.
SELECT * FROM sample_data;

-- Display the count of records in sample_data.
SELECT COUNT(*) FROM sample_data;

Enable Logical Replication Settings via SQL

Instead of modifying postgresql.conf, use SQL commands to apply runtime settings (if your Postgres setup allows it):

-- Enable logical WAL level 
ALTER SYSTEM SET wal_level = 'logical';  
-- Set max replication slots and senders 
ALTER SYSTEM SET max_replication_slots = 4; 
ALTER SYSTEM SET max_wal_senders = 4;  
-- Apply the changes 
SELECT pg_reload_conf(); 

Note: Some cloud-hosted databases (like RDS or Cloud SQL) require you to modify these settings via the provider’s console or parameter groups.

Grant Replication Permissions

If you are using separate role for OLake with name “olake_user”, ensure this user has the required privileges:

ALTER ROLE olake_user WITH REPLICATION; 

Or you can use any other Superuser or role with Replication permissions.

Create a Logical Replication Slot

OLake reads changes via a logical replication slot. Create one using:

SELECT * FROM pg_create_logical_replication_slot('olake_slot', 'wal2json'); 

This slot starts tracking changes from the current WAL position, allowing OLake to stream inserts, updates, and deletes.

5. Stop the Container

To stop the container, perform:

docker compose -f ./drivers/postgres/config/docker-compose.yml down --remove-orphans -v

To stop the container, perform:
```
docker compose down --remove-orphans -v
```

depending upon the directory you are in. This will stop and remove all containers defined in the docker-compose.yml file, along with any orphaned containers and associated volumes.

To remove the container, perform:
```
docker rm -f primary_postgres
```

This will forcefully remove the specified container by its ID.

To remove the image, perform:
```
docker rmi -f IMAGE_ID
```

This will forcefully remove the specified image by its ID.

Additional References

Configuration Files: Refer to the following documentation for configuration details:
Viewing Parquet Files: : Use the VS Code extension - Parquet Explorer to view Parquet files in your editor.

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!

Setup Postgres Replica Set via Docker Compose (local)

1. Starting `docker-compose.yml`

2. Starting Postgres

3. OLake Integration

1. Copy the content from the above `docker-compose.yml` file to your local directory.

2. Starting Postgres

3. OLake Integration

4. Perform DDL and DML operations to test OLake (optional)

Some handy PostgreSQL commands

DDL (Data Definition Language) Commands

DML (Data Manipulation Language) Commands

Enable Logical Replication Settings via SQL

Grant Replication Permissions

Create a Logical Replication Slot

5. Stop the Container

Additional References

Need Assistance?

Join our growing community

GitHub

Slack

Twitter

LinkedIn

YouTube

1. Starting docker-compose.yml​

2. Starting Postgres​

3. OLake Integration​

1. Copy the content from the above docker-compose.yml file to your local directory.​

2. Starting Postgres​

3. OLake Integration​

4. Perform DDL and DML operations to test OLake (optional)​

Some handy PostgreSQL commands​

DDL (Data Definition Language) Commands​

DML (Data Manipulation Language) Commands​

Enable Logical Replication Settings via SQL​

Grant Replication Permissions​

Create a Logical Replication Slot​

5. Stop the Container​

Additional References​

Need Assistance?

Join our growing community

GitHub

Slack

Twitter

LinkedIn

YouTube

1. Starting `docker-compose.yml`

2. Starting Postgres

3. OLake Integration

1. Copy the content from the above `docker-compose.yml` file to your local directory.

2. Starting Postgres

3. OLake Integration

4. Perform DDL and DML operations to test OLake (optional)

Some handy PostgreSQL commands

DDL (Data Definition Language) Commands

DML (Data Manipulation Language) Commands

Enable Logical Replication Settings via SQL

Grant Replication Permissions

Create a Logical Replication Slot

5. Stop the Container

Additional References