Skip to main content

Setup Postgres Replica Set via Docker Compose

This guide explains how to spawn a Postgres replica set using Docker Compose for unit tests. It also covers instructions for ingesting sample data, and verifying the setup.

info

Clone the OLake repository if you want to build OLake locally, or skip to the part to use Dockerized OLake.

GitHub repository

git clone git@github.com:datazip-inc/olake.git
tip

Enable wal2json

To enable the wal2json plugin, you need to add the following line to the postgresql.conf file:

wal_level = logical
max_wal_senders = 10
max_replication_slots = 10

Navigate to ./drivers/postgres/config (if building locally) OR just create a directory (say olake-docker) anywhere in your system if you want to use Dockerzied OLake and create these files:

  1. docker-compose.yaml
  2. config.json
  3. writer.json -> Refer the additional-references for details on writer file config.

discover-sync

1. docker-compose.yaml for syncing Data with Dockerized OLake

docker-compose.yaml
version: "3.9"
services:
primary:
container_name: primary
image: postgres:15
hostname: primary
ports:
- "5432:5432"
environment:
POSTGRES_USER: main
POSTGRES_PASSWORD: password
POSTGRES_DB: main
# Install wal2json and then start Postgres via the official entrypoint script
command: >
bash -c "apt-get update && apt-get install -y postgresql-15-wal2json && exec docker-entrypoint.sh postgres -c wal_level=logical -c max_wal_senders=10 -c max_replication_slots=10"
volumes:
- pg-data:/var/lib/postgresql/data
networks:
- pg-cluster
healthcheck:
test: ["CMD", "pg_isready", "-U", "main", "-d", "main"]
interval: 10s
timeout: 5s
retries: 10

data-loader:
image: postgres:15
container_name: sample_data_loader
environment:
PGUSER: main
PGPASSWORD: password
PGDATABASE: main
depends_on:
primary:
condition: service_healthy
entrypoint: >
bash -c "
echo \"Waiting for Postgres to be ready...\";
until pg_isready -h primary -p 5432 -U main -d main; do
echo \"Waiting...\";
sleep 2;
done;
echo \"Creating test table sample_data...\";
psql -h primary -U main -d main -c \"CREATE TABLE IF NOT EXISTS sample_data (id SERIAL PRIMARY KEY, str_col TEXT, num_col INT);\";
echo \"Inserting one test row...\";
psql -h primary -U main -d main -c \"INSERT INTO sample_data (str_col, num_col) VALUES ('Hello world', 123);\";
echo \"Creating logical replication slot...\";
psql -h primary -U main -d main -c \"SELECT * FROM pg_create_logical_replication_slot('postgres_slot', 'wal2json');\";
echo \"Done. Data and replication slot should now exist.\"
"
restart: "no"
networks:
- pg-cluster

networks:
pg-cluster:

volumes:
pg-data:

2. OLake Integration

After verifying Postgres’s configuration, proceed with OLake’s integration steps. Refer to the getting started guide for more details.

Update your source configuration file (config.json) to connect to Postgres as follows:

config.json
{
"host": "localhost",
"port": 5432,
"database": "main",
"username": "main",
"password": "password",
"jdbc_url_params": {},
"ssl": {
"mode": "disable"
},
"update_method": {
"replication_slot": "postgres_slot",
"intial_wait_time": 10
},
"reader_batch_size": 100000,
"default_mode": "cdc",
"max_threads": 50
}

3. Starting Postgres

To start the Postgres container, run the following command from your project directory:

docker compose up -d

4. Start replicating you Data

info

PATH_TO_OLAKE_DIRECTORY is the absolute path where you have created the directory [as discussed above]. -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \ maps to -v /Users/JOHN_DOE_USERNAME/Desktop/projects/olake-docker:/mnt/config \ in macOS and Linux systems. Follow the same pattern in other systems.

4.1. Discover Collection Schema

docker run \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
discover \
--config /mnt/config/config.json

4.2. Sync Data

docker run \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json

Sample output syncing our sample reddit data and writing locally in parquet format:

docker compose 4

4.3. Sync with state

docker run \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-postgres:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json \
--state /mnt/config/state.json

Additional References

Troubleshooting

If you encounter any issues, consider the following:

  • Container Logs:
    Check container logs for Postgres using:

    docker logs CONTAINER_ID 
  • To stop the container, perform:

    docker compose down --remove-orphans -v
  • To bash into the Postgres container, perform:

    docker exec -it CONTAINER_ID /bin/bash
    psql -h localhost -U main

Enter the password when prompted, in this case, its password

and then you can use any of the below commands to interract.

\l; // to list all databases
\c <db_name> // to switch to other database
\dt // // to see and list all the tables of selected database
\d <table-name> // describe the table schema
\du // to list users and their roles

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, discuss bugs, help folks to debug issues they are facing and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!