Skip to main content

Amazon RDS PostgreSQL CDC Setup Guide

This guide walks you through setting up Change Data Capture (CDC) for Amazon RDS PostgreSQL instances using logical replication and the wal2json plugin to enable real-time data replication with OLake.

Prerequisites:

  • An Amazon RDS for PostgreSQL instance (PostgreSQL 10+).
  • A custom DB parameter group for this instance (the default parameter group cannot be modified).
  • The RDS master user or another user with the rds_superuser and rds_replication roles.
  • Automated backups enabled (backup retention > 0) – this is needed for long-term WAL retention on RDS

Steps:

  1. Create and Configure a Parameter Group: In the AWS RDS console, go to Parameter Groups and create a new parameter group for your PostgreSQL version (if you don’t already have one).

postgres-rds-pg-1

Associate the parameter group with your database instance and configure:

ParameterValuePurpose
rds.logical_replication1Enables logical replication and wal2json
wal_levellogicalUsually set automatically by above
max_replication_slots≥5Number of concurrent CDC connections
max_wal_senders≥7Should exceed replication slots
wal_sender_timeout0(Optional) Prevents snapshot timeouts

Save the parameter group changes.

postgres-rds-pg-2

caution

rds.logical_replication can only be turned on for read-replica instances that are running PostgreSQL 16 or later. If your replica is on an earlier major version the parameter will be ignored and logical decoding will NOT be available on that replica. Either upgrade the replica to v16+ or enable logical decoding only on the primary.

  1. Reboot the RDS Instance: Reboot the database instance so that the static parameters take effect. After reboot, confirm the settings:

    SELECT name, setting 
    FROM pg_settings
    WHERE name IN ('rds.logical_replication','wal_level');

    The query should show rds.logical_replication = on, and wal_level = logical.

  2. Set Up a Replication User:

    Option A: Use Master User (Recommended for simplicity)
    The RDS master user already has rds_superuser and rds_replication roles.

    Option B: Create Dedicated User

    CREATE USER cdc_user WITH PASSWORD 'strongpassword';
    GRANT rds_replication TO cdc_user;
    GRANT ALL ON SCHEMA public TO cdc_user; -- Prevents schema permission errors

    Optional: Read-only Privileges for Initial Snapshot

    For stricter security, grant only read access instead of ALL:

    -- Example for the "public" schema in database mydb
    GRANT CONNECT ON DATABASE mydb TO cdc_user;
    GRANT USAGE ON SCHEMA public TO cdc_user;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_user;
    -- Ensure future tables are covered automatically
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO cdc_user;
  3. Create a Logical Replication Slot:

    Slot Name Configuration

    Use the exact replication slot name that you configure in OLake UI or CLI. The slot name must match the replication_slot parameter in your OLake source configuration.

    You can let OLake create the slot automatically, or manually create one for testing:

    -- Replace 'your_olake_slot_name' with the slot name from your OLake configuration
    SELECT *
    FROM pg_create_logical_replication_slot('your_olake_slot_name', 'wal2json');

    Common Errors:

    ErrorSolution
    "must be superuser or replication role"Use master user or grant rds_replication
    "logical decoding requires wal_level >= logical"Check parameter group configuration
    "plugin not found"Ensure rds.logical_replication = 1
  4. Test Decoding (Optional): As a quick test, you can generate a change and read from the slot:

    INSERT INTO test_table(val) VALUES('rds cdc test');
    -- Replace 'your_olake_slot_name' with your actual slot name
    SELECT data FROM pg_logical_slot_get_changes('your_olake_slot_name', NULL, NULL, 'pretty-print', '1');

    You should see JSON change records from wal2json. This confirms CDC is capturing changes.

  5. Connect Your CDC Connector: Use the RDS endpoint, database name, and the replication user (cdc_user or master) in your OLake configuration. Specify the plugin as wal2json. The connector will stream changes via the logical slot.

    Make sure the network setup allows the connector to reach the RDS instance (RDS must be publicly accessible or your connector is in the same VPC/network).

    Now that you have configured the database and created the CDC user, you can add the PostgreSQL source in OLake to build an ELT pipeline to Iceberg or Parquet. See the PostgreSQL connector overview for a high-level walkthrough.

Troubleshooting

  • Parameter Group Not Applying: If rds.logical_replication remains off after reboot, check that:

    • The instance is actually using your custom parameter group (in the RDS console, on the Configuration tab, verify the parameter group name).
    • The parameter group family matches your engine version (e.g., “postgres12”). If not, some parameters might be ignored.
    • You saved the changes and rebooted. Static parameters won’t apply without a reboot.
  • Access Issues: If the OLake reports ERROR: permission denied for relation/sequence ... or "permission denied for schema" during snapshot or streaming, the replication user might lack SELECT privileges on those tables/schemas. Logical decoding itself does not check table ACLs for emitting changes. Grant the necessary privileges or use the master user for the initial snapshot. Also, ensure the user has the rds_replication role to use the slot.

  • Slot or Plugin Errors:

    • If creating a slot yields ERROR: permission denied to create replication slot, use the master user or a user with rds_superuser role to create it.
    • If you see ERROR: logical decoding requires wal_level "logical", it means wal_level is not logical – double-check the parameter group (maybe the DB is still running with the old param group).
    • If the error is about wal2json not found, ensure rds.logical_replication is enabled, as that is required to load wal2json on RDS.
  • WAL buildup / Disk-Full: Similar to Aurora, an inactive logical slot on RDS will prevent WAL files from being purged. Monitor the WAL storage on RDS (CloudWatch metric "TransactionLogsDiskUsage"). If your CDC connector falls behind or is stopped for a while, the disk usage can grow. In extreme cases, you might get RDS events complaining that “The transaction logs have consumed XX% of storage”. To fix, either restart the CDC process to consume the backlog or drop the slot if you no longer need it. For example:

    SELECT pg_drop_replication_slot('cdc_slot');

    Do this only if you are okay with re-snapshotting, as dropping the slot means you’ll lose the record of changes.

  • Replication Role Setup: On RDS, you cannot simply do ALTER ROLE ... WITH REPLICATION (superuser required). Instead, AWS uses the predefined roles. The master user automatically has rds_superuser and rds_replication. If you create a new user, you can grant rds_replication to it. If that grant fails, it may require you to be master user (try as master). If using the master user for CDC, no action needed. In summary, many users choose the master user for CDC on RDS to avoid role issues.

  • Network and Connectivity: Ensure your RDS instance is reachable:

    • If it’s in a private subnet, your CDC application must have network access (consider running the CDC connector on an EC2 in the same VPC).
    • If it’s publicly accessible, open the security group to the connector's IP (or corporate network) on port 5432.
    • RDS may enforce SSL; use the SSL connection if required (check the parameter rds.force_ssl).
  • Schema Changes: If the schema changes (DDL statements), wal2json will emit them in the WAL (as DDL is logged), but your connector needs to handle schema refresh accordingly.

By following these steps, CDC should be active on your RDS PostgreSQL instance, streaming changes through wal2json.


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, discuss bugs, help folks to debug issues they are facing and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!