Skip to main content

How to set up AWS RDS Postgres along with CDC

· 6 min read
Pavan kalyan Chiluka
Community Member

Introduction

We have observed a large number of people not setting up their databases with CDC or having to set up CDC after they start ingesting and finding it complex.

As a result, we have decided to begin writing about how to configure popular databases with CDC to make your data journey smoother and faster.

Let's explore how to configure AWS RDS Postgres along with CDC.

As the first part of this series, let's explore how to set up and configure a Postgres database on Amazon RDS for change data capture (CDC), so you can emit data change streams to tools like Airbyte, Debezium, or managed stream processing platforms such as Jitsu, and Decodable.

What and Why's of CDC?

Change Data Capture (CDC) is a method used in databases to track and record changes made to data.

Recent trends in the data engineering industry point towards the increasing importance of real-time data processing and the integration of artificial intelligence (AI) in data architecture. This led to an enormous increase of data along with the need for faster processing and decision-making based on this data. CDC enables organisations to adapt to relentless growth of data and to implement real-time data updates by enabling real-time data replication.

tip

This doc is intended for both who are setting up a new database and also for users who need to modify their existing database for CDC.

Prerequisites

Before you create a database, you must set up a database parameter group and a VPC security group. The parameter group specifies the type of database you want and how that database should be configured. The VPC security group is used to connect to your database running on RDS securely.

tip

If you already have the database, you can just check and make the changes needed in the parameter groups and vpc security groups that are attached with the database. Jump to Step Modify Database Parameter Group

Create Database Parameter Group

AWS RDS already has a default RDS parameter group as given in the below picture and you won't be able to edit the parameters from this group.

how-to-set-up-postgresql-cdc-on-aws-rds-1

Hence it is advised to create a new parameter group as suggested below.

  1. Navigate to RDS and then to Parameter groups and the click on Create parameter group

  2. Enter details such as Name, Description and Database details.

    1. In Engine type, choose Postgres (different from Aurora Postgres)
    2. Choose the required postgres version

how-to-set-up-postgresql-cdc-on-aws-rds-2

  1. Click on Create and parameter group will be created.

Modify Database Parameter Group

Some of the parameters need to be changed to configure this database for CDC

info

All of the mentioned parameters can be modified even in an existing parameter group if you are not using a default one.

(Tip: Even in the parameter group, there are a few Non Modifiable types. You can check the Value Type in the details of the parameter group for this information)

how-to-set-up-postgresql-cdc-on-aws-rds-3

Click edit on the created parameter group and set the following parameters:

  1. Set rds.logical_replication to 1 in the parameter group
  2. Set max_replication_slots to atleast 5 (default is 20) so that the internal replication by RDS is not affected, best practice is set this to the number of concurrent replication streams you expect
  3. Set max_wal_senders to atleast 7 (default is 20), best to set this value greater than max_replication_slots. A common practice is to set it to max_replication_slots + 2
  4. Set wal_sender_timeout to 0 (default is 30000)
  5. Save these changes.
  6. Restart the database instance for the changes to be effective

If you are wondering what you have done. Let's get into details

rds.logical_replication: This parameter enables or disables logical replication for a PostgreSQL DB instance. logical replication is a method used to replicate changes made to a database at a logical level, capturing changes to the data itself, such as inserts, updates, and deletes, and then replicating those changes to another database in near real-time. Note that logical_rpelication is a kind of technique used for CDC

max_replication_slots: This parameter specifies the maximum number of replication slots to be used for WAL replication.

max_wal_senders: This parameter specifies the maximum number of WAL senders or processes that can be connected to the server for streaming replication.

wal_sender_timeout: This parameter specifies the time (in milliseconds) that the server allows for the WAL sender processes to send data without receiving an acknowledgment before the connection is closed

Create VPC Security Group

Everything on RDS runs within virtual private networks, which means we need to configure accessibility to those resources such as our Postgres database. Navigate to VPC settings and create a new security group. Then we configure inbound and outbound access routes for this security group.

Create the database:

  1. Navigate to RDS → Databases
  2. Choose on Create database and then Standard Create
  3. Choose the Engine version the same as the same family mentioned in the parameter group
  4. Make sure to mention the same VPC security group you have created
  5. In Additional configiuration, specify the DB parameter group to use the group we just created
  6. Also choose backup retention period to atleast 7 days
  7. Create the database
warning

In case you have edited the parameter group or changed backup retention, restart the database so that the changes are effective

Check Connectivity

To check it is properly configured, connect to your database and run this command

select * from pg_settings where name in ('wal_level', 'rds.logical_replication')

You should see results like below ( settings , on and logical )

how-to-set-up-postgresql-cdc-on-aws-rds-4

Now we could connect to this database using our Postgres root user. However, best practices are to use a dedicated account which has the minimal set of required privileges for CDC. Use this user credentials to connect to the Postgres source.

info

Create a user and grant replication access.​For More

The following commands will create a new user:

CREATE USER <user_name> PASSWORD 'your_password_here';

Now, provide this user with read-only access to relevant schemas and tables. Re-run this command for each schema you expect to replicate data from:

GRANT USAGE ON SCHEMA <schema_name> TO <user_name>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <user_name>;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO <user_name>;

User also need rds_replication role, run the below command for the same:

GRANT rds_replication to <user_name>;

Bonus Tips and Tricks

# To check all the replication slot
SELECT * FROM pg_replication_slots;

# To create a replication slot
SELECT pg_create_logical_replication_slot('<slot_name>','pgoutput'); --> Replace <slot_name> with the name of the slot that you would like to create

#If you want to delete a replication slot
select pg_drop_replication_slot('<slot_name>');
---
# To check all the publications
SELECT * from pg_publication_tables;

# To create a publication for all the tables in the database
CREATE PUBLICATION <publi_name> FOR ALL TABLES;
# To create a publication for a particular table in the database
CREATE PUBLICATION <publi_name> FOR TABLE table_name;
# To delete a publication
DROP PUBLICATION IF EXISTS <publi_name>;