Scenario
Suppose you have a banking application running on PostgreSQL. There are two tables: “users” and “transactions.” You want to sync these tables in real-time to AVRO files in your S3 or GCS buckets which other services or clients can pick up.Demo
Let’s look at a short video showcasing CDC from PostgreSQL to Google Cloud Storage with initial load.Prerequisites
Postgres Peer
- Enable logical decoding in PostgreSQL. Ensure that the following settings/GUCs are properly configured:
- wal_level: logical
- max_wal_senders: >1
- max_replication_slots: 4
- Enable replication access for a PostgreSQL user - ALTER USER pg_user REPLICATION;
- Ensure that both tables have primary keys. Composite primaries are also fine. If not, make sure your tables have REPLICA IDENTITY FULL.
- If you are using PostgreSQL on the cloud, below links capture how to enable logical replication for each cloud:
S3 peer
- Ensure you have an existing bucket to use.
- For PeerDB to access the S3 Peer, you can either use an existing AWS user or create a new AWS user.
- Create access keys for that user through AWS Console or AWS CLI or AWS API
-
For the same user, create and attach a policy as below using JSON editor. Sharing AWS docs for reference. PeerDB requires
s3:ListAllMyBuckets
,s3:GetObject
,s3:PutObject
,s3:ListBucket
ands3:DeleteObject
on that bucket.
GCS peer
- Ensure you have an existing bucket to use.
- For GCS, you must create a HMAC key-pair and use this for
ACCESS_KEY_ID
andSECRET_ACCESS_KEY
in the Docker compose file.
Step 1: Add PostgreSQL and S3/GCS Peers
Run the following commands to let PeerDB know about the existing PostgreSQL and S3.Step 2: Real-Time CDC from PostgreSQL to S3/GCS by creating a MIRROR
Create MIRROR using UI
If you prefer a UI, you can easily create a mirror using the PeerDB UI. Refer to the below video:Create MIRROR using SQL
To facilitate real-time Change Data Capture (CDC) from PostgreSQL to S3 or GCS, set up your peers and then create a mirror using the following SQL syntax:mirror-name
: Desired name for the mirror.postgres-peer-name
: Name of the PostgreSQL peer.storage-peer-name
: Name of the S3/GCS peer.- max_batch_size: Maximum number of records in a batch.
- publication_name: Name of the publication.
Step 3: Monitor the MIRROR
You can connect tolocalhost:8085
to get full visibility into the different jobs and steps that PeerDB is taking under the covers to manage the MIRROR.
