PostgreSQL To AWS S3 And Google Cloud Storage
PeerDB support Change-Data-Capture (CDC) from PostgreSQL to S3 and GCS buckets in the form of AVRO files in the destination. We utilise the interoperability between GCS and S3 here.
Scenario
Suppose you have a banking application running on PostgreSQL. There are two tables: “users” and “transactions.” You want to sync these tables in real-time to AVRO files in your S3 or GCS buckets which other services or clients can pick up.
Demo
Let’s look at a short video showcasing CDC from PostgreSQL to Google Cloud Storage with initial load.
Prerequisites
Postgres Peer
- Enable logical decoding in PostgreSQL. Ensure that the following settings/GUCs are properly configured:
- wal_level: logical
- max_wal_senders: >1
- max_replication_slots: 4
- Enable replication access for a PostgreSQL user - ALTER USER pg_user REPLICATION;
- Ensure that both tables have primary keys. Composite primaries are also fine. If not, make sure your tables have REPLICA IDENTITY FULL.
- If you are using PostgreSQL on the cloud, below links capture how to enable logical replication for each cloud:
S3 peer
-
Ensure you have an existing bucket to use.
-
For PeerDB to access the S3 Peer, you can either use an existing AWS user or create a new AWS user.
-
Create access keys for that user through AWS Console or AWS CLI or AWS API
-
For the same user, create and attach a policy as below using JSON editor. Sharing AWS docs for reference. PeerDB requires
s3:ListAllMyBuckets
,s3:GetObject
,s3:PutObject
,s3:ListBucket
ands3:DeleteObject
on that bucket.
GCS peer
- Ensure you have an existing bucket to use.
- For GCS, you must create a HMAC key-pair and use this for
ACCESS_KEY_ID
andSECRET_ACCESS_KEY
in the Docker compose file.
Step 1: Add PostgreSQL and S3/GCS Peers
Run the following commands to let PeerDB know about the existing PostgreSQL and S3.
Please refer to our CREATE PEER documentation for storage peers.
Step 2: Real-Time CDC from PostgreSQL to S3/GCS by creating a MIRROR
Create MIRROR using UI
If you prefer a UI, you can easily create a mirror using the PeerDB UI. Refer to the below video:
Create MIRROR using SQL
To facilitate real-time Change Data Capture (CDC) from PostgreSQL to S3 or GCS, set up your peers and then create a mirror using the following SQL syntax:
Example:
Parameters:
mirror-name
: Desired name for the mirror.postgres-peer-name
: Name of the PostgreSQL peer.storage-peer-name
: Name of the S3/GCS peer.- max_batch_size: Maximum number of records in a batch.
- publication_name: Name of the publication.
Remember to adjust placeholder values (<…>) with your specific details and preferences. The example above has been abbreviated for clarity; ensure you provide all the necessary mappings and configurations in practice.
Step 3: Monitor the MIRROR
You can connect to localhost:8085
to get full visibility into the different jobs and steps that PeerDB is taking under the covers to manage the MIRROR.
Step 5: DROP MIRROR
To make it easy in your development and test environments, PeerDB also introduces the DROP MIRROR command. DROP MIRROR drops all the underlying objects that CREATE MIRROR generates. More details are available in this PR.