PostgreSQL to BigQuery
Scenario
Suppose you have a banking application running on PostgreSQL with two tables: “users” and “transactions.” You want to sync these tables to BigQuery in real time for analytics purposes, such as real-time fraud detection. Let’s see how to do this in a few minutes with a few SQL commands using PeerDB.
Prerequisites
- Enable logical decoding in Postgres. Ensure that the following settings/GUCs are properly configured:
- wal_level: logical
- max_wal_senders: >1
- max_replication_slots: 4
- Enable replication access for a PostgreSQL user - ALTER USER pg_user REPLICATION;
- Ensure that both tables have primary keys. Composite primary keys are also fine. If a table has no primary key, make sure it has REPLICA IDENTITY FULL.
- If you are using PostgreSQL on the cloud, refer to your cloud provider's documentation on how to enable logical replication.
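For a self-managed instance, the prerequisites above can be checked and applied with standard PostgreSQL statements. This is a minimal sketch and not PeerDB-specific: `pg_user` is the example user from the list above, and the `public` schema is an assumption about where the tables live. The three settings only take effect after a server restart, and managed cloud services usually expose them through their own parameter configuration rather than `ALTER SYSTEM`.

```sql
-- Inspect the current values of the relevant settings.
SHOW wal_level;
SHOW max_wal_senders;
SHOW max_replication_slots;

-- Set them if needed (requires superuser; takes effect after a restart).
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 4;        -- any value greater than 1
ALTER SYSTEM SET max_replication_slots = 4;

-- Grant replication access to the user PeerDB connects as.
ALTER USER pg_user REPLICATION;

-- Only for a table without a primary key (schema name is an assumption):
-- ALTER TABLE public.transactions REPLICA IDENTITY FULL;
```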
Step 1: Add Postgres and BigQuery Peers
Run the following commands to let PeerDB know about the existing Postgres and BigQuery Peers.
Make sure to replace (…) with the appropriate connection details for both the PostgreSQL and BigQuery instances. More details on adding peers are available in the PeerDB documentation.
Step 2: Real-Time CDC from PostgreSQL to BigQuery
With the peers set up, you can create a mirror that facilitates real-time CDC from PostgreSQL to BigQuery.
Create MIRROR using SQL
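As a rough sketch, a mirror statement for the scenario above could look like the following. The peer names `postgres_peer` and `bigquery_peer`, the mirror name, the `public` schema, and the `do_initial_copy` option name are all assumptions made for illustration. Check them against the PeerDB documentation for your version before running this.

```sql
-- Hypothetical names: pg_to_bq_mirror, postgres_peer, bigquery_peer.
CREATE MIRROR pg_to_bq_mirror FROM postgres_peer TO bigquery_peer
-- Map each source table (schema.table) to its target table name.
WITH TABLE MAPPING (
    public.users:users,
    public.transactions:transactions
)
-- Assumed option name: request an initial snapshot before streaming changes.
WITH (do_initial_copy = true);
```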
Since no CDC sync mode has been specified above, CDC will be performed in sql mode.
To perform CDC via AVRO mode, you must set the following:
Note that TABLE MAPPING represents the table name mapping between the source and target peers. The final WITH clause specifies whether to include an initial snapshot as part of the MIRROR. If you omit that WITH clause, PeerDB assumes that you don’t want to perform an initial snapshot. It just reads the slot and replays the changes to the target.
- Data type mapping between Postgres and BigQuery.
- If you want additional types to be supported or want to alter the existing data type mapping, please reach out to us. We can aim to support that within a few days. Also, PeerDB is fully open source, so feel free to submit a PR.
PeerDB also supports replicating TOAST columns very efficiently. Unlike most CDC tools, you don’t need to set up REPLICA IDENTITY FULL for replicating TOAST columns. This PR captures the infrastructural optimizations that PeerDB takes to support TOAST columns.
Create MIRROR using UI
If you prefer a UI, you can create a mirror using the PeerDB UI (localhost:3000).
Step 3: Validate the Mirror
Through PeerDB’s same Postgres-compatible SQL interface, you can quickly validate the MIRROR (real-time CDC).
Step 4: Monitor the MIRROR
You can connect to localhost:8085 to get full visibility into the different jobs and steps that PeerDB is taking under the covers to manage the MIRROR.
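As a complement on the source side, the replication slot that the mirror reads from can be inspected directly in PostgreSQL. This is a minimal sketch using standard catalog views and functions (PostgreSQL 10 or later); it is not a PeerDB feature, and the slot name you see will depend on your setup.

```sql
-- Run on the source PostgreSQL instance.
SELECT slot_name,
       plugin,
       active,
       -- WAL generated but not yet confirmed by the slot's consumer
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS unconsumed_wal
FROM pg_replication_slots;
```

A steadily growing `unconsumed_wal` value, or a slot with `active = false`, suggests that changes are not being consumed and is worth investigating before WAL accumulates on disk.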
Coming Soon
- Support for tables without primary keys using UNIQUE index or REPLICA IDENTITY FULL will be added in a few weeks.
- Handling Schema Changes will be added in a few weeks.
Support
If you run into any issues, join our Slack channel and reach out to us. You can also file an issue on our GitHub repository or reach out to founders@peerdb.io. We will follow up!