SQL Quickstart Guide
Get started with PeerDB in a few simple steps.
Deploying PeerDB
We currently support deploying and testing PeerDB using our Docker Compose file.
Docker can be installed by referring to these instructions.
The docker compose
tool should also be present. The Postgres client tools (specifically
psql
) are used to test the PeerDB installation.
-
Run the following commands in your terminal:
git clone --recursive https://github.com/PeerDB-io/peerdb.git cd peerdb # Run docker containers: peerdb-server, postgres as catalog, temporal. # This might take a few minutes, so get a cup of coffee! :) docker compose up
-
Connect to PeerDB:
psql "port=9900 host=localhost password=peerdb"
That’s it! You now have PeerDB up and running on your machine, ready to query away!
Quickstart
The following steps assume you have PeerDB running locally. We also need the Postgres
client tools (specifically psql
and pgbench
) installed and available on your
PATH.
Setup
This script reuses the PeerDB internal Postgres instance to setup two databases, which we can use to create two peers. It also creates some tables on both databases to use later in the quickstart. Run the following commands in your terminal:
curl -O https://peerdb-sample-data.s3.us-east-2.amazonaws.com/quickstart_prepare_peers.sh
chmod +x quickstart_prepare_peers.sh
./quickstart_prepare_peers.sh
Creating Peers
With PeerDB running, we can create our first peers.
Connect to PeerDB via psql and run the below commands to create the source
and target
peers:
psql "port=9900 host=localhost password=peerdb"
CREATE PEER source FROM POSTGRES WITH
(
host = 'catalog',
port = '5432',
user = 'postgres',
password = 'postgres',
database = 'source'
);
CREATE PEER target FROM POSTGRES WITH
(
host = 'catalog',
port = '5432',
user = 'postgres',
password = 'postgres',
database = 'target'
);
Real-time CDC
Run the following command to kick off Change Data Capture (CDC) based streaming from source peer to target peer for the test
table.
--- CDC mirrors automatically create the destination table
CREATE MIRROR cdc_mirror FROM source TO target
WITH TABLE MAPPING (public.test:public.test)
WITH(do_initial_copy = true);
Now that our CDC mirror is set up, it will automatically replicate data from the source to the target peer. Let’s insert a row in the source table via PeerDB itself.
--- Verify that the source table has zero rows
SELECT id,c1,c2,t FROM source.public.test;
--- Insert a row into the source table
INSERT INTO source.public.test(c1, c2, t) VALUES(1, 2, 'oathbringer');
Within a few seconds, the row should be replicated to the target table. Let’s verify this by running the following command:
--- CDC mirrors replicate data as-is, so the row should be identical.
SELECT id,c1,c2,t FROM target.public.test;
The output should look like this:
peerdb=> SELECT id,c1,c2,t FROM target.public.test;
id | c1 | c2 | t
----+----+----+-------------
1 | 1 | 2 | oathbringer
(1 row)
The above MIRROR takes care of replicating all DML commands (INSERT, UPDATE, DELETE) from source to target.
Streaming Query from PostgreSQL
Run the following command to kick off query based streaming from source peer to target peer. You are able to mask c1
,c2
and t
columns by hashing them prior to streaming them to the target.
CREATE MIRROR qrep_mirror FROM source TO target
FOR $$
SELECT id, hashint4(c1) c1, hashint4(c2) c2, md5(t) AS t
FROM test WHERE id BETWEEN {{.start}} AND {{.end}}
$$ WITH (
watermark_table_name='public.test',
watermark_column='id',
num_rows_per_partition = 10000,
destination_table_name='public.test_transformed',
setup_watermark_table_on_destination=true,
mode='append'
);
/*'append' mode indicates that the records in the source
table are write-once and never updated.*/
Now our mirror is set up. You should see the results of the above SELECT query replicated on the target peer.
SELECT id, c1 as hash_c1, c2 as hash_c2, t as hash_t FROM target.public.test_transformed;
The output should look like this:
peerdb=> SELECT * FROM target.public.test_transformed;
id | hash_c1 | hash_c2 | hash_t
--+-------------+------------+----------------------------------
1 | -1905060026 | 1134484726 | 0e6014773782cbc514100d54328da18f
(1 row)
Mirrors are continuous
As you add more rows to the test
table on the source, the above MIRRORs take care of replicating the data to both test
and test_transformed
table on the target.
Insert a new row on the source:
-- insert a new row
INSERT INTO source.public.test(c1, c2, t) VALUES(1, 2, 'oauthereceiver');
-- Check data on the source
SELECT id,c1,c2,t FROM source.public.test;
Check the new row added to target tables
-- After a few seconds, check the second row added to the target tables
SELECT id,c1,c2,t FROM target.public.test;
SELECT * FROM target.public.test_transformed;
FAQ
If you have any questions about the PeerDB setup and deployment process, don’t hesitate to reach out on Slack. We’re more than happy to assist and answer any questions, including:
- What is the performance I can expect during CDC and query based replication?
- How do I know my data sync is successful?
- Can I set a specific time to start my data sync?
If there are any unanswered questions, we’d love to help you get started. Feel free to ask your questions on our community Slack channel.
In addition to this, if you require direct access to our team for any assistance, don’t hesitate to contact us to discuss our premium support offerings.