Pg pg tutorial
Using PeerDB
Peer Creation
The building block of a PeerDB deployment is a peer, which could represent a database from a wide variety of providers. We can query these peers using the same Postgres dialect of SQL and also run mirrors to migrate data between two peers. To begin with, let us create two Postgres peers on the same cluster that PeerDB uses for internal state management. If you have another Postgres cluster, you can use that as well! Just be sure to substitute the connection details wherever needed.
Let’s begin by connecting to the PeerDB internal cluster [which we will call catalog from now on]:
Then execute the following commands:
Now switch over to peerone
and create some sample tables.
All done! Keep this terminal open though, we’ll come to it later.
Now coming back to the terminal where we are connected to PeerDB itself, we run the following commands to create two Postgres peers.
Now check that we are able to query the two tables we created on peerone
.
CDC Mirror
Now imagine the two tables we created in peerone
have a lot of records being inserted, updated and deleted concurrently. We want a live copy of these tables in peertwo
. We don’t want to change the data in any way, just Extract it from peerone
and Load it in peertwo
.
PeerDB supports this via CDC mirrors, which builds on Postgres’ logical replication capablities and allows efficient synchronization between the state of two tables.
After a few seconds, you should be able to see the destination tables created automatically on peertwo
.
To simulate the desired usage patterns on the two tables, we’ll use pgbench
which is a standard tool for Postgres testing. It usually comes bundled with psql
, so it should already be present on your system.
These 3 pgbench scripts insert records in the two tables with random data, while the other two scripts attempt to update or delete a record if it updates. By running them concurrently, we can roughly simulate the sort of load we want.
After saving these scripts to files, we run them in parallel like so:
[For those who are curious, the flags are asking pgbench
to run each script on 8 parallel connections on 2 threads, for a time period of 120 seconds and a progress report every 60 seconds.]
[NOTE: Windows Powershell or Command Prompt users cannot run commands in parallel like this, there are other ways but it would probably be easier to run them in 3 seperate terminals.]
As soon as the rows are inserted into the source tables, PeerDB will start receiving logical replication messages and perform the same operations on the destination tables. We can observe the process from psql in the other terminal.
We can confirm that the records in both tables are exactly as the records in the source tables. Do note that rows present in the source tables before the mirror was created are not replicated currently.
QRep Mirror
After some brainstorming, we realize that having a copy of both tables is unnecessary when all we do is run this one query against them.
It would therefore be ideal if we could just get the results of this query in one table. Well, PeerDB can do this too! PeerDB supports something called query replication [QRep] mirrors. They allow you to Extract the data from the source tables, Transform it in some way and then Load it into the destination tables.
We need to first create the table that the results are replicated to, keeping in mind that the results of the query we run must map to the columns of the destination table.
Now in the PeerDB terminal, we kickoff the QRep mirror.
Note that this QRep mirror is operating in ‘append’ mode, where rows are inserted once and changes to them are not tracked. This is fine for our purposes because we are only tracking rows that don’t change.
Within a few seconds, you should see the existing data in peerone
and peertwo
in the table we created.
We can even run our pgbench
scripts from earlier and the changes should reflect in qrep_dst
. Also since our CDC mirror from before is still running, the new rows will reflect there as well.