Deploying PeerDB

We currently support deploying and testing PeerDB using our Docker Compose file, support for Kubernetes and other means of deployment are in development.

Docker can be installed by referring to these instructions. The docker compose tool should also be present. The Postgres client tools (specifically psql) are used to test the PeerDB installation.

  1. Run the following commands in your terminal:

    git clone --recursive https://github.com/PeerDB-io/peerdb.git
    cd peerdb
    
    # Run docker containers: peerdb-server, postgres as catalog, temporal.
    # This might take a few minutes, so get a cup of coffee! :)
    export COMPOSE_PROJECT_NAME=peerdb-stack
    docker compose up
    
  2. Connect to PeerDB:

    psql "port=9900 host=localhost password=peerdb"
    

That’s it! You now have PeerDB up and running on your machine, ready to query away!

Quickstart

The following steps assume you have PeerDB running locally. We also need the Postgres client tools (specifically psql and pgbench) installed and available on your PATH.

Setup

This script reuses the PeerDB internal Postgres instance to setup two databases, which we can use to create two peers. It also creates some tables on both databases to use later in the quickstart. Run the following commands in your terminal:

curl -O https://peerdb-sample-data.s3.us-east-2.amazonaws.com/quickstart_prepare_peers.sh
chmod +x quickstart_prepare_peers.sh
./quickstart_prepare_peers.sh

Creating Peers

With PeerDB running, we can create our first peers.

Connect to PeerDB via psql and run the below commands to create the source and target peers:

psql "port=9900 host=localhost password=peerdb"
CREATE PEER source FROM POSTGRES WITH
(
  host = 'catalog',
  port = '5432',
  user = 'postgres',
  password = 'postgres',
  database = 'source'
);

CREATE PEER target FROM POSTGRES WITH
(
  host = 'catalog',
  port = '5432',
  user = 'postgres',
  password = 'postgres',
  database = 'target'
);

Real-time CDC

Run the following command to kick off Change Data Capture (CDC) based streaming from source peer to target peer for the test table.

--- CDC mirrors automatically create the destination table
CREATE MIRROR cdc_mirror FROM source TO target
WITH TABLE MAPPING (public.test:public.test)
WITH(do_initial_copy = true);

Now that our CDC mirror is set up, it will automatically replicate data from the source to the target peer. Let’s insert a row in the source table via PeerDB itself.

--- Verify that the source table has zero rows
SELECT id,c1,c2,t FROM source.public.test;
--- Insert a row into the source table
INSERT INTO source.public.test(c1, c2, t) VALUES(1, 2, 'oathbringer');

Within a few seconds, the row should be replicated to the target table. Let’s verify this by running the following command:

--- CDC mirrors replicate data as-is, so the row should be identical.
SELECT id,c1,c2,t FROM target.public.test;

The output should look like this:

peerdb=> SELECT id,c1,c2,t FROM target.public.test;
id | c1 | c2 |      t
----+----+----+-------------
1 |  1 |  2 | oathbringer
(1 row)

The above MIRROR takes care of replicating all DML commands (INSERT, UPDATE, DELETE) from source to target.

Streaming Query from PostgreSQL

Run the following command to kick off query based streaming from source peer to target peer. You are able to mask c1,c2 and t columns by hashing them prior to streaming them to the target.

  CREATE MIRROR qrep_mirror FROM source TO target
  FOR $$
      SELECT id, hashint4(c1) c1, hashint4(c2) c2, md5(t) AS t
      FROM test WHERE id BETWEEN {{.start}} AND {{.end}}
  $$ WITH (
      watermark_table_name='public.test',
      watermark_column='id',
      num_rows_per_partition = 10000,
      destination_table_name='public.test_transformed',
      setup_watermark_table_on_destination=true,
      mode='append'
  );
  /*'append' mode indicates that the records in the source
  table are write-once and never updated.*/

Now our mirror is set up. You should see the results of the above SELECT query replicated on the target peer.

  SELECT id, c1 as hash_c1, c2 as hash_c2, t as hash_t FROM target.public.test_transformed;

The output should look like this:

  peerdb=> SELECT * FROM target.public.test_transformed;
  id |   hash_c1   |  hash_c2   |              hash_t
  --+-------------+------------+----------------------------------
  1 | -1905060026 | 1134484726 | 0e6014773782cbc514100d54328da18f
  (1 row)

Mirrors are continuous

As you add more rows to the test table on the source, the above MIRRORs take care of replicating the data to both test and test_transformed table on the target.

Insert a new row on the source:

  -- insert a new row
  INSERT INTO source.public.test(c1, c2, t) VALUES(1, 2, 'oauthereceiver');
  -- Check data on the source
  SELECT id,c1,c2,t FROM source.public.test;

Check the new row added to target tables

  -- After a few seconds, check the second row added to the target tables
  SELECT id,c1,c2,t FROM target.public.test;
  SELECT * FROM target.public.test_transformed;

FAQ

If you have any questions about the PeerDB setup and deployment process, don’t hesitate to reach out on Slack. We’re more than happy to assist and answer any questions, including:

  • What is the performance I can expect during CDC and query based replication?
  • How do I know my data sync is successful?
  • Can I set a specific time to start my data sync?

If there are any unanswered questions, we’d love to help you get started. Feel free to ask your questions on our community Slack channel.

In addition to this, if you require direct access to our team for any assistance, don’t hesitate to contact us to discuss our premium support offerings.