Creating Peers
Guide to creating Peers in PeerDB
PeerDB currently supports the below types of Peers. To connect a database to PeerDB, you need to create a Peer. A Peer is a connection to a database that PeerDB can query. Peers are created using the CREATE PEER
command.
- BigQuery
- Snowflake
- PostgreSQL
- SQL Server
- MySQL
- Storage Peers - S3 and Google Cloud
- Azure EventHubs
- ClickHouse
BigQuery Peer
Create Peer using UI
Using the PeerDB UI (localhost:3000), create the Snowflake Peer. See below video as reference:
Create Peer using connection string
Authentication
PeerDB authenticates the creation of a BigQuery peer based on the service account. All the fields within the CREATE PEER
command can be acquired from a service account JSON.
If you have a service account key, you can use that or create a new one dedicated for PeerDB.
Make sure you have the right permissions for the service account, PeerDB requires at least read permissions for querying. For MIRROR it requires both read and write.
Considerations
- Dataset Specific: PeerDB only supports querying tables within a dataset. Cross dataset joins are not supported yet.
- Supported Datatypes: All string, numeric and timestamp based datatypes are supported for querying. Record and Struct data-types are projected as JSONB. JSON datatype in BQ is not supported yet.
- SQL Coverage: Most SQL constructs in reads incl. Simple Selects, JOINs, aggregations, window functions, CTEs etc are supported. You can run both Postgres compatible and BQ compatible queries through PeerDBs interface.
- Error Handling: If a query fails on the BigQuery side because of lack of auth or query coverage or timeout, PeerDB handles that error and propagates the entire message as a JSON text to the end-user. We also capture this ERROR message within PeerDB logs.
Snowflake Peer
Create Peer using UI
Using the PeerDB UI (localhost:3000), create the Snowflake Peer. See below video as reference:
Create Peer using connection string
Granting permissions
PeerDB needs to be able to create, read and write to objects in the database you provide. The following commands ensure a smooth mirror:
More information can be found here.
Authentication
PeerDB authenticates the creation of a Snowflake peer based key pair authentication. Encrypted private keys are also supported. Refer to this doc to create key pair, assign them to the right role and help populate the above fields.
Considerations
- Database Specific: PeerDB only supports querying tables within a database. Cross database joins are not supported yet.
- Supported Datatypes: All string, numeric and timestamp based datatypes are supported for querying.
- SQL Coverage: Most SQL constructs in reads incl. Simple Selects, JOINs, aggregations, window functions, CTEs etc are supported. You can run both Postgres compatible and SF compatible queries through PeerDBs interface.
- Error Handling: If a query fails on the Snowflake side because of lack of auth or query coverage or timeout, PeerDB handles that error and propagates the entire message as a JSON text to the end-user. We also capture this ERROR message within PeerDB logs.
PostgreSQL Peer
Create Peer using UI
Using the PeerDB UI (localhost:3000), create the Postgres Peer. See below video as reference:
Create Peer using connection string
If you are connecting to a Docker Postgres instance on the same docker network as PeerDB, then the hostname will be the service name of the Postgres server and the port will be 5432
.
Considerations
- Supported Datatypes: All string, numeric and timestamp based datatypes are supported for querying. We are actively addding support for other types.
- Unsupported COPY command: COPY command is not supported yet.
SSH Tunneling Configuration (Optional)
You can connect to a PostgreSQL database through a bastion host using SSH tunneling. Here’s a guide on setting up an SSH tunnel for PeerDB.
Then, create a PostgreSQL peer using an SSH tunnel:
Note that the input for the private key here expects a base64 encoded private key. The host key input is optional and is for preventing MITM attacks.
Storage Peers - S3 and GCS
Create Peer using UI
Using the PeerDB UI (localhost:3000), create the Snowflake Peer. See below video as reference:
Create Peer using connection string
For CDC, you would require a PostgreSQL instance to use as an external metadata store. The CREATE PEER command for S3/GCS depends on how you set the following parameters.
Note that if you leave any of these parameters empty, PeerDB will use the corresponding Docker compose environment variable instead:
Considerations
- The bucket specified in the URL must already exist.
- Only
.avro
file format is currently supported. - Currently only data-movement is supported for this type of peer. Querying this peer from PeerDB’s interface is not supported.
Clickhouse Peer
You can create a Clickhouse peer using the following SQL syntax:
- For local docker installations of Clickhouse and if you’re using PeerDB OSS, the host will be
host.docker.internal
, with the port being the external port. - The
disable_tls
parameter is optional and can be set totrue
if you are using a Clickhouse instance without TLS. - PeerDB maintains an ephemeral internal stage for Clickhouse using a min.io bucket.
You can configure your own bucket (S3/min.io/GCS) by passing the following optional parameters:
SQLServer Peer
If you are connecting to a Docker SQL Server/Azure SQL Edge instance on the same docker network as PeerDB, then the hostname will be the service name of the SQL server and the port will be 5432
.
Considerations
- Currently only data-movement is supported for this type of peer. Querying this peer from PeerDB’s interface is not supported.
MySQL Peer
Considerations
- Currently only querying is supported for this type of peer. Mirrors are not supported.
setup
is optional, but can be used to run queries when connection setup. Useful for setting transactions as read only by default, or setting catalog/database withuse catalog.database
for StarRocks
EventHub Peer
To create an Eventhub peer, use the following SQL syntax:
Parameters:
eventhubs
: A JSON array of details of namespaces. Each item should have the following fields:subscription_id
: The ID of the Azure subscription.resource_group
: The name of the Azure resource group.namespace
: The namespace for the Eventhub.location
: The location of the Eventhub namespace’s resource group, e.g., eastus.partition_count
: The number of partitions for the Eventhub.message_retention_in_days
: The number of days the messages will be retained.
unnest_columns
: Specifies the columns you’d like to unnest
In order to use this peer in a self-hosted PeerDB instance, your Azure credentials must be filled in our PeerDB docker-compose file in the environments section of the flow-worker
service:
Considerations
- Currently only CDC data-movement is supported for this type of peer. Querying this peer from PeerDB’s interface is not supported.