> ## Documentation Index
> Fetch the complete documentation index at: https://docs.peerdb.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Creating Mirrors

> Guide to creating synchronization jobs in PeerDB

PeerDB introduces CREATE MIRROR command for blazing fast syncs between Peers. This section captures different types of MIRRORs and how to kick them off.

## Command Reference

### MIRROR for CDC

This type of MIRROR streams change feed in real-time from source peer to target peer.
The table mapping section supports two syntaxes, show below, and both can be used in the same mapping.
The second syntax with the JSON-like structure is useful when you want to exclude certain columns from the sync with `exclude:[]`

<Info>
  Note that for BigQuery and Clickhouse mirrors, the target tables should not be schema qualified.
</Info>

```sql theme={null}
CREATE MIRROR [IF NOT EXISTS] <mirror_name>
FROM <source_peer> TO <target_peer>
WITH TABLE MAPPING
(
  <schema_name>.<source_table_name>:<schema_name>.<target_table_name>,
  <schema_name>.<source_table_name>:<schema_name>.<target_table_name>,
  {
    from: <schema_name>.<source_table_name>,
    to: <schema_name>.<target_table_name>,
    exclude: [column1, column2 ,column3...]
  }
)
WITH (
  do_initial_copy = <true|false>, -- required
  max_batch_size = <max_batch_size>,
  sync_interval = <sync_interval>,
  publication_name = '<publication_name>',
  replication_slot_name = '<replication_slot_name>', -- applicable only when do_initial_copy is false
  snapshot_num_rows_per_partition = <number_of_rows_in_each_partition>,
  snapshot_max_parallel_workers = <number_of_threads_per_table>,
  snapshot_num_tables_in_parallel = <number_of_tables_in_parallel>,
  soft_delete = <true|false>,
  synced_at_col_name = '<column_name>',
  soft_delete_col_name = '<column_name>'
);
```

1. **do\_initial\_copy**: `true` if you want to include initial snapshot of tables as a part of the MIRROR, else `false`.
2. **max\_batch\_size**: Maximum number of rows to be synced in a single batch.
3. **sync\_interval**: Interval in seconds at which the mirror should sync the changes.
4. **publication\_name**: Name of the publication on the source peer. This is optional as PeerDB creates the publication it needs anyway.
   By creating a publication and providing it here, however, you do not need to provide write access to the database and tables (since that's needed for PeerDB to create the publication).
5. **replication\_slot\_name**: Name of the replication slot on the source peer. Do not set this if initial load is enabled.
   This is optional as PeerDB creates the slot it needs by default. A use-case for this setting is when you want to resume a change-data capture from an existing replication slot.
6. **snapshot\_num\_tables\_in\_parallel**: In the initial snapshot, number of tables to be snapshotted at in parallel.
7. **snapshot\_num\_rows\_per\_partition**: In the initial snapshot, PeerDB divides the rows of each table into partitions of these many rows, and syncs those partitions.
8. **snapshot\_max\_parallel\_workers**: In the initial snapshot, number of threads used per table. Max number of concurrent connections on source will be `snapshot_num_tables_in_parallel * snapshot_max_parallel_workers`
9. **soft\_delete**: `true` if you want to soft delete rows on the target peer when they are deleted on the source peer.
10. **synced\_at\_col\_name**: Column name to be used for tracking the last synced timestamp. Default is `_PEERDB_SYNCED_AT`.
11. **soft\_delete\_col\_name**: Column name to be used for tracking the soft delete status. Default is `_PEERDB_IS_DELETED`.

### MIRROR for Streaming Query Results

This type of MIRROR periodically streams query results on the source peer to the target peer.

```sql theme={null}
CREATE MIRROR <mirror_name> [IF NOT EXISTS] FROM
	<source_peer> TO <target_peer> FOR
$$
  SELECT * FROM <source_table_name> WHERE
  <watermark_column> BETWEEN {{.start}} AND {{.end}}
$$
WITH (
	destination_table_name = '<schema_qualified_destination_table_name>',
	watermark_column = '<watermark_column>',
	watermark_table_name = '<table_on_source_on_which_watermark_filter_should_be_applied>',
	mode = '<mode>',
	unique_key_columns = '<unique_key_columns>',
	parallelism = <parallelism>,
	refresh_interval = <refresh_interval_in_seconds>,
  num_rows_per_partition = 100000,
  initial_copy_only = <true|false>,
  setup_watermark_table_on_destination = <true|false>,
);
```

The `OPTIONS` clause provides granular control when configuring the MIRROR:

1. **watermark\_column (Required)** represents a sequentially incrementing integer or timestamp column. PeerDB uses this column to keep track of the processed rows and the ones that need to be processed.
2. **watermark\_table (Required)** is the table that PeerDB iterates sequentially based on the watermark\_column. The fact table is a good candidate for the watermark\_table.
3. **mode (Required)** tells PeerDB whether it should append data to the target or perform an upsert operation (update if the row exists).
   1. If your data is append-only, a sequential ID or a created\_at column can be a good watermark, and you can choose the mode **append.**
   2. If your data is updated, make sure to have an "updated\_at" column for PeerDB to identify the most recent version of the row. Choose the mode **upsert** to update already existing rows on the target.
      1. **unique\_key:** In the upsert mode, you need to specify a set of columns (unique\_key) that uniquely identify a row. This helps perform the upsert operation.
4. **parallelism (Optional)** represents the number of threads used to read from the source and sync to the target. This helps with workload management on the source and target.
   1. If you have a powerful PostgreSQL server or if it is a read-replica, you can configure a higher parallelism value. In scenarios where you want to put less load on the source, you can choose a lower value.
5. **refresh\_interval (Optional)** helps configure the frequency (in seconds) at which the sync should run. If not specified, PeerDB attempts to sync new rows every **10** seconds.
6. **initial\_copy\_only (Optional)**: If only a one-time copy needs to be done, this can be set to true. The mirror stops after a one-time data load. Apart from specifying **append** or **upsert** modes, a new mode **overwrite** can be specified when this option is set to true. This truncates all destination tables before starting the mirror, and is functionally identical to **append** otherwise.
7. **num\_rows\_per\_parition (Required)**: PeerDB internally splits a table into partitions of a certain size which it distributes amongst the worker threads. This option allows you to tune the size of each partition based on memory or network requirements.
8. **setup\_watermark\_table\_on\_destination (Optional)**: If set to true, a table with the same schema as the watermark table is created on the destination peer. Its name will be set to **destination\_table\_name**.
9. More details on the OPTIONs can be found [here](https://github.com/PeerDB-io/peerdb/issues/139).
10. `if_not_exists`: If specified, the MIRROR will be created only if it does not exist.

#### XMIN Query Replication

PeerDB supports `xmin` as watermark column. Note that DELETEs are not supported in XMIN Query Replication.

```sql theme={null}
CREATE MIRROR xmin_mirror_1 FROM pg_test_peer TO sf_test_peer FOR
$$ SELECT * FROM users $$
WITH (
	destination_table_name = 'public.users',
	watermark_column = 'xmin',
	watermark_table_name = 'public.users',
	mode = 'append', -- upsert (only updates) and overwrite are also supported
	parallelism = 10,
	refresh_interval = 30,
	num_rows_per_partition = 100
);
```
