Tutorial: Adding a Node to a Cluster with Zero Downtime
In this detailed walkthrough, we add a fourth node to a three-node cluster
with zero downtime. In our example, the cluster nodes are n1 (the source
node), n2, n3, and the new node is n4.
Warning
Important Prerequisites and Warnings
- The new node should not be accessible to users while adding the node.
- Disable `auto_ddl` on all cluster nodes.
- Do not modify your DDL during node addition.
- The users must be identical on the source and target nodes. Create any missing users on the target node before proceeding, and ensure the permissions are identical for all users on both nodes.
- The ZODAN process validates that all users from the source node exist on the new node before proceeding with cluster addition. This prevents replication failures caused by missing user permissions.
- The Spock configuration must be identical on both the source and the target node.
- All nodes in your cluster must be available to Spock for the duration of the node addition.
- The procedure should be performed on the new node being added.
- The `dblink` extension must be installed on the node from which commands like `CALL spock.add_node()` are run.
- Prepare the new node to meet all of the prerequisites described here.
- If the process fails, do not immediately retry a command until you ensure that all artifacts created by the workflow have been removed.
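As a quick sanity check for the user-related prerequisites above, you can compare role names between the source and new nodes before starting. This is a sketch (it compares only role names, not privileges) and assumes the same DSNs used later in this tutorial:

```sql
-- Role names present on the source node (n1) but missing on the new node (n4);
-- an empty result means the user prerequisite is satisfied at the name level.
SELECT rolname
FROM dblink(
    'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
    'SELECT rolname FROM pg_roles WHERE rolname NOT LIKE ''pg\_%'''
) AS src(rolname name)
EXCEPT
SELECT rolname
FROM dblink(
    'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
    'SELECT rolname FROM pg_roles WHERE rolname NOT LIKE ''pg\_%'''
) AS tgt(rolname name);
-- Expected: 0 rows
```

Remember that permissions must also match; this check only catches missing roles.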
Creating a Node Manually
If you are not using spock.node_create to create the new node, you will
need to complete the following steps:
- Initialize the new node with `initdb`.
- Create a database.
- Create a database user.
- Follow the instructions at the GitHub repository to build and install Spock on the database.
- Add `spock` to the `shared_preload_libraries` parameter in the `postgresql.conf` file.
- Restart the server to apply the configuration change.
- Use the `CREATE EXTENSION spock` command to create the Spock extension.
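The `shared_preload_libraries` change from the list above looks like this in `postgresql.conf` (a minimal fragment; if the parameter already lists other libraries, append `spock` to the existing comma-separated value instead):

```ini
# postgresql.conf
shared_preload_libraries = 'spock'
```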
Sample Configuration and New Node Creation Steps
In our example, the cluster configuration details include the following information:
- Database: `inventory`
- User: `pgedge`
- Password: `1safepassword`
- n1 (source): port 5432
- n2 (replica): port 5433
- n3 (replica): port 5434
- n4 (new node): port 5435
- Host: 127.0.0.1
The steps used to configure the new node (n4) include the following commands. The commands initialize the database and install Spock:
# Initialize database
initdb -D /path/to/data
# Start Postgres
pg_ctl -D /path/to/data start
# Create database and user
psql -c "CREATE DATABASE inventory;"
psql -c "CREATE USER pgedge WITH PASSWORD '1safepassword';"
psql -c "GRANT ALL ON DATABASE inventory TO pgedge;"
# Install Spock extension
psql -d inventory -c "CREATE EXTENSION spock;"
psql -d inventory -c "CREATE EXTENSION dblink;"
Using the Zodan Procedure to Add a Node
After creating the node, you can use Zodan scripts to simplify adding a node to a cluster.
To use the SQL script, connect to the new node that you wish to add to the
pgEdge cluster. In the following example, the psql command connects to the
new node:
psql -h 127.0.0.1 -p 5435 -d inventory -U pgedge
Load the Zodan procedures with the following command:
\i /path/to/zodan.sql
Then, use spock.add_node() from the new node to create the node
definition. In the following example, the spock.add_node procedure adds node n4 to
the cluster:
CALL spock.add_node(
src_node_name := 'n1',
src_dsn := 'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
new_node_name := 'n4',
new_node_dsn := 'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
verb := true,
new_node_location := 'Los Angeles',
new_node_country := 'USA',
new_node_info := '{"key": "value"}'::jsonb
);
The spock.add_node procedure executes the steps required to add a node to
the cluster; a detailed explanation of each step follows below.
Should a problem occur during this process, you can source the
zodremove.sql script and call the spock.remove_node procedure to
remove the node or reverse partially completed steps. The
spock.remove_node procedure should be called on the node being removed.
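A hedged sketch of that recovery path follows. The parameter names passed to spock.remove_node here are illustrative assumptions, not a confirmed signature; check the zodremove.sql script for the actual arguments before running it:

```sql
-- On the node being removed (n4); parameter names are illustrative
\i /path/to/zodremove.sql
CALL spock.remove_node(
    src_node_name := 'n1',
    src_dsn := 'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
    node_name := 'n4'
);
```

Only after the removal completes and all slots, subscriptions, and node entries created by the failed run are gone should you retry spock.add_node.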
Manually Adding a Node to a Cluster
The steps that follow outline the process Zodan goes through
when adding a node. You can manually perform the same steps to add a node
to a cluster instead of using spock.add_node above.
Check the Spock Version Compatibility
Before starting the node addition, verify that all nodes (n1, n2, n3, and n4) are running the exact same version of Spock. This prevents compatibility issues during replication setup.
Info
If any node has a different version, the process will abort with an error.
On the orchestrator node, check the Spock version with the following commands:
-- Check source node version
SELECT extversion
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'SELECT extversion FROM pg_extension WHERE extname = ''spock'''
) AS t(version text);
-- Expected: 5.0.4
-- Check new node version
SELECT extversion
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT extversion FROM pg_extension WHERE extname = ''spock'''
) AS t(version text);
-- Expected: 5.0.4
-- Check all existing cluster nodes (n2, n3)
SELECT node_name, version
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'SELECT n.node_name,
(SELECT extversion FROM pg_extension WHERE extname = ''spock'') as version
FROM spock.node n'
) AS t(node_name text, version text);
-- Expected output:
-- node_name | version
-- -----------+--------------
-- n1 | 5.0.4
-- n2 | 5.0.4
-- n3 | 5.0.4
Validate Prerequisites
You should ensure that the new node is in good condition before adding the node to your replicating cluster.
Perform safety checks to ensure node n4 is a clean slate:
- The n4 node must have the database and Spock extension installed.
- The n4 node must have no user data or existing replication configuration.
- Verify that a node named n4 does not already exist in the cluster.
Info
If any prerequisite fails, the process aborts to prevent conflicts.
Confirm that the Database Exists on n4
Try to connect to the database on n4. If this fails, the database does not exist and needs to be created first.
In the following example, the query checks whether the database exists on n4:
SELECT 1
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT 1'
) AS t(dummy int);
-- If no error, database exists
Confirm that n4 Has No User-Created Tables
The new node must be empty; there can be no user tables in user schemas (system tables and extension tables are acceptable). If user tables exist, ZODAN will abort because syncing would overwrite existing data.
In the following example, the query checks for user-created tables on n4:
SELECT count(*)
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT count(*) FROM pg_tables
WHERE schemaname NOT IN (''information_schema'', ''pg_catalog'', ''pg_toast'', ''spock'')
AND schemaname NOT LIKE ''pg_temp_%''
AND schemaname NOT LIKE ''pg_toast_temp_%'''
) AS t(count integer);
-- Expected: 0
Confirm that n4 Does Not Exist in the Cluster
Check if a node with the name n4 is already registered in the cluster. If the node exists, ZODAN aborts to prevent duplicate node names. The following example checks to see if n4 already exists in the cluster:
-- On orchestrator node (connected to n1)
SELECT count(*) FROM spock.node WHERE node_name = 'n4';
-- Expected: 0
Confirm that n4 Contains No Subscriptions
In the following example, the query checks whether n4 has any existing subscriptions:
SELECT count(*)
FROM spock.subscription s
JOIN spock.node n ON s.sub_origin = n.node_id
WHERE n.node_name = 'n4';
-- Expected: 0
Confirm that n4 Contains No Replication Sets
In the following example, the query checks whether n4 has any replication sets:
SELECT count(*)
FROM spock.replication_set rs
JOIN spock.node n ON rs.set_nodeid = n.node_id
WHERE n.node_name = 'n4';
-- Expected: 0
Create a Replication Node on n4
After confirming the state of the new node, you can add the node to your cluster. In this step, you will create a Spock node object on n4:
- This registers n4 as a participant in the replication cluster.
- Think of this like giving n4 an ID badge that says "I'm part of this cluster now."
Info
After this step, n4 is visible in the spock.node table on all nodes.
Create a Node on n4
Using dblink, connect to n4 and run spock.node_create() to register
the node. The DSN tells other nodes how to connect to n4.
In the following example, the spock.node_create function registers n4 as
a cluster participant:
-- Via dblink to n4
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.node_create(
node_name := ''n4'',
dsn := ''host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword'',
location := ''Los Angeles'',
country := ''USA'',
info := ''{"key": "value"}''::jsonb
)'
) AS t(node_id oid);
-- Expected output:
-- node_id
-- ---------
-- 16389
Confirm the Initial Node Count
In the following example, the query retrieves the current node count:
SELECT count(*)
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'SELECT count(*) FROM spock.node'
) AS t(count integer);
-- Expected: 4 (n1, n2, n3, n4)
Create Disabled Subscriptions and Slots
This is the critical bookmarking stage. During this stage, the process completes the following steps:
- Create subscriptions from n2 to n4 and n3 to n4, but keep them disabled.
- Before creating each subscription, trigger a `sync_event()`, which returns an LSN (log sequence number). You can think of the LSN as a bookmark in the replication stream.
- Save these bookmarks in a temporary table for later use.
- When enabling these subscriptions later, transactions will start from exactly the right position. This ensures no data is missed or duplicated.
- Subscriptions remain disabled because n4 should get all initial data from n1 (the source) in one clean sync.
Info
The n2 to n4 and n3 to n4 subscriptions are just being prepared now, but will not start moving data until later.
Identify All Existing Nodes
Query the cluster to find all of the nodes (n1, n2, n3, n4) and their connection strings. In the following example, the query retrieves all nodes and their DSNs:
SELECT n.node_name, i.if_dsn
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'SELECT n.node_name, i.if_dsn
FROM spock.node n
JOIN spock.node_interface i ON n.node_id = i.if_nodeid'
) AS t(node_name text, if_dsn text);
-- Expected output:
-- node_name | if_dsn
-- -----------+------------------------------------------------------------------
-- n1        | host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=...
-- n2        | host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=...
-- n3        | host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=...
-- n4        | host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=...
For n2 to n4: Trigger Sync Event on n2 and Store LSN
Call sync_event() on n2, which inserts a special marker into the n2
replication stream and returns the LSN where the marker was inserted.
Store this LSN (0/1A7D1E0) in a temp table. Later, when enabling the
sub_n2_n4 subscription, use this stored LSN to ensure the subscription
starts from exactly this point, guaranteeing no data loss.
In the following example, commands trigger a sync event and store the LSN:
-- Trigger sync event on n2
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword',
'SELECT spock.sync_event()'
) AS t(sync_lsn pg_lsn);
-- Returns: 0/1A7D1E0
-- Store in temp table
CREATE TEMP TABLE IF NOT EXISTS temp_sync_lsns (
origin_node text PRIMARY KEY,
sync_lsn text NOT NULL
);
INSERT INTO temp_sync_lsns (origin_node, sync_lsn)
VALUES ('n2', '0/1A7D1E0')
ON CONFLICT (origin_node) DO UPDATE SET sync_lsn = EXCLUDED.sync_lsn;
Create a Replication Slot on n2
A replication slot is like a queue that holds all changes from n2 that need to be sent to n4. Even though the subscription is disabled, the slot starts collecting changes immediately. This ensures no data is lost between now and when the subscription is enabled.
In the following example, the commands create a replication slot on n2:
-- On n2
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword',
'SELECT slot_name, lsn
FROM pg_create_logical_replication_slot(
''spk_inventory_n2_sub_n2_n4'',
''spock_output''
)'
) AS t(slot_name text, lsn pg_lsn);
-- Expected output:
-- slot_name | lsn
-- --------------------------+------------
-- spk_inventory_n2_sub_n2_n4 | 0/1A7D1E8
Create Disabled Subscription on n4 from n2
We create the subscription object on n4, but with enabled := false.
This tells n4 that it will eventually subscribe to n2, but not yet. The
subscription knows about the slot on n2 but is not actively pulling data
yet.
-- On n4
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n2_n4'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := false,
synchronize_data := false,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := false,
force_text_transfer := false,
skip_schema := ARRAY[]::text[]
)'
) AS t(subscription_id oid);
Repeat the Previous Steps for n3 to n4
In the following example, the commands perform the same operations for n3 to n4:
-- Trigger sync event on n3
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword',
'SELECT spock.sync_event()'
) AS t(sync_lsn pg_lsn);
-- Returns: 0/1B8E2F0
-- Store LSN
INSERT INTO temp_sync_lsns (origin_node, sync_lsn)
VALUES ('n3', '0/1B8E2F0')
ON CONFLICT (origin_node) DO UPDATE SET sync_lsn = EXCLUDED.sync_lsn;
-- Create replication slot on n3
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword',
'SELECT slot_name, lsn
FROM pg_create_logical_replication_slot(
''spk_inventory_n3_sub_n3_n4'',
''spock_output''
)'
) AS t(slot_name text, lsn pg_lsn);
-- Create disabled subscription on n4 from n3
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n3_n4'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := false,
synchronize_data := false,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := false,
force_text_transfer := false,
skip_schema := ARRAY[]::text[]
)'
) AS t(subscription_id oid);
Configure Cross-Node Replication
Now prepare for replication in the reverse direction, directing the data flow from n4 to the replica nodes (n2, n3):
- Create replication slots on n2 and n3 that will hold changes from n4.
- These slots will be used later when creating subscriptions for n2 and n3 to receive data from n4.
- Create these slots early so they start buffering any changes from n4 immediately, even before the subscriptions are created.
Info
This prevents data loss if n4 receives writes during the setup process.
Create a Slot on n2 for a Future Subscription (sub_n4_n2)
Create a replication slot on n2 named spk_inventory_n2_sub_n4_n2. This
slot will queue up changes from n4 that need to be sent to n2. The slot is
ready, but there is no subscription using the slot yet; that comes later.
In the following example, the command creates a replication slot on n2:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword',
'SELECT slot_name, lsn
FROM pg_create_logical_replication_slot(
''spk_inventory_n2_sub_n4_n2'',
''spock_output''
)'
) AS t(slot_name text, lsn pg_lsn);
Create a Slot on n3 for a Future Subscription (sub_n4_n3)
In the following example, the command creates a replication slot on n3:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword',
'SELECT slot_name, lsn
FROM pg_create_logical_replication_slot(
''spk_inventory_n3_sub_n4_n3'',
''spock_output''
)'
) AS t(slot_name text, lsn pg_lsn);
Trigger a Sync on Other Nodes, Wait on Source
Before copying data from n1 to n4, ensure n1 has all the latest data from n2 and n3.
- Any writes that happened on n2 or n3 must be fully replicated to n1 first.
- For example, if a user inserted data on n2 one second ago, that data must be on n1 before copying n1's data to n4, or n4 would be missing that recent insert.
- By triggering a sync event on n2 and waiting for the event on n1, you can confirm that n1 has received everything from n2 up to this exact moment.
Info
Process: Trigger sync_event on n2 (inserts a marker into the n2
replication stream), wait on n1 for that marker to arrive, and repeat
for n3.
Sync n2 to n1
Call the sync_event() function on n2, which returns an LSN (0/1C9F400).
Then on n1, wait for that specific LSN to be replicated. When
wait_for_sync_event() completes, you are guaranteed that n1 has received
all changes from n2 up to and including that LSN.
In the following example, the commands trigger a sync event and wait for completion:
-- Trigger sync event on n2
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword',
'SELECT spock.sync_event()'
) AS t(sync_lsn pg_lsn);
-- Returns: 0/1C9F400
-- Wait for sync event on n1
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'CALL spock.wait_for_sync_event(true, ''n2'', ''0/1C9F400''::pg_lsn, 1200000)'
) AS t(result text);
Sync n3 to n1
Perform the same process on n3. Trigger sync_event() on n3, get the LSN
(0/1D0E510), and then wait for that LSN on n1. After both syncs complete,
n1 is fully caught up with both n2 and n3.
In the following example, the commands trigger a sync event on n3 and wait for completion on n1:
-- Trigger sync event on n3
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword',
'SELECT spock.sync_event()'
) AS t(sync_lsn pg_lsn);
-- Returns: 0/1D0E510
-- Wait for sync event on n1
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'CALL spock.wait_for_sync_event(true, ''n3'', ''0/1D0E510''::pg_lsn, 1200000)'
) AS t(result text);
Copy the Source to New Subscription
This is the big data copy step.
- Create an enabled subscription on n4 from n1 with `synchronize_structure=true` and `synchronize_data=true`.
- This causes Spock to dump the entire schema (tables, indexes, constraints, etc.) from n1, restore that schema on n4, and copy all table data from n1 to n4.
- Before creating the subscription, detect whether n4 has any existing schemas that should be omitted from the sync (like monitoring tools or management schemas). Build a `skip_schema` array with these schema names. When Spock syncs the structure, it excludes these schemas using `pg_dump --exclude-schema`; when copying data, Spock also skips tables in these schemas. This prevents overwriting local tools or extensions on n4.
Info
This can take minutes to hours depending on database size (a 100GB database might take 30+ minutes to sync).
Detect Existing Schemas on n4
Query the n4 information_schema to find all user-created schemas
(excluding system schemas like pg_catalog, information_schema,
spock, public). If schemas like mgmt_tools or monitoring are
found, they will be excluded from replication.
In the following example, the query detects existing schemas on n4:
SELECT string_agg(schema_name, ',') as schemas
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT schema_name
FROM information_schema.schemata
WHERE schema_name NOT IN (''information_schema'', ''pg_catalog'', ''pg_toast'', ''spock'', ''public'')
AND schema_name NOT LIKE ''pg_temp_%''
AND schema_name NOT LIKE ''pg_toast_temp_%'''
) AS t(schema_name text);
-- Returns: 'mgmt_tools,monitoring' (if any exist)
Create an Enabled Subscription on n4 from n1
Create a subscription (sub_n1_n4) on n4, pointing to n1 as the provider.
The key parameters include:
- `synchronize_structure := true` - Dump and restore the schema from n1.
- `synchronize_data := true` - Copy all table data from n1.
- `enabled := true` - Start replicating immediately.
- `skip_schema := ARRAY['mgmt_tools','monitoring']::text[]` - Exclude the detected schemas.
- `forward_origins := ARRAY[]::text[]` - Do not forward changes that originated elsewhere.
What Spock does internally:
- Run `pg_dump` on n1 with `--exclude-schema=mgmt_tools --exclude-schema=monitoring`.
- Restore the dump on n4.
- For each table in the replication sets on n1, skip tables in the `mgmt_tools` and `monitoring` schemas, then run a `COPY` command to transfer the data from n1 to n4.
- Once the initial sync completes, start streaming ongoing changes from n1 to n4.
-- Build skip_schema array based on detected schemas
-- If schemas found: ARRAY['mgmt_tools','monitoring']::text[]
-- If no schemas: ARRAY[]::text[]
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n1_n4'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := true,
synchronize_data := true,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := true,
force_text_transfer := false,
skip_schema := ARRAY[''mgmt_tools'',''monitoring'']::text[]
)'
) AS t(subscription_id oid);
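While the initial sync runs (which, as noted above, can take a long time), you can poll the subscription status on n4; it moves to replicating once the copy is done and streaming has begun. A sketch using the same dblink pattern as the rest of this walkthrough:

```sql
-- Re-run until sub_n1_n4 reports 'replicating'; other values such as
-- 'initializing' or 'synchronizing' mean the bulk copy is still in progress
SELECT subscription_name, status
FROM dblink(
    'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
    'SELECT subscription_name, status
     FROM spock.sub_show_status()
     WHERE subscription_name = ''sub_n1_n4'''
) AS t(subscription_name text, status text);
```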
Trigger a Sync on the Source Node and Wait on the New Node
After the initial bulk data copy, there may be a small backlog of changes that happened on n1 while the copy was in progress.
- This step confirms that n4 has caught up completely with n1 before proceeding.
- Trigger a `sync_event` on n1 (marking the current position in the n1 stream); then wait on n4 for that marker to arrive. When the marker arrives, n4 has received and applied everything from n1 up to this point.
Info
For example: You start copying 100GB at 10:00 AM, but while copying, users insert 1000 rows on n1 between 10:00-10:30. The copy completes at 10:30, but those 1000 rows are still being streamed. You must wait until all 1000 rows have been applied on n4 so you do not lose data.
Trigger a Sync Event on n1
Insert a sync marker into the n1 replication stream at the current position. The LSN returned (0/1E1F620) represents this exact moment in the n1 transaction log.
In the following example, the command triggers a sync event on n1:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=pgedge password=1safepassword',
'SELECT spock.sync_event()'
) AS t(sync_lsn pg_lsn);
-- Returns: 0/1E1F620
Wait for Sync Event on n4
On n4, wait for the sync marker that matches the returned LSN (0/1E1F620) to arrive and be processed. This is a blocking call; the call will not return until the n4 subscription from n1 has replicated up to this LSN. The timeout (1200000 milliseconds = 20 minutes) prevents waiting forever if something goes wrong.
In the following example, the command waits for the sync event on n4:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'CALL spock.wait_for_sync_event(true, ''n1'', ''0/1E1F620''::pg_lsn, 1200000)'
) AS t(result text);
Check the Commit Timestamp and Advance Slots
This is a critical optimization step.
Earlier, n4 received a full copy of all data from n1, which includes data that originally came from n2 and n3. Those nodes now have disabled subscriptions waiting to activate (sub_n2_n4 and sub_n3_n4). The replication slots on n2 and n3 have been buffering changes.
Warning
If you enable those subscriptions now, n4 will receive duplicate data (n4 already has n2's data via n1, then will get the data again directly from n2).
Solution: Use the lag_tracker to find the exact timestamp when n4 last
received data from n2 (even though the data came via n1). Then advance the
n2 replication slot to skip past all data up to that timestamp. When
enabling sub_n2_n4 later, the subscription only sends new changes that
happened after the sync, not old data that n4 already has.
Info
Steps for each replica node: Check the n4 lag_tracker to get the
commit_timestamp for the last change from n2. On n2, convert that
timestamp to an LSN using get_lsn_from_commit_ts(). Then, advance
the n2 replication slot to that LSN. Repeat these steps for n3.
Get the Commit Timestamp for n2 to n4
Query the n4 lag_tracker table, which tracks replication progress, to
find the row where origin_name='n2' and receiver_name='n4'. The
commit_timestamp tells us the last time n4 received a change that
originated from n2. Even though the change came through n1, lag_tracker
tracks the original source.
In the following example, the query retrieves the commit timestamp:
SELECT commit_timestamp
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT commit_timestamp
FROM spock.lag_tracker
WHERE origin_name = ''n2'' AND receiver_name = ''n4'''
) AS t(commit_timestamp timestamp);
-- Returns: 2025-01-15 10:30:45.123456
Advance the Slot on n2
This is a two-step process: Use spock.get_lsn_from_commit_ts() to
convert the timestamp (2025-01-15 10:30:45.123456) into an LSN that
corresponds to that point in the n2 transaction log. Then, use
pg_replication_slot_advance() to move the slot's restart_lsn forward
to that position.
The slot spk_inventory_n2_sub_n2_n4 has been buffering all changes to
n2. By advancing the slot, you discard everything up to this LSN (n4
already has the data). Now the slot only contains new changes that n4
needs.
In the following example, the commands convert the timestamp and advance the slot:
-- Get LSN from commit timestamp
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=pgedge password=1safepassword',
'WITH lsn_cte AS (
SELECT spock.get_lsn_from_commit_ts(
''spk_inventory_n2_sub_n2_n4'',
''2025-01-15 10:30:45.123456''::timestamp
) AS lsn
)
SELECT pg_replication_slot_advance(''spk_inventory_n2_sub_n2_n4'', lsn)
FROM lsn_cte'
) AS t(result text);
Repeat for n3 to n4
In the following example, the commands perform the same operations for n3 to n4:
-- Get commit timestamp
SELECT commit_timestamp
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT commit_timestamp
FROM spock.lag_tracker
WHERE origin_name = ''n3'' AND receiver_name = ''n4'''
) AS t(commit_timestamp timestamp);
-- Returns: 2025-01-15 10:30:46.789012
-- Advance the slot on n3
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=pgedge password=1safepassword',
'WITH lsn_cte AS (
SELECT spock.get_lsn_from_commit_ts(
''spk_inventory_n3_sub_n3_n4'',
''2025-01-15 10:30:46.789012''::timestamp
) AS lsn
)
SELECT pg_replication_slot_advance(''spk_inventory_n3_sub_n3_n4'', lsn)
FROM lsn_cte'
) AS t(result text);
Enable Disabled Subscriptions
In this step, activate the subscriptions from the replica nodes to the new node using the stored sync LSNs.
Note
Remember when you created disabled subscriptions and stored sync LSNs? Now we bring it all together. Enable those subscriptions (sub_n2_n4 and sub_n3_n4), and use the stored LSNs to verify they start from the correct position.
Why the stored LSNs matter: When you created sub_n2_n4 and triggered
sync_event on n2, you got LSN 0/1A7D1E0. That LSN marked the exact
moment you created the replication slot. Between then and now, hours may
have passed. The replication slot has been buffering changes, but you need
to ensure the subscription starts processing from that original bookmark
point, not skipping ahead.
The verification: After enabling each subscription, call
wait_for_sync_event() with the stored LSN. This confirms that the
subscription has processed up to at least that sync point before you
continue.
After the slot advancement: The slots were advanced to skip duplicate data. Now when you enable the subscriptions, they will only send new changes that happened after the initial sync, which is exactly what you want.
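This bookkeeping can also be automated rather than pasted by hand. The sketch below reads the stored LSN from the temp_sync_lsns table created earlier in this walkthrough (so it must run in the same session) and forwards it through dblink; it is a convenience wrapper, not part of ZODAN itself:

```sql
-- Read the stored bookmark for n2 and wait for it on n4 without retyping the LSN
DO $$
DECLARE
    stored_lsn text;
BEGIN
    SELECT sync_lsn INTO stored_lsn
    FROM temp_sync_lsns
    WHERE origin_node = 'n2';

    PERFORM t.result
    FROM dblink(
        'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
        format('CALL spock.wait_for_sync_event(true, %L, %L::pg_lsn, 1200000)',
               'n2', stored_lsn)
    ) AS t(result text);
END
$$;
```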
Enable sub_n2_n4 on n4
Next, call spock.sub_enable() on n4 to activate the subscription. The
subscription worker process starts, connects to n2, and begins pulling
changes from the replication slot.
The following example enables the subscription:
-- Enable the subscription
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.sub_enable(
subscription_name := ''sub_n2_n4'',
immediate := true
)'
) AS t(result text);
Wait for Stored Sync Event from n2
Then, retrieve the LSN we stored (0/1A7D1E0) and pass it to
the wait_for_sync_event() procedure. This is the key to the entire
ZODAN approach. We are verifying that the subscription has caught up to
the sync point we marked when we first set things up. This guarantees
data consistency.
In the following example, the query retrieves the stored LSN and waits for the sync event:
-- Retrieve stored LSN
SELECT sync_lsn FROM temp_sync_lsns WHERE origin_node = 'n2';
-- Returns: 0/1A7D1E0
-- Wait for that sync event on n4
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'CALL spock.wait_for_sync_event(true, ''n2'', ''0/1A7D1E0''::pg_lsn, 1200000)'
) AS t(result text);
Verify the Subscription is Replicating
Check the subscription status using the sub_show_status() function. We
want to see status='replicating', which means the subscription worker
is active, connected to n2, and successfully applying changes. Any other
status (down, initializing) would indicate a problem.
The following example checks the subscription status on n4:
-- Check subscription status on n4
SELECT subscription_name, status, provider_node
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT subscription_name, status, provider_node
FROM spock.sub_show_status()
WHERE subscription_name = ''sub_n2_n4'''
) AS t(subscription_name text, status text, provider_node text);
-- Expected output:
-- subscription_name | status | provider_node
-- -------------------+-------------+---------------
-- sub_n2_n4 | replicating | n2
Repeat the Steps for sub_n3_n4
In the following example, commands enable the subscription and verify replication:
-- Enable subscription
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT spock.sub_enable(
subscription_name := ''sub_n3_n4'',
immediate := true
)'
) AS t(result text);
-- Retrieve stored LSN
SELECT sync_lsn FROM temp_sync_lsns WHERE origin_node = 'n3';
-- Returns: 0/1B8E2F0
-- Wait for stored sync event
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'CALL spock.wait_for_sync_event(true, ''n3'', ''0/1B8E2F0''::pg_lsn, 1200000)'
) AS t(result text);
-- Verify status
SELECT subscription_name, status, provider_node
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=pgedge password=1safepassword',
'SELECT subscription_name, status, provider_node
FROM spock.sub_show_status()
WHERE subscription_name = ''sub_n3_n4'''
) AS t(subscription_name text, status text, provider_node text);
Create Subscriptions from Other Nodes to New Node
This step enables bidirectional replication from n4 to all other nodes.
Up to this point, data has been flowing TO n4 from other nodes. Now we set up the reverse paths. Data will flow FROM n4 TO other nodes. We create subscriptions on n1, n2, and n3 that subscribe FROM n4.
We waited until n4 was fully populated with data before allowing other nodes to subscribe to it. If we had created these subscriptions earlier, the other nodes would try to sync from an empty n4, which would cause problems.
These subscriptions have synchronize_structure=false and
synchronize_data=false because n1, n2, and n3 already have all the
data. We only want them to receive NEW changes that happen ON n4 from
now on, not copy existing data.
After this, the cluster is fully bidirectional:
- n1 sends to n4, n4 sends to n1
- n2 sends to n4, n4 sends to n2
- n3 sends to n4, n4 sends to n3
Create sub_n4_n1 on n1
Create a subscription ON n1 that points TO n4 as the provider. This is commonly misunderstood; the subscription is created on the RECEIVER node (n1), not the sender (n4). After this, any INSERT/UPDATE/DELETE that happens directly on n4 will replicate to n1.
The following example creates the subscription:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=alice password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n4_n1'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5435 user=alice password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := false,
synchronize_data := false,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := true,
force_text_transfer := false,
skip_schema := ARRAY[]::text[]
)'
) AS t(subscription_id oid);
Create sub_n4_n2 on n2
This is the same concept. We create a subscription on n2 that pulls from n4. Now n2 will receive changes that happen on n4.
In the following example, the command creates the subscription on n2:
SELECT *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=alice password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n4_n2'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5435 user=alice password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := false,
synchronize_data := false,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := true,
force_text_transfer := false,
skip_schema := ARRAY[]::text[]
)'
) AS t(subscription_id oid);
Create sub_n4_n3 on n3
The following command creates the subscription on n3:
SELECT * FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=alice password=1safepassword',
'SELECT spock.sub_create(
subscription_name := ''sub_n4_n3'',
provider_dsn := ''host=127.0.0.1 dbname=inventory port=5435 user=alice password=1safepassword'',
replication_sets := ARRAY[''default'', ''default_insert_only'', ''ddl_sql''],
synchronize_structure := false,
synchronize_data := false,
forward_origins := ARRAY[]::text[],
apply_delay := ''0''::interval,
enabled := true,
force_text_transfer := false,
skip_schema := ARRAY[]::text[]
)'
) AS t(subscription_id oid);
Monitor Replication Lag
Next, we will verify that n4 is keeping up with n1.
We monitor the lag_tracker table on n4 to ensure replication from n1 is keeping up. The lag_tracker shows the time difference between when a transaction was committed on n1 and when the transaction was applied on n4.
If lag is growing (for example, 5 seconds, then 10 seconds, then 30 seconds), n4 cannot keep up with the write load from n1. This could indicate the following issues:
- Network bandwidth issues.
- n4 hardware is slower than n1.
- Long-running transactions blocking replication.
We want lag under 59 seconds, or lag_bytes=0 (meaning n4 has processed everything). If lag stays consistently low, n4 is successfully integrated.
The following example monitors replication lag:
-- Monitor lag from n1 → n4 on n4
DO $$
DECLARE
lag_interval interval;
lag_bytes bigint;
BEGIN
LOOP
SELECT now() - commit_timestamp, replication_lag_bytes
INTO lag_interval, lag_bytes
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=alice password=1safepassword',
'SELECT now() - commit_timestamp, replication_lag_bytes
FROM spock.lag_tracker
WHERE origin_name = ''n1'' AND receiver_name = ''n4'''
) AS t(lag_interval interval, lag_bytes bigint);
RAISE NOTICE 'n1 → n4 lag: % (bytes: %)',
COALESCE(lag_interval::text, 'NULL'),
COALESCE(lag_bytes::text, 'NULL');
EXIT WHEN lag_interval IS NOT NULL
AND (extract(epoch FROM lag_interval) < 59 OR lag_bytes = 0);
PERFORM pg_sleep(1);
END LOOP;
RAISE NOTICE 'Replication lag monitoring complete';
END;
$$;
-- Expected output:
-- NOTICE: n1 → n4 lag: 00:00:02.345 (bytes: 1024)
-- NOTICE: n1 → n4 lag: 00:00:01.123 (bytes: 512)
-- NOTICE: n1 → n4 lag: 00:00:00.456 (bytes: 0)
-- NOTICE: Replication lag monitoring complete
Show All Nodes
Next, we verify that all four nodes are registered. The following query displays all registered nodes:
SELECT n.node_id, n.node_name, n.location, n.country, i.if_dsn
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=alice password=1safepassword',
'SELECT n.node_id, n.node_name, n.location, n.country, i.if_dsn
FROM spock.node n
JOIN spock.node_interface i ON n.node_id = i.if_nodeid
ORDER BY n.node_name'
) AS t(node_id integer, node_name text, location text, country text, if_dsn text);
-- Expected output:
-- node_id | node_name | location | country | if_dsn
-- ---------+-----------+-------------+---------+------------------------------------------------------------
-- 16385 | n1 | New York | USA | host=127.0.0.1 dbname=inventory port=5432 user=alice ...
-- 16386 | n2 | Chicago | USA | host=127.0.0.1 dbname=inventory port=5433 user=alice ...
-- 16387 | n3 | Boston | USA | host=127.0.0.1 dbname=inventory port=5434 user=alice ...
-- 16389 | n4 | Los Angeles | USA | host=127.0.0.1 dbname=inventory port=5435 user=alice ...
Show the Status of All Subscriptions
Next, we want to verify that all subscriptions are replicating. In the following example, the query displays subscription status across all nodes:
-- Get subscriptions from all nodes
SELECT node_name, subscription_name, status, provider_node
FROM (
-- n1 subscriptions
SELECT 'n1' as node_name, *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5432 user=alice password=1safepassword',
'SELECT subscription_name, status, provider_node FROM spock.sub_show_status()'
) AS t(subscription_name text, status text, provider_node text)
UNION ALL
-- n2 subscriptions
SELECT 'n2' as node_name, *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5433 user=alice password=1safepassword',
'SELECT subscription_name, status, provider_node FROM spock.sub_show_status()'
) AS t(subscription_name text, status text, provider_node text)
UNION ALL
-- n3 subscriptions
SELECT 'n3' as node_name, *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5434 user=alice password=1safepassword',
'SELECT subscription_name, status, provider_node FROM spock.sub_show_status()'
) AS t(subscription_name text, status text, provider_node text)
UNION ALL
-- n4 subscriptions
SELECT 'n4' as node_name, *
FROM dblink(
'host=127.0.0.1 dbname=inventory port=5435 user=alice password=1safepassword',
'SELECT subscription_name, status, provider_node FROM spock.sub_show_status()'
) AS t(subscription_name text, status text, provider_node text)
) combined
ORDER BY node_name, subscription_name;
-- Expected output (12 rows):
-- node_name | subscription_name | status | provider_node
-- -----------+-------------------+-------------+---------------
-- n1 | sub_n2_n1 | replicating | n2
-- n1 | sub_n3_n1 | replicating | n3
-- n1 | sub_n4_n1 | replicating | n4
-- n2 | sub_n1_n2 | replicating | n1
-- n2 | sub_n3_n2 | replicating | n3
-- n2 | sub_n4_n2 | replicating | n4
-- n3 | sub_n1_n3 | replicating | n1
-- n3 | sub_n2_n3 | replicating | n2
-- n3 | sub_n4_n3 | replicating | n4
-- n4 | sub_n1_n4 | replicating | n1
-- n4 | sub_n2_n4 | replicating | n2
-- n4 | sub_n3_n4 | replicating | n3
Verification
Next, we perform a series of tests to verify that the cluster is replicating between n1 and n4.
First, we create a test table and insert data:
-- On n1
CREATE TABLE test_replication (id serial primary key, data text, created_at timestamp default now());
INSERT INTO test_replication (data) VALUES ('Test from n1');
-- On n4 (wait a few seconds)
SELECT * FROM test_replication;
-- Expected: Row appears on n4
Test Replication between n4 and n1
Next, we insert data on n4:
INSERT INTO test_replication (data) VALUES ('Test from n4');
-- On n1 (wait a few seconds)
SELECT * FROM test_replication;
-- Expected: Both rows appear on n1
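Once both tests pass, drop the test table. Because auto_ddl is disabled during node addition (see the prerequisites), the DROP may not replicate automatically; in that case, run it on each node:

```sql
-- Clean up the verification table
-- (run on every node if DDL does not replicate automatically)
DROP TABLE IF EXISTS test_replication;
```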
Our Final Cluster Topology
┌──────┐
│ n1 │ (source)
└──┬───┘
┌─────┼─────┬─────┐
│ │ │ │
┌──▼──┐ │ ┌──▼──┐ ┌▼────┐
│ n2 │◄┼──┤ n3 │ │ n4 │ (new)
└──┬──┘ │ └──┬──┘ └┬────┘
│ │ │ │
└────┴─────┴─────┘
All nodes are now bidirectionally replicating:
- n1, n2, n3, and n4 are all bidirectionally connected.
- Total subscriptions: 12 (each node subscribes to 3 others).
Troubleshooting
The following sections provide commands that can help diagnose issues you encounter while adding a node.
Subscription Status Stuck in 'initializing' or 'down'
The following commands can help you identify places where a node is encountering issues with replication status:
-- Check subscription status
SELECT * FROM spock.sub_show_status();
-- Check per-table synchronization state for a subscription
SELECT * FROM spock.sub_show_table(subscription_name);
-- Check active replication connections
SELECT * FROM pg_stat_replication;
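If a subscription remains stuck, disabling and re-enabling it forces the apply worker to restart and reconnect. This is a sketch using sub_n1_n4 as an example; substitute the name of the stuck subscription and run it on the subscriber node:

```sql
-- Restart a stuck subscription (run on the subscriber node)
SELECT spock.sub_disable(
    subscription_name := 'sub_n1_n4',
    immediate := true
);
SELECT spock.sub_enable(
    subscription_name := 'sub_n1_n4',
    immediate := true
);
```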
Replication Lag Too High
Use the following command to check replication lag on a subscribing node:
-- Check lag on receiver node
SELECT origin_name, receiver_name,
now() - commit_timestamp as lag,
replication_lag_bytes
FROM spock.lag_tracker
ORDER BY lag DESC;
Slot Not Advancing
Use the following command to check the status of a replication slot:
-- Check slot status
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'spk_%';
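If a slot's restart_lsn is not advancing, it is forcing the server to retain WAL, which can eventually fill the disk. You can measure how much WAL each Spock slot is holding back by comparing its restart_lsn to the server's current WAL position; pg_wal_lsn_diff and pg_size_pretty are standard PostgreSQL functions:

```sql
-- Show WAL retained by each Spock replication slot
SELECT slot_name,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_name LIKE 'spk_%'
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

A slot whose retained_wal keeps growing while its subscription is down is a strong signal that the subscription should be repaired or dropped.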