📋Steps in this guide1/13

How to open postgres standby database for read write(snapshot standby) - DBACLASS DBACLASS

postgresql configurationintermediate

by PostgreSQL

17 views

Overview

There is no technical term called snapshot standby in postgres. If you have knowledge of oracle database, then this snapshot standby method is there in dataguard. I.e you can open the standby database in read-write mode temporarily and do all you testing like data change schema change etc. And once your testing is done, you can make the standby sync with primary without rebuilding your standby from scratch. Same thing we can achieve in postgres also using pg_rewind. 1.Validate replication Make sure replication is running fine without any lag:

Section 2

2. Shutdown slave server: 3. Remove standby.signal file:

Code/Command (click line numbers to comment):

-- MASTER NODE

postgres=# <strong>select * from pg_stat_replication;</strong>
-[ RECORD 1 ]----+------------------------------
pid              | 6965
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 10.20.30.7
client_hostname  |
client_port      | 51244
backend_start    | 2022-07-13 10:29:03.01259+03
backend_xmin     |
state            | streaming
sent_lsn         | 0/9600C20
write_lsn        | 0/9600C20
flush_lsn        | 0/9600C20
replay_lsn       | 0/9600C20
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2022-07-13 10:31:10.636569+03



-- SLAVE NODE:

Expanded display is on.
postgres=# <strong>select * from pg_stat_wal_receiver;</strong>
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 30164
status                | streaming
receive_start_lsn     | 0/5000000
receive_start_tli     | 3
written_lsn           | 0/8000060
flushed_lsn           | 0/8000060
received_tli          | 3
last_msg_send_time    | 2022-07-13 10:30:32.937076+03
last_msg_receipt_time | 2022-07-13 10:29:33.399633+03
latest_end_lsn        | 0/8000060
latest_end_time       | 2022-07-13 10:30:02.798967+03
slot_name             |
sender_host           | 10.20.30.6
sender_port           | 5444
conninfo              | user=postgres password=postgres channel_binding=prefer host=10.20.30.6 port=5444 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

-bash-4.2$ <strong>/usr/pgsql-14/bin/pg_ctl stop  -D /oracle/pg_data</strong>
waiting for server to shut down.... done
server stopped
-bash-4.2$ ls -ltr

Section 3

4. Start the slave server: We can see , the db is open now and ready for transactions: 5. Do some data changes on SLAVE:

Code/Command (click line numbers to comment):

-bash-4.2$ <strong>rm -rf standby.signal
</strong>

-bash-4.2$ <strong>/usr/pgsql-14/bin/pg_ctl start -D /oracle/pg_data</strong>
waiting for server to start....2022-07-13 11:26:58.937 +03 [1941] LOG:  redirecting log output to logging collector process
2022-07-13 11:26:58.937 +03 [1941] HINT:  Future log output will appear in directory "log".
 done
server started

CHECK THE LOG:

2022-07-13 11:26:58.944 +03 [1941] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5444"
2022-07-13 11:26:58.951 +03 [1943] LOG:  database system was shut down in recovery at 2022-07-13 11:26:19 +03
2022-07-13 11:26:58.952 +03 [1943] LOG:  database system was not properly shut down; automatic recovery in progress
2022-07-13 11:26:58.956 +03 [1943] LOG:  redo starts at 0/9600C58
2022-07-13 11:26:58.956 +03 [1943] LOG:  invalid record length at 0/9600D40: wanted 24, got 0
2022-07-13 11:26:58.956 +03 [1943] LOG:  redo done at 0/9600D08 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-07-13 11:26:58.979 +03 [1941] LOG:  database system is ready to accept connections

Section 4

So we created few new tables on slave(though disconnected), which are not on MASTER. Now we want to revert this and sync the slave with master for replication setup. 6. Create standby.signal file:

Code/Command (click line numbers to comment):

postgres=# create table test102 as select * from test100;
SELECT 88064
postgres=# create table test103 as select * from test100;
SELECT 88064
postgres=#
postgres=#
postgres=#  create table test104 as select * from test100;
SELECT 88064

postgres=# <strong>\dt</strong>
          List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | test100 | table | postgres
 public | test101 | table | postgres
 public | test102 | table | postgres
 public | test103 | table | postgres
 public | test104 | table | postgres
(5 rows)

Section 5

7. Restart the server: 8. Now promote this server:

Code/Command (click line numbers to comment):

-bash-4.2$<strong> touch standby.signal
</strong>

-bash-4.2$<strong> /usr/pgsql-14/bin/pg_ctl stop -D /oracle/pg_data</strong>
waiting for server to shut down.... done
server stopped
-bash-4.2$  /usr/pgsql-14/bin/pg_ctl start -D /oracle/pg_data
waiting for server to start....2022-07-13 11:34:00.571 +03 [2503] LOG:  redirecting log output to logging collector process
2022-07-13 11:34:00.571 +03 [2503] HINT:  Future log output will appear in directory "log".
 done
server started

Section 6

If you see , a newtimeline id is created , which is different from master. 8. Now run pg_rewind for syncing with master: 9 . Create standby.signal file and update postgres.auto.conf file:

Code/Command (click line numbers to comment):

-bash-4.2$ <strong> /usr/pgsql-14/bin/pg_ctl promote -D /oracle/pg_data</strong>
waiting for server to promote.... done
server promoted
-bash-4.2$


2022-07-13 11:34:00.615 +03 [2511] LOG:  started streaming WAL from primary at 0/F000000 on timeline 3
2022-07-13 11:34:00.615 +03 [2511] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point 0/F000000 is ahead of the WAL flush position of this server 0/9600D40
2022-07-13 11:34:00.637 +03 [2515] LOG:  started streaming WAL from primary at 0/F000000 on timeline 3
2022-07-13 11:34:00.638 +03 [2515] FATAL:  could not receive data from WAL stream: ERROR:  requested starting point 0/F000000 is ahead of the WAL flush position of this server 0/9600D40
2022-07-13 11:34:04.729 +03 [2506] LOG:  received promote request
2022-07-13 11:34:04.730 +03 [2506] LOG:  redo is not required
2022-07-13 11:34:04.737 +03 [2506] LOG:  selected new timeline ID: 4
2022-07-13 11:34:04.813 +03 [2506] LOG:  archive recovery complete
2022-07-13 11:34:04.839 +03 [2503] LOG:  database system is ready to accept connections

-- shutdown cluster
-bash-4.2$<strong> /usr/pgsql-14/bin/pg_ctl stop -D /oracle/pg_data</strong>
waiting for server to shut down.... done
server stopped

-- Run pg_rewind:

-bash-4.2$ <strong> /usr/pgsql-14/bin/pg_rewind  -D /oracle/pg_data --source-server='host=10.20.30.6  dbname=postgres user=postgres password=postgres port=5444'</strong>
pg_rewind: servers diverged at WAL location 0/F0000A0 on timeline 3
pg_rewind: rewinding from last common checkpoint at 0/F000028 on timeline 3
pg_rewind: Done!

Section 7

When you do pg_rewind. it copies the files including postgres conf files from master. So you need to update the master host details in postgres.auto.conf file . 10. Start the cluster:

Code/Command (click line numbers to comment):

-bash-4.2$ <strong>touch standby.signal</strong>

-bash-4.2$ <strong>cat postgresql.auto.conf</strong>
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=postgres password=postgres channel_binding=prefer host=10.20.30.6 port=5444 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
wal_log_hints = 'on'

-bash-4.2$ <strong> /usr/pgsql-14/bin/pg_ctl start -D /oracle/pg_data</strong>
waiting for server to start....2022-07-13 11:36:45.698 +03 [2734] LOG:  redirecting log output to logging collector process
2022-07-13 11:36:45.698 +03 [2734] HINT:  Future log output will appear in directory "log".
 done
server started
-bash-4.2$ psql -d postgres -p 5444
psql (14.2)
Type "help" for help.

postgres=# <strong>\dt</strong>
          List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | test100 | table | postgres
 public | test101 | table | postgres
(2 rows)

Section 8

We can see, the new tables which we created are gone and is in sync with master. 11. Validate the replication again: I created a new table on master and can observe that , it got replicated to slave also. COMMON ISSUES:

Code/Command (click line numbers to comment):

-- On master:
postgres=# create table test106 as select * from test100;
SELECT 88065
postgres=#



-- On slave:
postgres=# \dt
          List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | test100 | table | postgres
 public | test101 | table | postgres
 public | test106 | table | postgres
(3 rows)

Section 9

1.pg_rewind: source and target cluster are on the same timeline Problem: -bash-4.2$ /usr/pgsql-14/bin/pg_rewind -D /oracle/pg_data –source-server=’host=10.20.30.6 dbname=postgres user=postgres password=postgres port=5444′ pg_rewind: source and target cluster are on the same timeline pg_rewind: no rewind required

Section 10

Solution: pg_rewind, doesnot work when both master and slave have sametimeline id. You can change the timeline id of slave, using pg_ctl promote. So run pg_rewind, after running the promote command. 2. pg_ctl: cannot promote server; server is not in standby mode Problem:

Section 11

-bash-4.2$ /usr/pgsql-14/bin/pg_ctl promote -D /oracle/pg_data pg_ctl: cannot promote server; server is not in standby mode Solution: create a standby.signal file and then restart the cluster. touch standby.signal 3. pg_rewind: error: postgres single-user mode in target cluster failed

Section 12

Problem: -bash-4.2$ /usr/pgsql-14/bin/pg_rewind -D /oracle/pg_data –source-server=’host=10.20.30.6 dbname=postgres user=postgres password=postgres port=5444′ pg_rewind: executing “/usr/pgsql-14/bin/postgres” for target server to complete crash recovery 2022-07-13 11:31:57.696 +03 [2287] FATAL: lock file “postmaster.pid” already exists 2022-07-13 11:31:57.696 +03 [2287] HINT: Is another postmaster (PID 1941) running in data directory “/oracle/pg_data”? pg_rewind: error: postgres single-user mode in target cluster failed pg_rewind: fatal: Command was: “/usr/pgsql-14/bin/postgres” –single -F -D “/oracle/pg_data” template1 < “/dev/null” Solution:

Section 13

Before pg_rewind runs, make sure that postgres is down. /usr/pgsql-14/bin/pg_ctl stop -D /oracle/pg_data

Comments (0)

Please to add comments

No comments yet. Be the first to comment!