Point in Time Recovery (PITR) using pg_basebackup in PostgreSQL

July 6, 2021
()

Point in Time Recovery (PITR) using pg_basebackup in PostgreSQL

In previous blog we saw How to perform Full recovery using backup taken from pg_basebackup  Click here to read more.

In this blog we will see How to perform PITR using backup taken from pg_basebackup in PostgreSQL. 

For the purpose of demonstration we will be using our second PostgreSql Cluster created on Port: 5433 Click here to learn How to create second PostgreSql Cluster. For pg_basebackup backup commands Click here to read more.


PostgreSQL Cluster Physical Structure: PostgreSQL Cluster data directory: /u01/psql-13, Created two non-default tablespaces myts03: /u01/psql3_tbls & myts04: /u01/psql4_tbls. Created two sample databases sample3 sample4 and loaded them with sample data along with configured WAL archiving: /u01/wal_archive.

postgres=# show data_directory;
 data_directory
----------------
 /u01/psql-13
 
postgres=#  \db+
                                      List of tablespaces
    Name    |  Owner   |    Location     | Access privileges | Options |  Size   | Description
------------+----------+-----------------+-------------------+---------+---------+-------------
 myts03     | postgres | /u01/psql3_tbls |                   |         | 0 bytes |
 myts04     | postgres | /u01/psql4_tbls |                   |         | 0 bytes |
 pg_default | postgres |                 |                   |         | 46 MB   |
 pg_global  | postgres |                 |                   |         | 559 kB  |
(4 rows)
 
 
postgres=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 dvdrental | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres         +| 15 MB   | pg_default |
           |          |          |             |             | postgres=CTc/postgres+|         |            |
           |          |          |             |             | read_only=c/postgres  |         |            |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 15 MB   | pg_default | default administrative connection database
 sample3   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 15 MB   | myts03     |
 sample4   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 15 MB   | myts04     |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7969 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 8117 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            |
(6 rows)
 
postgres=#
 
postgres=# select name,setting,unit from pg_settings where name in ('archive_mode','archive_command','archive_timeout');
      name       |                          setting                           | unit
-----------------+------------------------------------------------------------+------
 archive_command | test ! -f /u01/wal_archive/%f && cp %p /u01/wal_archive/%f |
 archive_mode    | on                                     


Create and Load test table and Switch WAL File: Let’s Create Table t_test and insert 1000 random records. Once insert operation is done do force WAL switch and make sure all changes are archived.

postgres=# \connect sample3
You are now connected to database "sample3" as user "postgres".
sample3=#
sample3=# create table t_Test (slno integer, ts timestamp);
CREATE TABLE
sample3=# insert into t_test values (generate_series(1,1000),now());
INSERT 0 1000
sample3=#
sample3=# select count(*) from t_test;
 count
-------
  1000
(1 row)
 
sample3=# \connect sample4
You are now connected to database "sample4" as user "postgres".
sample4=#
sample4=# create table t_Test (slno integer, ts timestamp);
CREATE TABLE
sample4=# insert into t_test values (generate_series(1,1000),now());
INSERT 0 1000
sample4=# select count(*) from t_test;
 count
-------
  1000
(1 row)
 
postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/A04DDC8
(1 row)
 
postgres=#
postgres=#


Full DB backup: Perform full cluster level tar backup using pg_basebackup using the below command.

-bash-4.2$ pg_basebackup -h localhost -p 5433 -U postgres -D /u01/backup/latest1_backup -Ft -z -Xs -P
78382/78382 kB (100%), 3/3 tablespaces
-bash-4.2$
 
-bash-4.2$ ls -ltr /u01/backup/latest1_backup
total 12900
-rw-------. 1 postgres postgres 2699207 Jul  4 16:16 21913.tar.gz
-rw-------. 1 postgres postgres 2699006 Jul  4 16:16 21914.tar.gz
-rw-------. 1 postgres postgres 7453080 Jul  4 16:16 base.tar.gz
-rw-------. 1 postgres postgres   17655 Jul  4 16:16 pg_wal.tar.gz
-rw-------. 1 postgres postgres  334189 Jul  4 16:16 backup_manifest
-bash-4.2$


Delete All Records: Let simulate human error that caused accidental delete of table data after 2021-07-05 11:04:28

sample4=#
sample4=# select now();
              now
-------------------------------
 2021-07-05 11:04:28.721948+03
(1 row)

sample4=#
sample4=# delete from t_test;
DELETE 1000
sample4=#
sample4=# \connect sample3
You are now connected to database "sample3" as user "postgres".
sample3=#
sample3=# delete from t_test;
DELETE 1000
sample3=#
sample3=# select now();
              now
-------------------------------
 2021-07-05 11:07:36.401521+03
(1 row)

sample3=#


Stop PostgreSql Cluster and remove all datafiles: In order to restore the complete postgresql cluster, we need to delete all datafiles in the base and tablespace directory.

[root@test-machine02 ~]# systemctl stop psql-13.service
[root@test-machine02 ~]# rm -rf /u01/psql-13/*
[root@test-machine02 ~]# rm -rf /u01/psql3_tbls/*
[root@test-machine02 ~]# rm -rf /u01/psql4_tbls/*


Follow the below steps to perform PITR


Step 1. Untar Base backup: Use OS command tar to copy the base backup to the base location: /u01/psql-13.

-bash-4.2$ tar xvf /u01/backup/latest1_backup/base.tar.gz -C  /u01/psql-13
backup_label
tablespace_map
pg_wal/
./pg_wal/archive_status/
global/
global/1262


Step 2. View tablespace_map File: pg_basebackup command creates a text file called tablespace_map which has mapping to tablespace OID and directory location.

-bash-4.2$ pwd
/u01/psql-13
-bash-4.2$ cat tablespace_map
21913 /u01/psql3_tbls
21914 /u01/psql4_tbls
-bash-4.2$


Step 3. Untar Tablespace & WAL Backup: Untar both tablespace backup and WAL backup to their respective location.

-bash-4.2$
-bash-4.2$ tar xvf /u01/backup/latest1_backup/21913.tar.gz -C  /u01/psql3_tbls
PG_13_202007201/
PG_13_202007201/21916/
PG_13_202007201/21916/1255
 
-bash-4.2$
-bash-4.2$ tar xvf /u01/backup/latest1_backup/21914.tar.gz -C  /u01/psql4_tbls
PG_13_202007201/
PG_13_202007201/21917/
PG_13_202007201/21917/1255
 
-bash-4.2$
-bash-4.2$ tar xvf /u01/backup/latest1_backup/pg_wal.tar.gz -C /u01/psql-13/pg_wal
00000001000000000000000C
archive_status/00000001000000000000000C.done
-bash-4.2$
-bash-4.2$


Step 4 : Update restore_command & recovery_target_time in postgresql.conf: Edit postgresql.conf file and provide WAL archiving location in restore_command and until the time we need to recover in recovery_target_time parameters. So postgresql service and do recovery based on provided target time.

-bash-4.2$ cd /u01/psql-13/
-bash-4.2$ vi postgresql.conf
restore_command = 'cp /u01/wal_archive/%f %p'
recovery_target_time = '2021-07-05 11:04:00 AST'
:wq!

-bash-4.2$ egrep -w 'restore_command|recovery_target_time' postgresql.conf
restore_command = 'cp /u01/wal_archive/%f %p'           # command to use to restore an archived logfile segment
recovery_target_time = '2021-07-05 11:04:00 AST'        # the time stamp up to which recovery will proceed
-bash-4.2$


Step 5. Start PostgreSQL Service: Once all the above steps are done. Start postgresql service.

[root@test-machine02 u01]# systemctl start psql-13.service
Job for psql-13.service failed because the control process exited with error code. See "systemctl status psql-13.service" and "journalctl -xe" for details.
[root@test-machine02 u01]# journalctl -xe
-- Unit psql-13.service has begun starting up.
Jul 05 11:16:07 test-machine02 postmaster[111318]: 2021-07-05 08:16:07.894 GMT [111318] LOG:  invalid value for parameter "recovery_target_time": "2021-07-05 11:04:00 AST"
Jul 05 11:16:07 test-machine02 postmaster[111318]: 2021-07-05 11:16:07.895 +03 [111318] FATAL:  configuration file "/u01/psql-13/postgresql.conf" contains errors
Jul 05 11:16:07 test-machine02 systemd[1]: psql-13.service: main process exited, code=exited, status=1/FAILURE
Jul 05 11:16:07 test-machine02 systemd[1]: Failed to start PostgreSQL 13 database server.
-- Subject: Unit psql-13.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--


Step 6. Update recovery_target_time: To fix above error edit parameter recovery_target_time and replace AST (Arabic Standard Time) to Asia/Riyadh (Depends on server/DB time zone).

-bash-4.2$ cd /u01/psql-13/
-bash-4.2$ vi postgresql.conf
restore_command = 'cp /u01/wal_archive/%f %p'
recovery_target_time = '2021-07-05 11:04:00 Asia/Riyadh'
:wq!

-bash-4.2$ egrep -w 'restore_command|recovery_target_time' postgresql.conf
restore_command = 'cp /u01/wal_archive/%f %p'           # command to use to restore an archived logfile segment
recovery_target_time = '2021-07-05 11:04:00 Asia/Riyadh'        # the time stamp up to which recovery will proceed
-bash-4.2$


Step 7. Start PostgreSQL Service: Once all the above steps are done. Start postgresql service.

[root@test-machine02 u01]#
[root@test-machine02 u01]# systemctl start psql-13.service
[root@test-machine02 u01]#


Step 8. Verify logfile: Verify logfile to validate everything went fine while restoring.

-bash-4.2$ pwd
/u01/psql-13/log
bash-4.2$ vi postgresql-Mon.log

2021-07-05 11:24:21.430 +03 [111753] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-07-05 11:24:21.432 +03 [111753] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-07-05 11:24:21.434 +03 [111757] LOG:  database system was interrupted; last known up at 2021-07-04 16:16:13 +03
2021-07-05 11:24:21.453 +03 [111757] LOG:  redo starts at 0/C000028
2021-07-05 11:24:21.453 +03 [111757] LOG:  consistent recovery state reached at 0/C000100
2021-07-05 11:24:21.453 +03 [111757] LOG:  redo done at 0/C000100
2021-07-05 11:24:21.532 +03 [111753] LOG:  database system is ready to accept connections


Step 9: Verify Data: Verify all data exists after restore and recovery.

-bash-4.2$ psql -p 5433
psql (13.2)
Type "help" for help.

postgres=#
postgres=# \connect sample3
You are now connected to database "sample3" as user "postgres".
sample3=# select count(*) from t_test;
 count
-------
  1000
(1 row)

sample3=# \connect sample4
You are now connected to database "sample4" as user "postgres".
sample4=# select count(*) from t_test;
 count
-------
  1000
(1 row)

sample4=#


This document is just for learning purpose and always validate in the LAB environment first before applying in the LIVE environment.

Hope so you like this article!
Please share your valuable feedback/comments/subscribe and follow us below and don’t forget to click on the bell icon to get the most recent update. Click here to understand more about our pursuit.

Related Articles


Loading

How useful was this post?

Click on a star to rate it!

Average rating / 5. Vote count:

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?