Postgresql Architecture – Physical Storage Structure

June 2, 2021
()

Postgresql Architecture – Physical Storage Structure


In this blog, we will discuss PostgreSQL architecture of physical storage which can be categorized in below four part:

1. Data Directory
2. Physical Files
3. Data file layout (OID)
4. Database Structure


1. Data Directory

Specifies the directory to use for data storage. This parameter can only be set at server start.


config_file:
Specifies the main server configuration file  called postgresql.conf. This parameter can only be set on the Postgres command line.


hba_file:
Specifies the configuration file for host-based authentication called pg_hba.conf. This parameter can only be set at the server start.


ident_file:
Specifies the configuration file for user name mapping called pg_ident.conf. This parameter can only be set at the server start.


postmaster.opts file:
A file recording the command-line options the postmaster was last started with


postmaster.pid file:
 
A lock file recording the current postmaster PID and shared memory segment ID (not present after postmaster shutdown)


PG_VERSION file :
A file containing the major version number of PostgreSQL.


pg_wal directory:
Here the write-ahead logs are stored. It is the log file, where all the logs are stored of committed and uncommitted transactions. It contains a maximum of 6 logs, and the last one overwrites. If the archiver is on, it moves there.


base directory:
Subdirectory containing per-database subdirectories.


global directory:
Subdirectory containing cluster-wide tables, such as pg_database.


pg_multixact directory:
Subdirectory containing multi transaction status data (used for shared row locks).


pg_subtrans directory:
Subdirectory containing subtransaction status data.


pg_tblspc directory:
Subdirectory containing symbolic links to tablespaces.


pg_twophase directory:
Subdirectory containing state files for prepared transactions.

postgres=# show data_directory;
     data_directory
------------------------
 /var/lib/pgsql/13/data
(1 row)

postgres=#
[root@test-machine02 data]# pwd
/var/lib/pgsql/13/data
[root@test-machine02 data]# ls -al | grep '^-'
-rw-------.  1 postgres postgres    30 Jun  1 00:00 current_logfiles
-rw-------.  1 root     root      4548 Apr 26 16:53 pg_hba_bkp.conf
-rw-------.  1 postgres postgres  4626 Apr 27 11:34 pg_hba.conf
-rw-------.  1 postgres postgres  1636 Apr 26 15:07 pg_ident.conf
-rw-------.  1 postgres postgres     3 Apr 26 15:07 PG_VERSION
-rw-------.  1 postgres postgres   234 May 20 16:35 postgresql.auto.conf
-rw-------.  1 postgres postgres 28004 Apr 27 11:33 postgresql.conf
-rw-------.  1 postgres postgres    58 May 20 16:38 postmaster.opts
-rw-------.  1 postgres postgres    96 May 20 16:38 postmaster.pid
[root@test-machine02 data]#
[root@test-machine02 data]# ls -al | grep '^d'
drwx------. 20 postgres postgres  4096 Jun  1 00:00 .
drwx------.  4 postgres postgres    51 Apr 26 15:07 ..
drwx------. 26 postgres postgres  4096 May 25 14:33 base
drwx------.  2 postgres postgres  4096 May 20 17:10 global
drwx------.  2 postgres postgres  4096 May  2 00:47 log
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_commit_ts
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_dynshmem
drwx------.  4 postgres postgres    68 May 25 15:08 pg_logical
drwx------.  4 postgres postgres    36 Apr 26 15:07 pg_multixact
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_notify
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_replslot
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_serial
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_snapshots
drwx------.  2 postgres postgres     6 May 20 16:38 pg_stat
drwx------.  2 postgres postgres  4096 Jun  1 12:26 pg_stat_tmp
drwx------.  2 postgres postgres    18 Apr 26 15:07 pg_subtrans
drwx------.  2 postgres postgres    32 May 25 15:02 pg_tblspc
drwx------.  2 postgres postgres     6 Apr 26 15:07 pg_twophase
drwx------.  3 postgres postgres  4096 May 25 14:03 pg_wal
drwx------.  2 postgres postgres    18 Apr 26 15:07 pg_xact
[root@test-machine02 data]#


2. Physical Files

Data Files: It contains all the data required by the user for processing. Data from data files are first moved to shared buffers and then the data is processed accordingly.


WAL Files/Segments:
 WAL Buffers are linked with WAL Files/Segments. WAL buffer flushes all the data into WAL Segment whenever commit command occurs. Commit is a command which ensures the end of the transaction. Commit work by default after every command in PostgreSQL which can also be changed accordingly.


Archived WAL/Files:
This file segment is a kind of copy of WAL Segment which stores all the archived data of WAL Segment which can be stored for a long period. WAL segment is having limited storage capacity. Once space is exhausted in WAL Segment, the system starts replacing the old data with new data which results in the loss of old data.  Archives are required till full backup of the database. After that, we can even delete the archive in order to gain some space in the DB.


Error Log Files:
Those files which contain all the error messages, warning messages, informational messages, or messages belonging to all the major thing happening to the database. All logs related to DDL and DML changes are not stored in this space. Internal errors, internal bugs entry is stored in this file which helps the admin to troubleshoot the problem.

Data Files : -
[root@test-machine02 14174]# pwd
/var/lib/pgsql/13/data/base/14174
[root@test-machine02 14174]#
[root@test-machine02 14174]# ls -ltr *1255*
-rw-------. 1 postgres postgres  24576 Apr 26 15:08 1255_fsm
-rw-------. 1 postgres postgres 663552 Apr 26 15:08 1255
-rw-------. 1 postgres postgres   8192 Apr 26 15:08 1255_vm
[root@test-machine02 14174]#

WAL Files/Segments : -
[root@test-machine02 pg_wal]# pwd
/var/lib/pgsql/13/data/pg_wal
[root@test-machine02 pg_wal]#
[root@test-machine02 pg_wal]#  ls -ltr
total 147460
-rw-------. 1 postgres postgres 16777216 May 20 16:44 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 May 20 16:46 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 May 20 16:46 000000010000000000000004
-rw-------. 1 postgres postgres 16777216 May 20 16:47 000000010000000000000005

Archived WAL/Files :-
[root@test-machine02 wal_backup]# pwd
/u01/wal_backup
[root@test-machine02 wal_backup]# ls -ltr
total 147456
-rw-------. 1 postgres postgres 16777216 May 20 16:44 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 May 20 16:46 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 May 20 16:46 000000010000000000000004
-rw-------. 1 postgres postgres 16777216 May 20 16:47 000000010000000000000005

Error Log Files :-
[root@test-machine02 log]# pwd
/var/lib/pgsql/13/data/log
[root@test-machine02 log]# ls -ltr
total 8
-rw-------. 1 postgres postgres   0 May 26 00:00 postgresql-Wed.log
-rw-------. 1 postgres postgres   0 May 27 00:00 postgresql-Thu.log
-rw-------. 1 postgres postgres   0 May 28 00:00 postgresql-Fri.log
-rw-------. 1 postgres postgres   0 May 29 00:00 postgresql-Sat.log
-rw-------. 1 postgres postgres 161 May 30 13:39 postgresql-Sun.log
-rw-------. 1 postgres postgres   0 May 31 00:00 postgresql-Mon.log
-rw-------. 1 postgres postgres 356 Jun  1 12:36 postgresql-Tue.log
[root@test-machine02 log]#



3. Data file layout (OID)

All database objects in postgreSQL are managed internally by their respective object identifiers (OIDs), which are assigned 4 byte integers. The OID of the database is stored in the pg_database system table. The OIDs of objects such as tables, indexes, and sequences in the database are stored in the pg_class system table:

In the below example, we derived OID of database, Tablespace & Table and we can relate all OIDs with function pg_relation_filepath, This will give you the exact location where table datafiles located. 

postgres=#
postgres=# \connect dvdrental
You are now connected to database "dvdrental" as user "postgres".
dvdrental=#
dvdrental=# select oid,datname,datdba,dattablespace from pg_database where datname='dvdrental';
  oid  |  datname  | datdba | dattablespace
-------+-----------+--------+---------------
 16384 | dvdrental |     10 |         23704
(1 row)

dvdrental=# select oid,spcname,spcowner from pg_tablespace where spcname='myts01';
  oid  | spcname | spcowner
-------+---------+----------
 23701 | myts01  |       10
(1 row)

dvdrental=# SELECT oid,relname,relkind FROM pg_class WHERE relname = 'address';
  oid  | relname | relkind
-------+---------+---------
 16462 | address | r
(1 row)

dvdrental=# SELECT pg_relation_filepath('address');
            pg_relation_filepath
---------------------------------------------
 pg_tblspc/23704/PG_13_202007201/16384/16462
(1 row)



4. Database Structure

Here are some things that are important to know when attempting to understand the database structure of PostgreSQL.


Items related to the database

  1. PostgreSQL consists of several databases. This is called a database cluster.
  2. When initdb () is executed, template0 , template1 , and postgres databases are created.
  3. The template0 and template1 databases are template databases for user database creation and contain the system catalog tables.
  4. The list of tables in the template0 and template1 databases is the same immediately after initdb (). However, the template1 database can create objects that the user needs.
  5. The user database is created by cloning the template1 database.


Items related to the tablespace

  1. The pg_default and pg_global tablespaces are created immediately after initdb().
  2. If you do not specify a tablespace at the time of table creation, it is stored in the pg_dafault tablespace.
  3. Tables managed at the database cluster level are stored in the pg_global tablespace.
  4. The physical location of the pg_default tablespace is $PGDATA\base.
  5. The physical location of the pg_global tablespace is $PGDATA\global.
  6. One tablespace can be used by multiple databases. At this time, a database-specific subdirectory is created in the tablespace directory.
  7. Creating a user tablespace creates a symbolic link to the user tablespace in the $PGDATA\tblspc directory.


Items related to the table

  1. There are three files per table.
  2. One is a file for storing table data. The file name is the OID of the table.
  3. One is a file to manage table-free space. The file name is OID_fsm.
  4. One is a file for managing the visibility of the table block. The file name is OID_vm.
  5. The index does not have a _vm file. That is, OID and OID_fsm are composed of two files.
postgres=# \l
                                  List of databases
    Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
------------+----------+----------+-------------+-------------+-----------------------
 postgres   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
            |          |          |             |             | postgres=CTc/postgres
 template1  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
            |          |          |             |             | postgres=CTc/postgres

postgres=# \db
           List of tablespaces
    Name    |  Owner   |    Location
------------+----------+-----------------
 pg_default | postgres |
 pg_global  | postgres |

[root@test-machine02 17115]# pwd
/var/lib/pgsql/13/data/base/17115
[root@test-machine02 17115]#
[root@test-machine02 17115]# ls -ltr *17246*
-rw-------. 1 postgres postgres   8192 May 20 16:46 17246_vm
-rw-------. 1 postgres postgres  24576 May 20 16:46 17246_fsm
-rw-------. 1 postgres postgres 884736 May 20 16:46 17246
[root@test-machine02 17115]#
[root@test-machine02 17115]#

postgres=#
postgres=# select oid,datname,datdba,dattablespace from pg_database where oid=17115;
  oid  | datname | datdba | dattablespace
-------+---------+--------+---------------
 17115 | db1     |     10 |          1663
(1 row)

postgres=# \connect db1
You are now connected to database "db1" as user "postgres".
db1=# SELECT oid,relname,relkind FROM pg_class WHERE oid=17246;
  oid  | relname | relkind
-------+---------+---------
 17246 | payment | r
(1 row)

db1=# SELECT pg_relation_filepath('payment');
 pg_relation_filepath
----------------------
 base/17115/17246
(1 row)

db1=#

[root@test-machine02 pg_tblspc]#
[root@test-machine02 pg_tblspc]# pwd
/var/lib/pgsql/13/data/pg_tblspc
[root@test-machine02 pg_tblspc]#
[root@test-machine02 pg_tblspc]# ls -ltr
total 0
lrwxrwxrwx. 1 postgres postgres 14 May 25 14:03 23701 -> /u01/psql_tbls
lrwxrwxrwx. 1 postgres postgres 15 May 25 15:02 23704 -> /u01/psql2_tbls
[root@test-machine02 pg_tblspc]#



Click on below to know more about the Memory and Process of PostgreSQL.

PostgreSQL Architecture – Memory Components
PostgreSQL Architecture – Process


Recent articles:

Hope so you like this article!
Please share your valuable feedback/comments/subscribe and follow us below and don’t forget to click on the bell icon to get the most recent update. Click here to understand more about our pursuit.

Loading

How useful was this post?

Click on a star to rate it!

Average rating / 5. Vote count:

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?