Blog

May 26, 2021

Exploring the PostgreSQL System Catalogs

Databases,

Open Source

PostgreSQL, referred to simply as Postgres, is a relational database engine focused on high performance, scalability, and enterprise readiness. It shares characteristics with MySQL and MariaDB. Although as an object-relational database, the features go well beyond what MySQL and other purely relational database engines offer. Postgres can handle complex queries, massive databases, function overloading, table inheritance, and much more. In short, it supports a number of enterprise application and database tasks.

In this blog, we look at PostgreSQL System Catalogs, with an overview and glossary of commonly used catalogs, pg_stat_user_tables, pg_statio_user_indexes, pg_statio_user_tables, pg_stat_activity, pg_database, and pg_locks.

PostgreSQL System Catalogs
Get Support and Guidance for Your Open Source Data Stack

PostgreSQL System Catalogs

Inside of Postgres, there’s an extensive system of data catalogs available. When you’re troubleshooting, this is where you’re going to spend the bulk of your time. In short, the data catalogs are where Postgres stores schema metadata. This includes information about tables and columns, internal bookkeeping information, and database performance variables as a very short list of examples. When hooking Postgres up to a monitoring solution, this is where your monitoring data ultimately comes from. Below, we will explore some key system catalogs in greater depth.

pg_stat_user_tables

This catalog returns valuable information pertaining to the database engine activity behind particular tables. Seq_scan (sequential scans) and seq_tup (rows added and deleted) count the number of sequential scans of tables and the number of tuples (rows) read during those scans.

pg_stat_user_tables Glossary

Idx_scan indicates how many times and index was used when seeking. In general, you want to see a higher number of indexed scans as opposed to full table scans.
n_tup field indicates the various CRUD operation (in number of tuples or rows) statistics.
n_live and n_dead tuples show how many active and dead rows. Dead rows indicate how close an autovacuum is.
Autovacuum statistics are displayed as well. This will show you when you ran your last autovacuum.

pg_statio_user_indexes

This offers granular detail on how often indexes are being used inside of the database. Administrators can check this catalog and in general you will hope to see a high number. Relative to operations you are running against a database, your indexes should be utilized as often as possible. If this view returns a low index number, you likely may have an issue with your table relations design. Or you may not have have appropriately set up your index field.

pg_statio_user_indexes Glossary

idx_blks_read indicates the number of disk blocks that were retrieved as part of a seek involving the index.
idx_blks_hit indicates that a large amount of data is read from memory using an index, which is a good thing.
If you have slow performance, compare these numbers to the number of sequential table scans listed in pg_stat_user_tables.

pg_statio_user_tables

Going a layer deeper beyond logical indexes, this shows where within our memory we are being hit. It’s complementary to pg_stat_user_tables, and shows IO information from several mediums. This can answer questions like, how much memory utilization are we seeing?

pg_statio_user_tables Glossary

heap_blks_read and heap_blks_hit indicate the overall number of blocks read from disk and memory, respectively.
idx_blks_read and idx_blks_hit indicate the same thing, except from an index scan.
toast_blks_read, toast_blks_hit, and tidx_blks_read and tidx_blks_hit all relate to Postgres’s TOAST storage.

pg_stat_activity

When clients enter the database and request a new connection, the Postgres main process generates a backend process, and that backend process is basically a user session. Ultimately, we see those sessions and information on individual sessions within the pg_stat_activity catalog. This information is organized by name of database, application hitting the database, utility for looking at that database, as well as the bid and process ID. This information is very useful if you have a client who appears to be stuck against the database.

pg_database

When you run this catalog against your own system catalog internally, it will return information on a number of different databases. Why? The pg_database view displays information about each database, with one database represented per row.

pg_database Glossary

datdba is the user id of the database owner.
Encoding indicates the character encoding, and can be revealed with pg_encoding_to_char().
datcollate and datctype call out the collation options selected for the database.
datistemplate indicates whether this database has been marked as a template for re-use.
datallowcon represents whether outbound connections are allowed into the database and datconnlimit reports on the maximum number of connections allowed to the database.
datalastsysoid is the object ID of the last system object in the database which helps pg_dump understand when it is finished with an output.
datfrozenxid contains the ‘next’ transaction ID to be used. This is important to the vacuum process, which needs to know if this value is about to wraparound and needs to be cleaned. minmxid is the same for multixact IDs.
dattablespace is the object ID of the default tablespace for the database.
datacl returns an ACL object.

pg_locks

Postgres is a locking database that supports several different types of locks. If we want to view how each lock is functioning, this is where the pg_locks catalog comes into play. You can also see the lock mode, Exclusive vs. Share, etc.. If the lock is currently held, then the “granted” Boolean will be true. For “Fast path” locking, locking is indicated by fastpath 0.

Get Support and Guidance for Your Open Source Data Stack

Do you need PostgreSQL support? At OpenLogic, a highly experienced enterprise architect responds to every support ticket. Going well beyond bug fixes, we help you solve complex technology problems. This often includes security, performance issues, interoperability, or getting started with an emerging open source platform.

Need help with a tough integration? Have questions about how to optimize your open source database performance? All you need to do is call or submit a ticket and support is on the way.

TALK TO AN EXPERT

Additional Resources

Blog - What Is PostgreSQL?
Blog - PostgreSQL vs. MySQL: Key Differences and Similarities
Blog - Architecting Applications With Apache Cassandra
Blog - EnterpriseDB vs. Postgres: Availability, DR, Security, & Performance
Guide - Decision Maker's Guide to Open Source Databases
Resource Collection – Intro to Open Source Databases

Featured Product

Kafka Service Bundle

Services

Training

Taking an Open Source Approach to Big Data Management