[PostgreSQL] Stats Collector has been removed from PostgreSQL 15

Those trying out the upcoming PostgreSQL 15 will notice one less background process:

postgres    1710       1  0 04:03 ?        00:00:00 /usr/pgsql-15/bin/postmaster -D /var/lib/pgsql/15/data/postgres    1711    1710  0 04:03 ?        00:00:00 postgres: logger postgres    1712    1710  0 04:03 ?        00:00:00 postgres: checkpointer postgres    1713    1710  0 04:03 ?        00:00:00 postgres: background writer postgres    1715    1710  0 04:03 ?        00:00:00 postgres: walwriter postgres    1716    1710  0 04:03 ?        00:00:00 postgres: autovacuum launcher postgres    1717    1710  0 04:03 ?        00:00:00 postgres: logical replication launcher

Let's compare it with PostgreSQL 14:

postgres    1751       1  0 04:04 ?        00:00:00 /usr/pgsql-14/bin/postmaster -D /var/lib/pgsql/14/data/postgres    1752    1751  0 04:04 ?        00:00:00 postgres: logger postgres    1754    1751  0 04:04 ?        00:00:00 postgres: checkpointer postgres    1755    1751  0 04:04 ?        00:00:00 postgres: background writer postgres    1756    1751  0 04:04 ?        00:00:00 postgres: walwriter postgres    1757    1751  0 04:04 ?        00:00:00 postgres: autovacuum launcher postgres    1758    1751  0 04:04 ?        00:00:00 postgres: stats collector postgres    1759    1751  0 04:04 ?        00:00:00 postgres: logical replication launcher

Yes, the stats collector process is gone. But getting rid of this process is a good thing, a major bottleneck and headache is gone forever.

What is the job of stats collector?

Novice users may wonder what it is and why it is required for PG 14 and earlier. At least some users will be confused about the collection of table-level statistics (ANALYZE) used for query plans. But this is different. PostgreSQL tracks all activity of each process to obtain cumulative statistics, such as the number of times a table or index was scanned, or the last time vacuum or autovacuum was run on a table, or the number of times autovacuum was run on a table, etc. All data collected by the stats collector is available through the different pg_stat_* views.

question

Since each backend of a session is a separate process in PostgreSQL, gathering statistics and transferring them is not trivial. Each backend sends information about the activities they do to a single "stats collector" process.

This communication used to be via UDP sockets. There are a lot of problems with this approach, it's not a scalable model.

Users often report different types of issues, such as: outdated stats, stats collector not running, autovacuum not working/starting, etc.

It used to be really hard to understand what went wrong if the stats collector had a problem on a particular machine.

Another adverse effect of the "stats collector" is the IO it causes. If you enable DEBUG level 2, you may see messages that keep appearing in the PostgreSQL log, such as:

2022-08-22 03:49:57.153 UTC [736] DEBUG:  received inquiry for database 02022-08-22 03:49:57.153 UTC [736] DEBUG:  writing stats file "pg_stat_tmp/global.stat"2022-08-22 03:49:57.153 UTC [736] DEBUG:  writing stats file "pg_stat_tmp/db_0.stat"2022-08-22 03:49:57.168 UTC [1278] DEBUG:  autovacuum: processing database "postgres"2022-08-22 03:49:57.168 UTC [736] DEBUG:  received inquiry for database 138812022-08-22 03:49:57.168 UTC [736] DEBUG:  writing stats file "pg_stat_tmp/global.stat"2022-08-22 03:49:57.168 UTC [736] DEBUG:  writing stats file "pg_stat_tmp/db_13881.stat"2022-08-22 03:49:57.169 UTC [736] DEBUG:  writing stats file "pg_stat_tmp/db_0.stat"

This can cause a lot of IO to the mount point where the data directory is located. This is where the value of the parameter stats_temp_directory points to. On many systems it will be pg_stat_tmp in the data directory.

On Ubuntu/Debian it will be in /var/run/postgresql, for example:

postgres=# show stats_temp_directory ;          stats_temp_directory           ----------------------------------------- /var/run/postgresql/14-main.pg_stat_tmp(1 row)

New in PostgreSQL 15

Start using dynamic shared memory to collect statistics instead of using files and filesystems.

Previously, the statistics collector received statistics updates over UDP and shared statistics by periodically writing the statistics to a temporary file. These files can reach tens of megabytes and are written up to two times per second. This prevents us from adding other useful stats.

Statistics are now stored in shared memory. Statistics for variable numbered objects are stored in a dshash hash table (backed by dynamic shared memory). Fixed-numbered statistics are stored in ordinary shared memory.

The header file for pgstat.c contains an overview of the architecture.

The stats collector is no longer needed, remove it.

Replica now removes stat entries for deleted objects, starting from a cleanly shut down replica no longer needs to reset stats.

Apparently, the parameter stats_temp_directory is missing. Therefore, we don't need the pg_stat_tmp directory, which is in the data directory (or elsewhere), where all statistics files are generated and read. However, this directory is kept because many extensions that depend on it, such as pg_stat_statements, are not broken.

The directory remains empty until the extension library is loaded. For example, if we load the pg_stat_statements library, a file will appear in the directory.

$ ls pg_stat_tmp/pgss_query_texts.stat

Of course, extensions are not free, they also have corresponding costs.

In the new architecture, most stat updates are first accumulated as "pending" locally in each process (each backend has a backend local hash table). "pending" means they have been accumulated but not yet committed to the shared stats system. It will then be flushed to shared memory after a commit or timeout.

Read consistency occurs because statistics are updated at the same time someone tries to read. So PostgreSQL 15 introduces a new parameter: stats_fetch_consistency, which can take three values of none, cache or snapshot:

· "none" is the most efficient. However, read consistency will not be provided. But for most usage it should be fine.

"cache" ensures that repeated accesses yield the same value, which is necessary for cases involving e.g. self-joins.

"snapshot" is useful when checking statistics interactively, but is more expensive.

Defaults to "cache"

If it is in shared memory, how to keep it after restart?

Before being shut down, the checkpoint process will write these statistics to the file system, which can be loaded again after restart. Usually, if it crashes, statistics are lost.

Will this change affect my monitoring/scripting?

The monitoring views pg_stat_* will continue to function. However, make sure to get the proper value for stats_fetch_consistency. As mentioned above, the pg_stat_tmp directory is reserved for extensions only.

other

Many people, like me, use postgresql's wait events to understand postgresql and where sessions spend their time. Data collection and analysis tools, such as pg_gather, analyze problems by using wait events.

To better monitor postgresql, three new wait events have been introduced:

PgStatsDSA: Wait for statistics dynamic shared memory allocator access

PgStatsHash: Wait for stats shared memory hash table access

PgStatsData: Waiting for shared memory statistics access

With all the overhead of the statistics collector and its maintenance gone, other subsystems (like autovacuum) have less work to do.

Additionally, monitoring tools that frequently query statistics are expected to significantly reduce system load.