picture

picture

picture
picture
foreword

Fujitsu has made some improvements to logical replication communication in PostgreSQL 15. Let me show you some details. In this post, I will explain the improvements introduced in PostgreSQL 15 to address the following two issues in logical replication communication:

1. If all DMLs in the transaction are not published according to the subscription filter, the walsender will send an empty transaction, which will cause a waste of resources such as CPU/memory/network.

2. When processing a large transaction, if the walsender is busy processing the unpublished DML in the transaction, it may be unable to communicate with the walreceiver for a long time. This can lead to unexpected timeout errors even though walsender is working as expected.

picture
picture
Content outline

> Communication in Logical Replication
   > Communication Information Types
   > Filtering
> Overview
   of Improvements > Improvements to Null Transactions
   > Fixed Unexpected Timeout Error
> Performance Impact
> Future Plans

picture
picture
Communication in logical replication

Before going any further, I would like to briefly introduce two concepts of communication in logical replication: types and filtering of communication messages.

Type of communication message

In logical replication, there are two types of messages sent by walsender to walreceiver:

picture

01
keepalive message

This message is used to tell the walreceiver that the walsender is working as expected.
The user can set the timeout for the walreceiver GUC using the wal_receiver_timeout parameter (60 seconds by default). Within wal_receiver_timeout, if the walreceiver does not receive any kind of message from the walsender, it will exit with a timeout error.

02
Logical Replication Agreement Messages

There are many types of logical replication protocol messages (reference 1 at the end of the article), in this blog we will focus on only two of them:

  • DML messages, including INSERT, UPDATE, DELETE, and TRUNCATE;

  • Messages that define the start and end of a transaction, such as BEGIN and COMMIT messages.

A full list of logical replication protocols can be viewed on the PostgreSQL website.

picture
picture
filter

You may not realize this, but not all DML in the transaction is sent to the walreceiver. This is because filters can be specified on a table, row, or action type when creating a publication. Therefore, some DMLs will not be sent if the filter conditions are met.

picture
picture
Improvement overview

Now, I'll describe how the two issues mentioned at the beginning of this article were improved/fixed separately.

picture
picture
Improvements to null transactions

If all DMLs in a transaction are filtered out during decoding, then we call the transaction an empty transaction. Before PostgreSQL 15, the standard logical decoding plugin pgoutput (the logical replication plugin used by default in PostgreSQL) would send every transaction to walreceiver. Even for an empty transaction, although no DML-related messages are sent, messages that define the start and end of the transaction are sent. Building/transmitting these empty transactions is a waste of CPU cycles and network bandwidth. We can see the process below.

How PostgreSQL handled empty transactions prior to version 15

picture

To solve this problem, we let the walsender postpone the BEGIN message until the first DML message is sent. At the end of decoding, if no BEGIN message has been sent, no COMMIT message is sent either. Detailed development information can be viewed on GitHub.

As for non-null transactions, there is also a change in when the message is sent. Let's look at this change with an example - let's say we have the following transaction T1:

ALTER TABLE tab_1 DISABLE TRIGGER ALL;
导入数据
ALTER TABLE tab_1 ENABLE TRIGGER ALL;postgres=#BEGIN;
BEGIN
postgres=*# INSERT INTO tab_not_publish VALUES (1) ; -- 这个DML会被过滤掉
INSERT 0 1
postgres=*# INSERT INTO tab_publish VALUES (1); -- 这个 DML 将被发布
INSERT 0 1
postgres=*# COMMIT;
COMMIT

The sending sequence of BEGIN message and COMMIT message is shown in the following figure.
When to send BEGIN and COMMIT messages - before modification

picture

As we saw above, only INSERT messages for table tab_publish can be sent from walsender to walreceiver.

Timing of sending BEGIN and COMMIT messages - modified

picture

As shown in the figure above, after modification, if all DMLs in a transaction are filtered out, then BEGIN messages and COMMIT messages will not be sent. This way the walsender can skip sending empty transactions.

However, in synchronous logical replication , before PostgreSQL15, in order to confirm that the data has been synchronized, when the walreceiver receives a COMMIT message of an empty transaction, it will synchronize the local data and send a feedback message to the walsender to confirm that the data has been synchronized.

The walsender will only continue after receiving a feedback message from the walreceiver, otherwise the walsender will prevent the client backend from committing the transaction. So in PostgreSQL 15, after walsender skips an empty transaction in synchronous logical replication, it sends keep-alive message to walreceiver and requests feedback message. Then the walreceiver will synchronize the data and send a feedback message to the walsender based on this message. This way we can avoid situations where transactions delay responses in synchronous logical replication. The following diagram shows the changes in communication in synchronous logical replication.

Communication in synchronous logical replication - before modification

picture

picture

Communication in Synchronous Logical Replication - Modified

picture

picture
picture
fix unexpected timeout bug

Before PostgreSQL 15, if a transaction had many consecutive DMLs that were not published, this would cause the walsender to be unable to communicate with the walreceiver for a long time, because the walsender would be busy decoding these unpublished DMLs. In this case, even though the walsender is working as expected, since the walreceiver has not received any messages from the walsender within the specified timeout, this will cause the walreceiver to receive an unexpected timeout error.

In PostgreSQL 15, to avoid this error, the walsender keeps communicating with the walreceiver periodically. So when the walsender processes a certain threshold of DMLs (whether those DMLs are published or not), it will try to send keep-alive messages to the walreceiver when needed to maintain communication. Details about this development are available in GitHub.

Suppose we have a transaction T2, and there are many DMLs in T2, but none of them will be published. As shown in the figure below, when transaction T2 is processed, the communication between walsender and walreceiver changes.

Communication between walsender and walreceiver - before modification

picture

Communication between walsender and walreceiver - modified

picture

As can be seen above, we set the threshold to 100 in the PostgreSQL kernel (after performance testing, it is confirmed that this threshold can solve this timeout error without reducing performance, refer to 2 at the end of the article). This way the walreceiver doesn't get this unexpected timeout error when the walsender works as expected.

picture
picture
performance impact

Finally, I'll share the performance test results of the null transaction improvement.
As mentioned earlier, after improvements to the handling of empty transactions, the walsender no longer sends empty transactions containing only BEGIN and COMMIT messages to the walreceiver. This reduces network traffic and improves performance. In my tests, I found that after the improvement, when decoding empty transactions, in asynchronous logical replication, walsender will transmit 97 bytes less; in synchronous logical replication, although walsender will transmit an additional keep-alive message, but The total transfer size is still reduced by 79 bytes. Next, we look at the performance improvement in terms of network bandwidth consumption through the test results.

Obviously, the proportion of empty transactions in transactions transmitted by walsender affects the test results, so we tested five different proportions of empty transactions. In addition, after the improvement, in synchronous logical replication, if walsender skips an empty transaction, an additional keep-alive message will be sent, so both synchronous logical replication and asynchronous logical replication are tested. As shown below, the performance of both asynchronous logical replication and synchronous logical replication has improved. (The x-axis represents the proportion of empty transactions in transfer transactions. The y-axis represents the total amount of data transferred by the walsender to the walreceiver, in bytes.)

Performance Comparison - Asynchronous Replication

picture

Performance Comparison - Synchronous Replication

picture

It can be seen from the test results that the higher the proportion of empty transactions, the greater the improvement. When the proportion of empty transactions is 25%, 50%, 75% and 100%, the network transfer is reduced by about 8%, 22%, 40% and 84% respectively. There is also no performance degradation when there are no empty transactions.

picture
picture
Future plan

picture

In this post, I explain improvements in logical replication performance and functionality due to skipping empty transactions and fixing unexpected timeout errors. However, the mechanism for skipping empty transactions still has some limitations.

For example, two-phase commit does not skip empty transactions. This is because if the walsender restarts between the prepare transaction and the commit of the prepare transaction, we have no way of knowing whether the prepare transaction was skipped on commit. Due to the same issue, streaming of in-progress transactions (set by the "streaming" parameter of subscription_parameter) is not improved either. Soon, we may try to improve the handling of hollow transactions for both of the above transactions in a better way.

Additionally, to improve the efficiency of applying ongoing transaction streams on the walreceiver, our team has shared a patch for applying transactions in parallel in the walreceiver (reference 3 at the end of the article) and has been actively discussing in the community for continuous improvement.

Reference to this post
1. https://www.postgresql.org/docs/15/protocol-logicalrep-message-formats.html
2. https://www.postgresql.org/message-id/OS3PR01MB6275C67F14954E05CE5D04399E139%40OS3PR01MB6275.jpnprd01. prod.outlook.com
3. https://www.postgresql.org/message-id/flat/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

picture

picture

picture
Click here to read the original text