Testing and tuning PGD clusters
You can test PGD applications using the following approaches:
Trusted Postgres Architect
Trusted Postgres Architect is the system used by EDB to deploy reference architectures, including those based on EDB Postgres Distributed.
Trusted Postgres Architect includes test suites for each reference architecture. It also simplifies creating and managing a local collection of tests to run against a TPA cluster, using a syntax like the following:
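For example, you can run the default test suite against an already deployed cluster like this (a sketch; `mycluster` is a placeholder for your cluster directory name):

```shell
tpaexec test mycluster
```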
We strongly recommend that developers write their own multi-node suite of Trusted Postgres Architect tests that verify the main expected properties of the application.
pgd_bench
The Postgres benchmarking application `pgbench` has been extended in PGD 5.0 in the form of a new application: `pgd_bench`.
`pgd_bench` is a regular command-line utility that's added to PostgreSQL's `bin` directory. The utility is based on the community PostgreSQL `pgbench` tool but supports benchmarking CAMO transactions and PGD-specific workloads.
The functionality of `pgd_bench` is a superset of that of `pgbench`, but it requires the BDR extension to be installed to work properly.
Key differences include:
- Adjustments to the initialization (`-i` flag) with the standard `pgbench` scenario to prevent global lock timeouts in certain cases.
- The `VACUUM` command in the standard scenario is executed on all nodes.
- `pgd_bench` releases are tied to the releases of the BDR extension and are built against the corresponding PostgreSQL flavour (this is reflected in the output of the `--version` flag).
The current version allows users to run failover tests while using CAMO or regular PGD deployments.
The following options were added:
- Use `-m camo` or `-m failover` to specify the mode for pgd_bench. You can use the `-m failover` specification to test failover in regular PGD deployments.
- Use `--retry` to specify whether to retry transactions when failover happens with `-m failover` mode. This option is enabled by default for `-m camo` mode.
In addition to these options, you must specify the peer node's connection information for failover in DSN form.
Here's an example in a CAMO environment:
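A representative invocation looks like the following sketch. The host and port variables and the `bdrdemo` database name are placeholders for your environment; the quoted second argument is the peer node's DSN:

```shell
pgd_bench -m camo -p $node1_port -h $node1_host bdrdemo \
    "host=$node2_host user=postgres port=$node2_port dbname=bdrdemo"
```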
This command runs in CAMO mode. It connects to node1 and runs the tests. If the connection to node1 is lost, then pgd_bench connects to node2. It queries node2 to get the status of in-flight transactions. Aborted and in-flight transactions are retried in CAMO mode.
In failover mode, if you specify `--retry`, then in-flight transactions are retried. In this scenario, there's no way to find the status of in-flight transactions.
Notes on pgd_bench usage
- When using custom init-scripts, it's important to understand the implications of the DDL commands. It's generally recommended to wait for the secondary nodes to catch up on the data-load steps before proceeding with DDL operations such as `CREATE INDEX`. The latter acquire global locks that can't be granted until the data load is complete and thus may time out. (See the sketch after this list.)
- No extra steps are taken to suppress client messages, such as `NOTICE`s and `WARNING`s emitted by PostgreSQL or any installed extensions, including the BDR extension. It's the user's responsibility to suppress them by setting appropriate variables (for example, `client_min_messages`, `bdr.camo_enable_client_warnings`).
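The following sketch shows the first point in practice. The table and index names are hypothetical; `bdr.wait_slot_confirm_lsn(NULL, NULL)` waits until all peer nodes confirm replay up to the local WAL position:

```sql
-- Hypothetical init-script fragment: bulk load first, then wait, then DDL.
INSERT INTO my_accounts (id, balance)
SELECT g, 0 FROM generate_series(1, 1000000) AS g;

-- Wait for every peer node to confirm replay of the data load, so the
-- global lock needed by CREATE INDEX can be acquired without timing out.
SELECT bdr.wait_slot_confirm_lsn(NULL, NULL);

CREATE INDEX my_accounts_balance_idx ON my_accounts (balance);
```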
Performance testing and tuning
PGD allows you to issue write transactions onto multiple master nodes. Bringing those writes back together onto each node has a cost in performance.
First, replaying changes from another node has a CPU cost, an I/O cost, and it generates WAL records. The resource use is usually less than in the original transaction since CPU overheads are lower as a result of not needing to reexecute SQL. In the case of UPDATE and DELETE transactions, there might be I/O costs on replay if data isn't cached.
Second, replaying changes holds table-level and row-level locks that can produce contention against local workloads. The conflict-free replicated data types (CRDT) and column-level conflict detection (CLCD) features ensure you get the correct answers even for concurrent updates, but they don't remove the normal locking overheads. If you get locking contention, try to avoid conflicting updates, or keep transactions as short as possible. A heavily updated row in a larger transaction causes a bottleneck on performance for that transaction. Complex applications require some thought to maintain scalability.
If you think you're having performance problems, develop performance tests using the benchmarking tools. pgd_bench allows you to write custom test scripts specific to your use case so you can understand the overheads of your SQL and measure the impact of concurrent execution.
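pgd_bench uses the same scripting syntax as `pgbench`, so a custom script is a file of SQL statements plus meta-commands. Here's a minimal sketch of a hot-row update case; the table and column names are hypothetical:

```sql
-- hot_update.sql: hypothetical script mirroring a production hot-row update.
-- The \set meta-command draws a random id for each transaction.
\set id random(1, 100)
BEGIN;
UPDATE app_counters SET value = value + 1 WHERE id = :id;
END;
```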
If PGD is running slow, then we suggest the following:
- Write a custom test script for pgd_bench, as close as you can make it to the production system's problem case.
- Run the script on one node to give you a baseline figure.
- Run the script on as many nodes as occur in production, using the same number of sessions in total as you did on one node. This technique shows you the effect of moving to multiple nodes.
- Increase the number of sessions for these two tests so you can plot the effect of increased contention on your application.
- Make sure your tests are long enough to account for replication delays.
- Ensure that replication delay isn't growing during your tests.
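As a sketch of the baseline and multi-node steps above, you might run something like the following (hosts, ports, the database name, and the script file are placeholders):

```shell
# Baseline: 32 sessions against a single node for 10 minutes.
pgd_bench -h node1 -p 5432 -c 32 -j 4 -T 600 -f hot_update.sql bdrdemo

# Multi-node: the same 32 sessions in total, split across two nodes and
# run concurrently.
pgd_bench -h node1 -p 5432 -c 16 -j 2 -T 600 -f hot_update.sql bdrdemo &
pgd_bench -h node2 -p 5432 -c 16 -j 2 -T 600 -f hot_update.sql bdrdemo &
wait
```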
Use all of the normal Postgres tuning features to improve the speed of critical parts of your application.