Implementing PostgreSQL With Patroni
For organizations who want to simplify PostgreSQL clustering, Patroni is a popular option. In this blog, we look at the PostgreSQL Patroni combination, how it works, how to configure it, and commands and tips to help you along the way.
- What Is Patroni?
- How Does Patroni Work?
- Installing Patroni on CentOS
- Patroni Configuration
- Patroni PostgreSQL Configuration
- Patroni Commands
- Patroni Replication Modes
- Patroni REST API
- Final Thoughts
What Is Patroni?
Patroni is a Python-based open-source PostgreSQL template and controller, in which you can run high-availability Postgres clusters. Patroni is technically a Python daemon, in which single Python daemons manage single PostgreSQL instances. It operates on the same system as PostgreSQL and was created out of a need for an automatic failure option for Postgres, while it leverages the permissive MIT license.
Patroni uses a distributed configuration store to track which Postgres instance is the master. Likewise, it can be configured to work with Zookeeper, etcd, or Consul configuration stores.
PostgreSQL Patroni implementations can simplify the overall cluster management lifecycle significantly.
How Does Patroni Work?
Approximately every 10 seconds, Patroni checks every Postgres node in the cluster for the presence of an existing primary in the Distributed Configuration Store (DCS). If no primary is found, it inserts a key in the DCS to claim the primary location. If this server happens to be the primary, it informs HAProxy to use this as the new master. On the other hand, if a primary is found, it performs a variety of checks and attempts to transform the current server into a replica version. It’s truly that simple.
In the example above, we have 3 Postgres instances (node A, B, C) running on the same box. On the right is the Distributed Configuration Store (DCS) running on etcd. In this example there’s one server, with the DCS clustered together. Keep in mind, etcd (the DCS) is simply storing information about who is the primary, the members in the cluster, the cluster name, and the status of members. Above, Postgres is not talking directly to etcd. Again, Patroni runs on a bot within the same box and acts as the mediator to Postgres.
When the Master Becomes Unavailable
Within the 10 second span, the bots on Node B and Node C will begin asking where the leader has gone.
The leader key will expire and signal Node B and Node C that they need to activate as the leader. This will create a leader race.
Node A will then disappear. Node B will query node C to figure out where its’ write-ahead log position is. This will inform the node how caught up each remaining node is and was when the failure took place. If one node is more caught up than another, that will become the new leader. If not, the node that makes the first call to etcd becomes the new leader.
Above, node C becomes the primary.
Node B will then start replicating from Node C. And then when HAPROXY hits each nodes’ REST API ports, it will figure out that node A is no longer available, and which of the remaining nodes is the leader. /
Get Support for PostgreSQL
No matter your technical challenge or needs, OpenLogic can provide a solution. With support for over 450 open source packages, we can help you with all things PostgreSQL.
Installing Patroni on CentOS
Patroni is relatively straightforward to install. Here’s a quick step-by-step guide:
- Install the Extra Packages for Enterprise Linux repository
- Install wget, git, etcd, and Package Index for Python, curl, and the JSON processor
- Install PostgreSQL 13
- Upgrade the package manager for Python
- Install the PostgreSQL adapter for Python
Configuration is also relatively straightforward with Patroni. After you have installed Patroni, the bin_dir (bin directory) of the potgres0.yml, postgres1.yml, and postgres2.yml will need to be adjusted to the Postgres server installation in the following manner:
- listen: 127.0.0.1:5432
- connect_address: 127.0.0.1:5432
- data_dir: data/postgressql0
- bin_dir: /usr/pgsql-13/bin
PostgreSQL Patroni Configuration
While the general installation and configuration process is relatively easy, there are a few nuances you will need to be aware of in order to keep Patroni functioning properly. There are three configuration types including dynamic, local, and environment configuration. We walk through each configuration in detail below.
Dynamic configuration (etcd) is the global configuration of the cluster. When you change a value in etcd, you are effectively altering the configuration of every PostgreSQL node in the cluster in a global fashion. Changes are applied asynchronously during the next available wake cycle. Here are some of the common configuration settings in etcd that can be changed based on cluster-wide needs:
- max_connection (100)
- max_locks_per_transaction (64)
- max_worked_process (8)
- max_prepared_transactions (0)
- wal_level (hot_standby)
- wal_log_hints (on)
- track_commit_timestamp (off)
Your local configuration (patroni.yml) will override anything set in a cluster-wide dynamic configuration (etcd). You can apply local changes using patronictl reload or by sending a POST / reload REST API request.
If you are running Postgres in containers, environment configurations can be useful in allowing you to override local configurations. If you are not running Postgres in a containerized environment, you likely will not need to worry about the environment configurations.
There are a variety of Patroni commands you may leverage on a regular basis. Here are some of the most important:
Patronictl.py -c postgres0.yml show-config batman
Show the cluster-wide configuration.
Patronictl.py -c postgres0.yml failover batman
Allows the user to perform manual failover when there are no healthy nodes, but at the same time it will not allow you to schedule a switchover.
Patronictl.py -c postgres0.yml switchover batman
Opposite of a failover. Only works when the cluster is healthy and allows to schedule a switchover at a given time.
Shows the maximum number of seconds a replica will notice a missing master.
The maximum bytes a follower may lag to be able to participate in leader election.
Timeout for DCS and PostgreSQL operation retries. DCS or network issues shorter than this will not cause Patroni to demote the leader.
Patroni Replication Modes
There are two PostgreSQL streaming replication modes, including asynchronous and synchronous. By default Patroni configures PostgreSQL for asynchronous replication. The replication mode you select will depend heavily on your business needs and technology implementation.
Asynchronous Mode Durability
This is Patroni’s default replication mode. The cluster is allowed to lose some transactions to remain highly available. Any non-replicated transactions will be ported to a standby to remain in a “forked timeline” on the primary and will not be recovered. maximum_lag_on_failover controls the amount of transactions can be lost by a standby.
PostgreSQL Synchronous Replication
Synchronous replication confirms that writes have been committed to a secondary before returning to the client with a success confirmation. In general, you will get less throughput on writes with synchronous replication. This can cause issues with Postgres instances running in separate data centers due to network latency. If follower nodes are not accessible by the master, then the master defaults to read-only. It is still possible to lose transactions in this mode. To prevent transaction loss, you can enable synchronous_mode, or synchronous_mode_strict.
Patroni REST API
The Patroni REST API allows you to perform all of the tasks that you would perform within Patroni via an external API. The REST API can be use by HAProxy, or any other kind of load balancer, such as F5 for performing HTTP health checks. This API could also be used as a monitoring proxy, such as Prometheus. This is worth checking out, because you simply get a huge wealth of data with the Patroni REST API. Finally, note that all data within the Patroni REST API is stored as JSON data.
As a cluster management tool for automating deployment and management of PostgreSQL high-availability clusters, Patroni can really streamline your Postgres footprint. While getting up and running may take a bit of time, the long-term benefits are well worth it.
Get Implementation Services
Still having trouble understanding how to get Patroni up and running? We offer implementation services that can get get you up and running fast.
- Blog – What Is PostgreSQL?
- Blog – Exploring the PostgreSQL System Catalogs
- Blog – PostgreSQL Support Options
- Blog – EnterpriseDB vs. PostgreSQL
- Blog – PostgreSQL vs. MongoDB
- Blog – PostgreSQL vs. MySQL
- Resource Collection - Intro to Open Source Databases