Pgpool-II
Pgpool-II is a tool available on application instances to help manage your connection pool and load balancing of your PostgreSQL add-ons.
What’s Pgpool-II?
Pgpool-II is a proxy software that sits between PostgreSQL servers and a PostgreSQL database client.
You can read the official project page for more information.
Why use Pgpool-II?
Connection pooling: Pgpool-II maintains established connections to the PostgreSQL servers, and reuses them whenever a new connection with the same properties (user name, database, protocol version, and other connection parameters if any) comes in. It reduces the connection overhead, and improves the system’s overall throughput.
Load balancing: If a database is replicated (because running in either replication mode or native replication mode), performing a SELECT
query on any server will return the same result. Pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT
queries among available servers, improving the system’s overall throughput. In an ideal scenario, read performance could improve proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where a lot of users execute many read-only queries at the same time.
Automated fail over: If one of the database servers goes down or becomes unreachable, Pgpool-II will detach it and will continue operations by using the rest of the database servers. Some sophisticated features that help for automated failovers including timeouts and retries.
Limiting Exceeding Connections: PostgreSQL puts a limit on the amount of concurrent connections, so that new connections are rejected when this number is reached. Raising this maximum number of connections, however, increases resource consumption and has a negative impact on overall system performance. Pgpool-II also limits the number of concurrent connections, but extra connections will be queued instead of returning an error immediately. Or you can configure it to return an error when the connection limit is exceeded (4.1
or later).
In Memory Query Cache: In memory query cache allows to save a SELECT
statement with its result. If an identical SELECT
comes in, Pgpool-II returns the value from cache. Since neither SQL parsing nor access to PostgreSQL are involved, using in memory cache is extremely fast. On the other hand, it might be slower than the normal path in some cases, because it adds some overhead of storing cache data.
The various features are available in the official Pgpool documentation.
How to configure Pgpool-II
Generate direct variables for PG
Go to the “Add-on dashboard” tab of your PostgreSQL add-ons and click on the “Generate direct hostname and port” button.
Enable Pgpool-II
In order to configure Pgpool-II, the first thing to do is to link your PostgreSQL add-on. To do that, you can go to the Service Dependencies
page of your application
and select your PostgreSQL add-on.
Once linked, you can enable the Pgpool-II feature by defining the following environment variable: CC_ENABLE_PGPOOL=true
.
If you ever need to stop using Pgpool-II, you can remove this variable or set it to false
.
Your application will use a Unix Domain Socket to connect to Pgpool-II. Unix domain sockets are faster than TCP sockets because there is no handshake and the connection is made locally on the same machine. A special environment variable will be injected to your environment variables: CC_PGPOOL_SOCKET_PATH
. This variable contains the path to the Unix Domain Socket you have to connect to. See Usage below for some examples on how to use it.
The available Pgpool-II variables and their descriptions are available on our environment variables reference page.
Concurrent session and pool size
You can use the CC_PGPOOL_NUM_INIT_CHILDREN
and CC_PGPOOL_MAX_POOL
environment variables to set the number of concurrent sessions and the number of connection pool caches per connection.
Let’s take an example where you have 3 PostgreSQL servers, and the CC_PGPOOL_NUM_INIT_CHILDREN
variable configured with a value of 32.
On startup, Pgpool-II will start 32 processes, one process per connection. The CC_PGPOOL_MAX_POOL
variable is used to configure the number of cached connections per connection (process). This option is mostly used when you have different credentials for the same PostgreSQL server (database, user…). But in the case of Streaming mode on Clever Cloud, you have access to only one database, and the credentials are the same on all add-ons with binary replication. You should not need to change the default value of this variable (1 by default).
An example of the SHOW POOL_POOLS;
command with CC_PGPOOL_NUM_INIT_CHILDREN=32
and CC_PGPOOL_MAX_POOL=1
:
psql -U u1mgltc2ezvkylo2cjno -d bjzfhkl5qcdhdqatyqjk
bjzfhkl5qc**********=> show pool_pools;
pool_pid | start_time | pool_id | backend_id | database | username | create_time | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
3526 | 2021-06-02 15:22:12 | 0 | 0 | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 1520665 |
3526 | 2021-06-02 15:22:12 | 0 | 1 | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 1665071 |
3526 | 2021-06-02 15:22:12 | 0 | 2 | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 531483 |
3527 | 2021-06-02 15:22:12 | 0 | 0 | | | | 0 |
3527 | 2021-06-02 15:22:12 | 0 | 1 | | | | 0 |
3527 | 2021-06-02 15:22:12 | 0 | 2 | | | | 0 |
3528 | 2021-06-02 15:22:12 | 0 | 0 | | | | 0 |
3528 | 2021-06-02 15:22:12 | 0 | 1 | | | | 0 |
3528 | 2021-06-02 15:22:12 | 0 | 2 | | | | 0 |
... | ... | . | . | | | | . |
We can see the 32 available processes (3526, 3527, 3528…), and active connections for process 3526, with an open connection for each PostgreSQL server (1520665, 1665071 and 531483).
If a process remains inactive for more than 300 seconds, it will be deleted and a new process will take its place. This value can be configured with the CC_PGPOOL_CHILD_LIFE_TIME
variable. This is a measure to prevent memory leaks and other unexpected errors in Pgpool-II children.
Cached connections for each process do not have a default expiration time, but you can set one if needed with the CC_PGPOOL_CONNECTION_LIFE_TIME
variable.
It’s also possible to specify the lifetime of a Pgpool-II child process (CC_PGPOOL_CHILD_MAX_CONNECTIONS
) and the time in seconds to disconnect a client if it remains inactive since the last request (CC_PGPOOL_CLIENT_IDLE_LIMIT
).
Replication and load balancing
Use Pgpool-II to distribute the load and separate WRITE and READ queries between your PostgreSQL servers.
When using the Streaming mode, it’s not Pgpool-II that manages the replication of your databases. Replication must be set up on the PostgreSQL side, otherwise known as binary, Hot Standby or Streaming replication. To schedule this procedure, you can contact our support at support@clever-cloud.com or make a request via our sales form.
Once replication is in place, you can use the CC_PGPOOL_FOLLOWERS
environment variable to add the followers to your Pgpool-II configuration. This variable is in JSON format, and must contain the host, port and weight of each follower.
HOST
and PORT
, you must use the values of the POSTGRESQL_ADDON_DIRECT_HOST
and POSTGRESQL_ADDON_DIRECT_PORT
variables.An example of the CC_PGPOOL_FOLLOWERS
variable with two followers:
|
|
The weight is used to specify the load balance ratio of the backends, you can also configure the weight of the leader with the CC_PGPOOL_LEADER_WEIGHT
variable.
SELECT
queries that could be made on followers.Many other load balancing options are configurable with environment variables. You can refer to our environment variables reference page and the official Pgpool-II documentation for more information.
Health check
Pgpool-II periodically connects to the configured PostgreSQL backends to detect any error on the servers or networks. If an error is detected, Pgpool-II performs failover or degeneration depending on the configurations.
The health check is not activated by default, but you can activate it with the CC_PGPOOL_HEALTH_CHECK_PERIOD
variable.
Many other health check options are configurable with environment variables, such as:
CC_PGPOOL_HEALTH_CHECK_TIMEOUT
: Specifies the timeout in seconds to give up connecting to the backend PostgreSQL if the TCP connect does not succeed within this time.CC_PGPOOL_HEALTH_CHECK_MAX_RETRIES
: Specifies the maximum number of retries before giving up and initiating failover when health check fails.CC_PGPOOL_HEALTH_CHECK_RETRY_DELAY
: Specifies the amount of time (in seconds) to sleep before retrying after a failed health check retries.CC_PGPOOL_CONNECT_TIMEOUT
: Specifies the amount of time in milliseconds before giving up connecting to backend usingconnect()
system call.
You can consult the official Pgpool-II documentation to get more information about the health check.
In memory cache
In memory cache saves the pair of SELECT
statement and its result. If the same SELECT
comes in, Pgpool-II returns the value from cache. This feature can be activated with the CC_PGPOOL_MEMORY_CACHE_ENABLED=on
variable.
Many other in memory cache options are configurable with environment variables, such as:
CC_PGPOOL_MEMQCACHE_TOTAL_SIZE
: Specifies the shared memory cache size in bytes.CC_PGPOOL_MEMQCACHE_MAX_NUM_CACHE
: Specifies the number of cache entries. This is used to define the size of cache management space.CC_PGPOOL_MEMQCACHE_CACHE_BLOCK_SIZE
: Specifies the cache block size.CC_PGPOOL_MEMQCACHE_EXPIRE
: Specifies the life time of query cache in seconds.CC_PGPOOL_MEMQCACHE_AUTO_CACHE_INVALIDATION
: Automatically deletes the cache related to the updated tables.CC_PGPOOL_MEMQCACHE_MAXCACHE
: Specifies the maximum size in bytes of theSELECT
query result to be cached.CC_PGPOOL_CACHE_SAFE_MEMQCACHE_TABLE_LIST
: Specifies a comma-separated list of table names whoseSELECT
results should be cached by Pgpool-II.CC_PGPOOL_CACHE_UNSAFE_MEMQCACHE_TABLE_LIST
: Specifies a comma-separated list of table names whoseSELECT
results should NOT be cached by Pgpool-II.
You can consult the official Pgpool-II documentation to get more information about the in memory cache.
Scalability
When the scalability of your application is enabled, you may need to tweak the CC_PGPOOL_NUM_INIT_CHILDREN
value. This is what we are going to see here.
Auto scalability will add or remove instances following your needs. This means that you may have multiple instances running at the same time for a certain period of time.
If your application restarts, you will have your current instance number * 2
instances running in parallel while the deployment finishes and the old instances are stopped.
A new deployment starts new instances alongside the old ones.
This means that have up to maximum scalability * 2
instances can run at the same time. And if all of your instances open the maximum connections they are allowed to,
it means there will be up to maximum scalability * 2 * CC_PGPOOL_NUM_INIT_CHILDREN
connections at the same time. We will call this result MaxCon
.
MaxCon
doesn’t exceed your plan’s max connections. If it does, you might have issues connecting to the remote PostgreSQL add-on. You have to adjust CC_PGPOOL_NUM_INIT_CHILDREN
to a number that makes sense for your scalability parameters.Example
Let’s say that I’m hosting a simple PHP application on 1 XS
instance, with a scalability that can go up to 4 M
instances. And I’m using the PostgreSQL S Small Space
plan
which has a connection limit of 125
.
Now my application receives a lot of traffic and scales up to 4 M
instances. But at the same time, I also need to deploy a hot fix. This means that 4 new M
instances will be started, alongside the already existing 4 M
instances. I need to make sure that MaxCon
doesn’t go above 125
.
Here is the summary:
- 8 instances: 4 currently running, 4 currently deploying my hot fix
- 125 max connections: the maximum number of connections of my PostgreSQL add-on’s plan
- 5 other connections: I want to be able to use PG Studio or any CLI tool at the same time, in case I need it
CC_PGPOOL_NUM_INIT_CHILDREN = ( (125 - 5) / 8 ) = 15
. If I set CC_PGPOOL_NUM_INIT_CHILDREN
to 15
, each instance should have enough connections to query the database without errors. And it also leaves me some spare connections (5
) available if anything goes wrong and I need to connect to my database using a CLI tool or PG Studio.
Tips / debug
Connecting to Pgpool-II
Connect via ssh to your application and use the psql command.
ssh -t ssh@sshgateway-clevercloud-customers.services.clever-cloud.com <app_id>
Then, launch the psql
command with :
psql
Showing configured nodes
You can use the SHOW POOL_NODES;
command.
bjzfhkl5qc**********=> SHOW POOL_NODES;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node |
---------+-----------------------------------------------------------+------+--------+-----------+---------+------------+-------------------+
0 | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up | 0.333333 | primary | 0 | false |
1 | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up | 0.333333 | standby | 2 | false |
2 | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up | 0.333333 | standby | 1 | true |
Watching Pgpool-II processes
You can use the SHOW POOL_PROCESSES;
command to see processes waiting for connections and processing a connection.
bjzfhkl5qc**********=> SHOW POOL_PROCESSES;
pool_pid | start_time | database | username | create_time | pool_counter
----------+---------------------+----------------------+----------------------+---------------------+--------------
21554 | 2021-06-03 11:28:36 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1
11409 | 2021-06-02 19:54:53 | | | |
11410 | 2021-06-02 19:54:53 | | | |
21566 | 2021-06-03 11:29:20 | | | |
Listing managed pools
You can use the SHOW POOL_POOLS;
command to see the list of pools managed by Pgpool-II.
bjzfhkl5qc**********=> SHOW POOL_POOLS;
pool_pid | start_time | pool_id | backend_id | database | username | create_time | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
21554 | 2021-06-03 11:28:36 | 0 | 0 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1549123 |
21554 | 2021-06-03 11:28:36 | 0 | 1 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1695967 |
21554 | 2021-06-03 11:28:36 | 0 | 2 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 562378 |
11409 | 2021-06-02 19:54:53 | 0 | 0 | | | | 0 |
11409 | 2021-06-02 19:54:53 | 0 | 1 | | | | 0 |
11409 | 2021-06-02 19:54:53 | 0 | 2 | | | | 0 |
11410 | 2021-06-02 19:54:53 | 0 | 0 | | | | 0 |
11410 | 2021-06-02 19:54:53 | 0 | 1 | | | | 0 |
11410 | 2021-06-02 19:54:53 | 0 | 2 | | | | 0 |
21566 | 2021-06-03 11:29:20 | 0 | 0 | | | | 0 |
21566 | 2021-06-03 11:29:20 | 0 | 1 | | | | 0 |
21566 | 2021-06-03 11:29:20 | 0 | 2 | | | | 0 |
Statistics of SQL commands
You can use the SHOW POOL_BACKEND_STATS;
command to see the statistics of SQL commands.
bjzfhkl5qc**********=> SHOW POOL_BACKEND_STATS;
node_id | hostname | port | status | role | select_cnt | insert_cnt | update_cnt | delete_cnt |
---------+-----------------------------------------------------------+------+--------+---------+------------+------------+------------+------------+
0 | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up | primary | 0 | 0 | 0 | 1 |
1 | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up | standby | 2 | 0 | 0 | 0 |
2 | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up | standby | 1 | 0 | 0 | 0 |
Using the PCP command
The PCP tool is pre-configured and available on your application.
pcp_pool_status -h /tmp -U pcp -w
Managing nodes
Attach or detach a given node to Pgpool-II;
pcp_pool_status -h /tmp -U pcp -w -n 0
pcp_pool_status -h /tmp -U pcp -w -n 0
Statistical data
Display the statistical data of the health check on the given node ID
pcp_pool_status -h /tmp -U pcp -w -n 0 -v
Node Id : 0
Host Name : bjzfhkl5qc**********-postgresql.services.clever-cloud.com
Port : 57**
Status : up
Role : primary
Last Status Change : 2021-06-02 19:54:53
Total Count : 0
Success Count : 0
Fail Count : 0
Skip Count : 0
Retry Count : 0
Average Retry Count : 0.000000
Max Retry Count : 0
Max Health Check Duration : 0
Minimum Health Check Duration : 0
Average Health Check Duration : 0.000000
Last Health Check :
Last Successful Health Check :
Last Skip Health Check :
Last Failed Health Check :
You can find all PCP commands in the official Pgpool-II documentation.
Usage
PHP using PDO
Using PDO, you have to use the unix_socket
option in your DSN:
|
|
WordPress
For WordPress, you can change the DB_HOST
variable in your wp-config.php
:
|
|
Node.js
On Node.js, using the pg
npm package:
|
|
Did this documentation help you ?