Pgpool-II

Pgpool-II

Pgpool-II is a tool available on application instances to help manage your connection pool and load balancing of your PostgreSQL add-ons.

⚠️
Pgpool-II is not available on Docker instances. If you want to use Pgpool-II, you can add it and configure it in your Dockerfile.

What’s Pgpool-II?

Pgpool-II is a proxy software that sits between PostgreSQL servers and a PostgreSQL database client.

You can read the official project page for more information.

Why use Pgpool-II?

Connection pooling: Pgpool-II maintains established connections to the PostgreSQL servers, and reuses them whenever a new connection with the same properties (user name, database, protocol version, and other connection parameters if any) comes in. It reduces the connection overhead, and improves the system’s overall throughput.

Load balancing: If a database is replicated (because running in either replication mode or native replication mode), performing a SELECT query on any server will return the same result. Pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT queries among available servers, improving the system’s overall throughput. In an ideal scenario, read performance could improve proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where a lot of users execute many read-only queries at the same time.

Automated fail over: If one of the database servers goes down or becomes unreachable, Pgpool-II will detach it and will continue operations by using the rest of the database servers. Some sophisticated features that help for automated failovers including timeouts and retries.

Limiting Exceeding Connections: PostgreSQL puts a limit on the amount of concurrent connections, so that new connections are rejected when this number is reached. Raising this maximum number of connections, however, increases resource consumption and has a negative impact on overall system performance. Pgpool-II also limits the number of concurrent connections, but extra connections will be queued instead of returning an error immediately. Or you can configure it to return an error when the connection limit is exceeded (4.1 or later).

In Memory Query Cache: In memory query cache allows to save a SELECT statement with its result. If an identical SELECT comes in, Pgpool-II returns the value from cache. Since neither SQL parsing nor access to PostgreSQL are involved, using in memory cache is extremely fast. On the other hand, it might be slower than the normal path in some cases, because it adds some overhead of storing cache data.

The various features are available in the official Pgpool documentation.

ℹ️
We only support the Streaming mode, which is the most used and recommended mode for Pgpool-II. If you need other modes or features that are not supported on Clever Cloud, please contact us at support@clever-cloud.com

How to configure Pgpool-II

Generate direct variables for PG

Go to the “Add-on dashboard” tab of your PostgreSQL add-ons and click on the “Generate direct hostname and port” button.

Enable Pgpool-II

In order to configure Pgpool-II, the first thing to do is to link your PostgreSQL add-on. To do that, you can go to the Service Dependencies page of your application and select your PostgreSQL add-on.

Once linked, you can enable the Pgpool-II feature by defining the following environment variable: CC_ENABLE_PGPOOL=true. If you ever need to stop using Pgpool-II, you can remove this variable or set it to false.

Your application will use a Unix Domain Socket to connect to Pgpool-II. Unix domain sockets are faster than TCP sockets because there is no handshake and the connection is made locally on the same machine. A special environment variable will be injected to your environment variables: CC_PGPOOL_SOCKET_PATH. This variable contains the path to the Unix Domain Socket you have to connect to. See Usage below for some examples on how to use it.

The available Pgpool-II variables and their descriptions are available on our environment variables reference page.

Concurrent session and pool size

You can use the CC_PGPOOL_NUM_INIT_CHILDREN and CC_PGPOOL_MAX_POOL environment variables to set the number of concurrent sessions and the number of connection pool caches per connection.

Let’s take an example where you have 3 PostgreSQL servers, and the CC_PGPOOL_NUM_INIT_CHILDREN variable configured with a value of 32.

On startup, Pgpool-II will start 32 processes, one process per connection. The CC_PGPOOL_MAX_POOL variable is used to configure the number of cached connections per connection (process). This option is mostly used when you have different credentials for the same PostgreSQL server (database, user…). But in the case of Streaming mode on Clever Cloud, you have access to only one database, and the credentials are the same on all add-ons with binary replication. You should not need to change the default value of this variable (1 by default).

An example of the SHOW POOL_POOLS; command with CC_PGPOOL_NUM_INIT_CHILDREN=32 and CC_PGPOOL_MAX_POOL=1:

psql -U u1mgltc2ezvkylo2cjno -d bjzfhkl5qcdhdqatyqjk

bjzfhkl5qc**********=> show pool_pools;
 pool_pid |     start_time      | pool_id | backend_id |       database       |       username       |     create_time     | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
 3526     | 2021-06-02 15:22:12 | 0       | 0          | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 1520665         | 
 3526     | 2021-06-02 15:22:12 | 0       | 1          | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 1665071         | 
 3526     | 2021-06-02 15:22:12 | 0       | 2          | bjzfhkl5qcdhdqatyqjk | u1mgltc2ezvkylo2cjno | 2021-06-02 15:22:35 | 531483          | 
 3527     | 2021-06-02 15:22:12 | 0       | 0          |                      |                      |                     | 0               | 
 3527     | 2021-06-02 15:22:12 | 0       | 1          |                      |                      |                     | 0               | 
 3527     | 2021-06-02 15:22:12 | 0       | 2          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 0          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 1          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 2          |                      |                      |                     | 0               | 
 ...      | ...                 | .       | .          |                      |                      |                     | .               | 

We can see the 32 available processes (3526, 3527, 3528…), and active connections for process 3526, with an open connection for each PostgreSQL server (1520665, 1665071 and 531483).

If a process remains inactive for more than 300 seconds, it will be deleted and a new process will take its place. This value can be configured with the CC_PGPOOL_CHILD_LIFE_TIME variable. This is a measure to prevent memory leaks and other unexpected errors in Pgpool-II children.

Cached connections for each process do not have a default expiration time, but you can set one if needed with the CC_PGPOOL_CONNECTION_LIFE_TIME variable.

It’s also possible to specify the lifetime of a Pgpool-II child process (CC_PGPOOL_CHILD_MAX_CONNECTIONS) and the time in seconds to disconnect a client if it remains inactive since the last request (CC_PGPOOL_CLIENT_IDLE_LIMIT).

Replication and load balancing

Use Pgpool-II to distribute the load and separate WRITE and READ queries between your PostgreSQL servers.

When using the Streaming mode, it’s not Pgpool-II that manages the replication of your databases. Replication must be set up on the PostgreSQL side, otherwise known as binary, Hot Standby or Streaming replication. To schedule this procedure, you can contact our support at support@clever-cloud.com or make a request via our sales form.

Once replication is in place, you can use the CC_PGPOOL_FOLLOWERS environment variable to add the followers to your Pgpool-II configuration. This variable is in JSON format, and must contain the host, port and weight of each follower.

ℹ️
For the HOST and PORT, you must use the values of the POSTGRESQL_ADDON_DIRECT_HOST and POSTGRESQL_ADDON_DIRECT_PORT variables.

An example of the CC_PGPOOL_FOLLOWERS variable with two followers:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
[
  {
    "hostname": "<HOST>",
    "port": "<PORT>",
    "weight": "1"
  },
  {
    "hostname": "<HOST>",
    "port": "<PORT>",
    "weight": "1"
  }
]

The weight is used to specify the load balance ratio of the backends, you can also configure the weight of the leader with the CC_PGPOOL_LEADER_WEIGHT variable.

ℹ️
You can set a higher value for followers to avoid overloading the leader with SELECT queries that could be made on followers.

Many other load balancing options are configurable with environment variables. You can refer to our environment variables reference page and the official Pgpool-II documentation for more information.

Health check

Pgpool-II periodically connects to the configured PostgreSQL backends to detect any error on the servers or networks. If an error is detected, Pgpool-II performs failover or degeneration depending on the configurations.

The health check is not activated by default, but you can activate it with the CC_PGPOOL_HEALTH_CHECK_PERIOD variable.

Many other health check options are configurable with environment variables, such as:

  • CC_PGPOOL_HEALTH_CHECK_TIMEOUT: Specifies the timeout in seconds to give up connecting to the backend PostgreSQL if the TCP connect does not succeed within this time.
  • CC_PGPOOL_HEALTH_CHECK_MAX_RETRIES: Specifies the maximum number of retries before giving up and initiating failover when health check fails.
  • CC_PGPOOL_HEALTH_CHECK_RETRY_DELAY: Specifies the amount of time (in seconds) to sleep before retrying after a failed health check retries.
  • CC_PGPOOL_CONNECT_TIMEOUT: Specifies the amount of time in milliseconds before giving up connecting to backend using connect() system call.

You can consult the official Pgpool-II documentation to get more information about the health check.

In memory cache

In memory cache saves the pair of SELECT statement and its result. If the same SELECT comes in, Pgpool-II returns the value from cache. This feature can be activated with the CC_PGPOOL_MEMORY_CACHE_ENABLED=on variable.

Many other in memory cache options are configurable with environment variables, such as:

  • CC_PGPOOL_MEMQCACHE_TOTAL_SIZE: Specifies the shared memory cache size in bytes.
  • CC_PGPOOL_MEMQCACHE_MAX_NUM_CACHE: Specifies the number of cache entries. This is used to define the size of cache management space.
  • CC_PGPOOL_MEMQCACHE_CACHE_BLOCK_SIZE: Specifies the cache block size.
  • CC_PGPOOL_MEMQCACHE_EXPIRE: Specifies the life time of query cache in seconds.
  • CC_PGPOOL_MEMQCACHE_AUTO_CACHE_INVALIDATION: Automatically deletes the cache related to the updated tables.
  • CC_PGPOOL_MEMQCACHE_MAXCACHE: Specifies the maximum size in bytes of the SELECT query result to be cached.
  • CC_PGPOOL_CACHE_SAFE_MEMQCACHE_TABLE_LIST: Specifies a comma-separated list of table names whose SELECT results should be cached by Pgpool-II.
  • CC_PGPOOL_CACHE_UNSAFE_MEMQCACHE_TABLE_LIST: Specifies a comma-separated list of table names whose SELECT results should NOT be cached by Pgpool-II.

You can consult the official Pgpool-II documentation to get more information about the in memory cache.

Scalability

When the scalability of your application is enabled, you may need to tweak the CC_PGPOOL_NUM_INIT_CHILDREN value. This is what we are going to see here.

Auto scalability will add or remove instances following your needs. This means that you may have multiple instances running at the same time for a certain period of time. If your application restarts, you will have your current instance number * 2 instances running in parallel while the deployment finishes and the old instances are stopped. A new deployment starts new instances alongside the old ones.

This means that have up to maximum scalability * 2 instances can run at the same time. And if all of your instances open the maximum connections they are allowed to, it means there will be up to maximum scalability * 2 * CC_PGPOOL_NUM_INIT_CHILDREN connections at the same time. We will call this result MaxCon.

⚠️
Each PostgreSQL add-on has a connection limit which varies following the plan you are using. You must be sure that MaxCon doesn’t exceed your plan’s max connections. If it does, you might have issues connecting to the remote PostgreSQL add-on. You have to adjust CC_PGPOOL_NUM_INIT_CHILDREN to a number that makes sense for your scalability parameters.

Example

Let’s say that I’m hosting a simple PHP application on 1 XS instance, with a scalability that can go up to 4 M instances. And I’m using the PostgreSQL S Small Space plan which has a connection limit of 125.

Now my application receives a lot of traffic and scales up to 4 M instances. But at the same time, I also need to deploy a hot fix. This means that 4 new M instances will be started, alongside the already existing 4 M instances. I need to make sure that MaxCon doesn’t go above 125.

Here is the summary:

  • 8 instances: 4 currently running, 4 currently deploying my hot fix
  • 125 max connections: the maximum number of connections of my PostgreSQL add-on’s plan
  • 5 other connections: I want to be able to use PG Studio or any CLI tool at the same time, in case I need it

CC_PGPOOL_NUM_INIT_CHILDREN = ( (125 - 5) / 8 ) = 15. If I set CC_PGPOOL_NUM_INIT_CHILDREN to 15, each instance should have enough connections to query the database without errors. And it also leaves me some spare connections (5) available if anything goes wrong and I need to connect to my database using a CLI tool or PG Studio.

Tips / debug

Connecting to Pgpool-II

Connect via ssh to your application and use the psql command.

ssh -t ssh@sshgateway-clevercloud-customers.services.clever-cloud.com <app_id>

Then, launch the psql command with :

psql

Showing configured nodes

You can use the SHOW POOL_NODES; command.

bjzfhkl5qc**********=> SHOW POOL_NODES;
 node_id |                         hostname                          | port | status | lb_weight |  role   | select_cnt | load_balance_node |
---------+-----------------------------------------------------------+------+--------+-----------+---------+------------+-------------------+
 0       | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | primary | 0          | false             |
 1       | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | standby | 2          | false             |
 2       | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | standby | 1          | true              |

Watching Pgpool-II processes

You can use the SHOW POOL_PROCESSES; command to see processes waiting for connections and processing a connection.

bjzfhkl5qc**********=> SHOW POOL_PROCESSES;
 pool_pid |     start_time      |       database       |       username       |     create_time     | pool_counter
----------+---------------------+----------------------+----------------------+---------------------+--------------
 21554    | 2021-06-03 11:28:36 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1
 11409    | 2021-06-02 19:54:53 |                      |                      |                     |
 11410    | 2021-06-02 19:54:53 |                      |                      |                     |
 21566    | 2021-06-03 11:29:20 |                      |                      |                     |

Listing managed pools

You can use the SHOW POOL_POOLS; command to see the list of pools managed by Pgpool-II.

bjzfhkl5qc**********=> SHOW POOL_POOLS;
 pool_pid |     start_time      | pool_id | backend_id |       database       |       username       |     create_time     | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
 21554    | 2021-06-03 11:28:36 | 0       | 0          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1549123         |
 21554    | 2021-06-03 11:28:36 | 0       | 1          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1695967         |
 21554    | 2021-06-03 11:28:36 | 0       | 2          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 562378          |
 11409    | 2021-06-02 19:54:53 | 0       | 0          |                      |                      |                     | 0               |
 11409    | 2021-06-02 19:54:53 | 0       | 1          |                      |                      |                     | 0               |
 11409    | 2021-06-02 19:54:53 | 0       | 2          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 0          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 1          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 2          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 0          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 1          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 2          |                      |                      |                     | 0               |

Statistics of SQL commands

You can use the SHOW POOL_BACKEND_STATS; command to see the statistics of SQL commands.

bjzfhkl5qc**********=> SHOW POOL_BACKEND_STATS;
 node_id |                         hostname                          | port | status |  role   | select_cnt | insert_cnt | update_cnt | delete_cnt |
---------+-----------------------------------------------------------+------+--------+---------+------------+------------+------------+------------+
 0       | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up     | primary | 0          | 0          | 0          | 1          |
 1       | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up     | standby | 2          | 0          | 0          | 0          |
 2       | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up     | standby | 1          | 0          | 0          | 0          |

Using the PCP command

The PCP tool is pre-configured and available on your application.

pcp_pool_status -h /tmp -U pcp -w

Managing nodes

Attach or detach a given node to Pgpool-II;

pcp_pool_status -h /tmp -U pcp -w -n 0
pcp_pool_status -h /tmp -U pcp -w -n 0

Statistical data

Display the statistical data of the health check on the given node ID

pcp_pool_status -h /tmp -U pcp -w -n 0 -v
Node Id                       : 0
Host Name                     : bjzfhkl5qc**********-postgresql.services.clever-cloud.com
Port                          : 57**
Status                        : up
Role                          : primary
Last Status Change            : 2021-06-02 19:54:53
Total Count                   : 0
Success Count                 : 0
Fail Count                    : 0
Skip Count                    : 0
Retry Count                   : 0
Average Retry Count           : 0.000000
Max Retry Count               : 0
Max Health Check Duration     : 0
Minimum Health Check Duration : 0
Average Health Check Duration : 0.000000
Last Health Check             :
Last Successful Health Check  :
Last Skip Health Check        :
Last Failed Health Check      :

You can find all PCP commands in the official Pgpool-II documentation.

Usage

ℹ️
Please note that to connect to Pgpool-II, the default port 5432 must be used. So if you already have a different port set in your code, you have to change it or simply remove the port from the connection information.

PHP using PDO

Using PDO, you have to use the unix_socket option in your DSN:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
<?php
// This variable is injected during deployment
$host = getenv("CC_PGPOOL_SOCKET_PATH");
// Get the database name from the environment
$database = getenv("POSTGRESQL_ADDON_DB");
// Get the database user from the environment
$user = getenv("POSTGRESQL_ADDON_USER");
// Get the database password from the environment
$password = getenv("POSTGRESQL_ADDON_PASSWORD");

$dsn = "pgsql:host=$host;dbname=$database;user=$user;password=$password";

try {
    $conn = new PDO($dsn);
    if($conn){
  echo "Successfully connected to $dbname!";
     }
} catch (PDOException $e){
     echo $e->getMessage();
}
?>

WordPress

For WordPress, you can change the DB_HOST variable in your wp-config.php:

1
2
// To connect using a socket, the syntax is: `localhost:/path/to/socket`
define( 'DB_HOST', "localhost:" . getenv("CC_PGPOOL_SOCKET_PATH") );

Node.js

On Node.js, using the pg npm package:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
const { Client } = require('pg');
client = new Client({
    // This variable is injected during the deployment
    host: process.env["CC_PGPOOL_SOCKET_PATH"],
    // Get the database name from the environment
    database: process.env["POSTGRESQL_ADDON_DB"],
    // Get the database user from the environment
    user: process.env["POSTGRESQL_ADDON_USER"],
    // Get the database password from the environment
    password: process.env["POSTGRESQL_ADDON_PASSWORD"],
});

client.connect();
Last updated on

Did this documentation help you ?