Hello, forum members!
I have deployed a server cluster consisting of a master at 192.168.100.2, a slave at 192.168.100.3, and pgpool at 192.168.100.4. Streaming replication is configured between the master and the slave. I want to set up pgpool so that it monitors the state of the nodes and, if the master goes down, triggers a failover to the slave. I configured everything according to the pgpool documentation, and the following messages appeared in the pgpool log:
2013-06-20 04:12:36 LOG: pid 12954: read_status_file: 0 th backend is set to down status
2013-06-20 04:12:36 LOG: pid 12954: pgpool-II successfully started. version 3.0.5 (umiyameboshi)
2013-06-20 04:12:36 LOG: pid 12954: find_primary_node: 1 node is standby
2013-06-20 04:12:36 LOG: pid 12954: find_primary_node: no primary node found
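As far as I understand, find_primary_node decides which backend is the primary by asking every node whether it is in recovery (I believe via pg_is_in_recovery()). A quick manual check from the pgpool host, assuming the postgres user can connect to both backends, shows what each node reports:
# 'f' should come back from the primary, 't' from the standby
psql -h 192.168.100.2 -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"
psql -h 192.168.100.3 -p 5432 -U postgres -c "SELECT pg_is_in_recovery();"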
This happens while the master is up and running. Here is my pgpool.conf:
# Host name or IP address to listen on: '*' for all, '' for no TCP/IP
# connections
listen_addresses = '*'
# Port number for pgpool
port = 9999
# Port number for pgpool communication manager
pcp_port = 9898
# Unix domain socket path. (The Debian package defaults to
# /var/run/postgresql.)
socket_dir = '/tmp-noinst'
# Unix domain socket path for pgpool communication manager.
# (Debian package defaults to /var/run/postgresql)
pcp_socket_dir = '/tmp-noinst'
# Unix domain socket path for the backend. Debian package defaults to /var/run/postgresql!
backend_socket_dir = '/tmp-noinst'
# pgpool communication manager timeout. 0 means no timeout. This parameter is ignored now.
pcp_timeout = 10
# Number of pre-forked child processes
num_init_children = 32
# Number of connection pools allowed for a child process
max_pool = 4
# If idle for this many seconds, child exits. 0 means no timeout.
child_life_time = 300
# If idle for this many seconds, connection to PostgreSQL closes.
# 0 means no timeout.
connection_life_time = 0
# If child_max_connections connections were received, child exits.
# 0 means no exit.
child_max_connections = 0
# If client_idle_limit is n (n > 0), the client is forcibly
# disconnected after n seconds of idle time (even inside an explicit
# transaction!).
# 0 means no disconnect.
client_idle_limit = 0
# Maximum time in seconds to complete client authentication.
# 0 means no timeout.
authentication_timeout = 60
# Logging directory
logdir = '/var/log/pgpool'
# pid file name
pid_file_name = '/var/run/pgpool/pgpool.pid'
# Replication mode
replication_mode = false
# Load balancing mode, i.e., all SELECTs are load balanced.
load_balance_mode = true
# If there's a disagreement with the packet kind sent from a backend,
# then degenerate the node which is most likely the "minority". If false,
# just force this session to exit.
replication_stop_on_mismatch = false
# If there's a disagreement with the number of affected tuples in
# UPDATE/DELETE, then degenerate the node which is most likely the
# "minority".
# If false, just abort the transaction to keep consistency.
failover_if_affected_tuples_mismatch = false
# If true, replicate SELECT statement when replication_mode or parallel_mode is enabled.
# A priority of replicate_select is higher than load_balance_mode.
replicate_select = false
# Semicolon separated list of queries to be issued at the end of a
# session
reset_query_list = 'ABORT; DISCARD ALL'
# for 8.2 or older this should be as follows.
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'
# white_function_list is a comma separated list of function names
# that do not write to the database. Any function not listed here
# is regarded as writing to the database, and SELECTs invoking such
# writer functions will be executed on the master (primary) in master/slave
# mode, or on all DB nodes in replication mode.
#
# black_function_list is a comma separated list of function names
# that do write to the database. Any function not listed here
# is regarded as not writing to the database, and SELECTs invoking such
# read-only functions will be executed on any DB node.
#
# You cannot fill in both white_function_list and
# black_function_list at the same time. If you specify something in
# one of them, leave the other empty.
#
# Pre-3.0 pgpool-II recognized nextval and setval in a hard-coded
# way. The following setting does the same as previous versions:
# white_function_list = ''
# black_function_list = 'nextval,setval'
white_function_list = ''
black_function_list = 'currval,lastval,nextval,setval'
# If true, print a timestamp on each log line.
print_timestamp = true
# If true, operate in master/slave mode.
master_slave_mode = true
# Master/slave sub mode. either 'slony' or 'stream'. Default is 'slony'.
master_slave_sub_mode = 'stream'
# If the standby server delays more than delay_threshold,
# any query goes to the primary only. The unit is in bytes.
# 0 disables the check. Default is 0.
# Note that health_check_period must be greater than 0
# for this to take effect.
delay_threshold = 10000000
# 'always' logs the standby delay whenever health check runs.
# 'if_over_threshold' logs only if the delay exceeds delay_threshold.
# 'none' disables the delay log.
log_standby_delay = 'if_over_threshold'
# If true, cache connection pool.
connection_cache = true
# Health check timeout. 0 means no timeout.
health_check_timeout = 20
# Health check period. 0 means no health check.
health_check_period = 0
# Health check user
health_check_user = 'postgres'
# Command to execute at failover.
# special values: %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %% = '%' character
#
failover_command = '/etc/pgpool-II/failover.sh %d %H /tmp-noinst/failover'
# Command to execute at failback.
# special values: %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %% = '%' character
#
failback_command = ''
# If true, trigger failover when writing to the backend communication
# socket fails. This is the same behavior as pgpool-II 2.2.x or
# earlier. If set to false, pgpool will report an error and disconnect
# the session.
fail_over_on_backend_error = true
# If true, automatically locks a dummy row or a table with INSERT
# statements to keep SERIAL data consistency. If the data does not have
# SERIAL data type, no lock will be issued. An /*INSERT LOCK*/ comment
# has the same effect. A /*NO INSERT LOCK*/ comment disables the effect.
insert_lock = false
# If true, ignore leading white space in each query while pgpool judges
# whether the query is a SELECT so that it can be load balanced. This
# is useful for certain APIs such as DBI/DBD, which are known to add an
# extra leading white space.
ignore_leading_white_space = true
# If true, print all statements to the log. Like the log_statement option
# to PostgreSQL, this allows for observing queries without engaging in full
# debugging.
log_statement = false
# If true, print all statements to the log. Similar to log_statement,
# except that the DB node id and backend process id are printed as well.
log_per_node_statement = false
# If true, incoming connections will be printed to the log.
log_connections = false
# If true, the hostname will be shown in ps status, and also in the
# connection log if log_connections = true.
# Be warned that this feature adds the overhead of a hostname lookup.
log_hostname = false
# if non 0, run in parallel query mode
parallel_mode = false
# if non 0, use query cache
enable_query_cache = false
#set pgpool2 hostname
pgpool2_hostname = ''
# system DB info
system_db_hostname = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''
# backend_hostname, backend_port, backend_weight
# here are examples
backend_hostname0 = '192.168.100.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/data'
backend_hostname1 = '192.168.100.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/data'
# - HBA -
# If true, use pool_hba.conf for client authentication.
enable_pool_hba = true
# - online recovery -
# online recovery user
recovery_user = 'nobody'
# online recovery password
recovery_password = ''
# execute a command in first stage.
recovery_1st_stage_command = ''
# execute a command in second stage.
recovery_2nd_stage_command = ''
# Maximum time in seconds to wait for the recovering node's postmaster
# to start up. 0 means no wait.
# This is also used as a timer for waiting for clients to disconnect
# before starting the 2nd stage.
recovery_timeout = 90
# If client_idle_limit_in_recovery is n (n > 0), the client is forcibly
# disconnected after n seconds of idle time (even inside an explicit
# transaction!) during the second stage of online recovery.
# n = -1 forces clients to be disconnected immediately.
# 0 disables this functionality (wait forever).
# This parameter only takes effect in the recovery 2nd stage.
client_idle_limit_in_recovery = 0
# Specify a table name to lock. This is used when rewriting the lo_creat
# command in replication mode. The table must exist and be writable
# by public. If the table name is '', no rewriting occurs.
lobj_lock_table = ''
# If true, enable SSL support for both frontend and backend connections.
# note that you must also set ssl_key and ssl_cert for SSL to work in
# the frontend connections.
ssl = false
# path to the SSL private key file
#ssl_key = './server.key'
# path to the SSL public certificate file
#ssl_cert = './server.cert'
# If either ssl_ca_cert or ssl_ca_cert_dir is set, then certificate
# verification will be performed to establish the authenticity of the
# certificate. If neither is set to a nonempty string then no such
# verification takes place. ssl_ca_cert should be a path to a single
# PEM format file containing CA root certificate(s), whereas ssl_ca_cert_dir
# should be a directory containing such files. These are analogous to the
# -CAfile and -CApath options to openssl verify(1), respectively.
#ssl_ca_cert = ''
#ssl_ca_cert_dir = ''
# Debug message verbosity level. 0: no message, 1 <= : more verbose
debug_level = 0
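For context, failover_command above hands /etc/pgpool-II/failover.sh the failed node id (%d), the hostname of the new master (%H), and the path /tmp-noinst/failover. A minimal sketch of what such a script might look like, assuming the standby's recovery.conf has trigger_file = '/tmp-noinst/failover' and the postgres user has password-less SSH to the standby (both of those are assumptions, not something pgpool itself provides):
#!/bin/bash
# Called by pgpool as: failover.sh <failed node id> <new master hostname> <trigger file>
failed_node_id=$1
new_master_host=$2
trigger_file=$3
# Promote the standby only when the primary (backend 0 here) is the node that failed.
if [ "$failed_node_id" = "0" ]; then
    # Touching the trigger file makes the standby leave recovery and start accepting writes.
    ssh -T postgres@"$new_master_host" "touch $trigger_file"
fi
exit 0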
Please help! Thanks in advance!
Did you set it up following this guide?
http://aricgardner.com/databases/postgresql/pgpool-ii-3-0-5-with-streami...
I haven't dealt with this thing myself, so I can't help you here. If I ever get around to setting up a cluster at home (no promises, I have a lot of other things to do), I'll take a look. Otherwise, maybe one of the other guys will chime in.
If not, try asking the folks abroad; something similar to your case is being discussed here:
http://comments.gmane.org/gmane.comp.db.postgresql.pgpool.general/2589
Judging by the page title, this appears to be the pgpool-general@pgfoundry.org mailing list.
No, I followed the official pgpool documentation: http://www.pgpool.net/pgpool-web/pgpool-II/doc/pgpool-en.html#MASTER_SLA...
And your link is for pgpool-II version 3.2.0, while mine is 3.0.5. I have read that article as well.
Anyway, I have found something interesting: if I change the backend server numbers in my config to the following:
backend_hostname1 = '192.168.100.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/data'
backend_hostname2 = '192.168.100.2'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/data'
then pgpool writes the following to its log:
2013-06-24 10:19:37 LOG: pid 4786: find_primary_node: 1 node is standby
2013-06-24 10:19:37 LOG: pid 4786: find_primary_node: primary node id is 2
which is, in principle, quite expected behavior, BUT:
2013-06-24 10:19:37 LOG: pid 4786: read_status_file: 0 th backend is set to down status
Whereas if I follow the official pgpool documentation, I end up with the error described above.
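The "read_status_file: 0 th backend is set to down status" line seems to mean that pgpool restores a previously saved node state at startup; as far as I know it keeps that state in a pgpool_status file under logdir (/var/log/pgpool in my config). The next thing I plan to try (my own guess, not something from the documentation) is clearing that saved state before restarting pgpool:
# stop pgpool first, in whatever way it is managed on your system, e.g.:
/etc/init.d/pgpool stop
# drop the saved node status so pgpool re-detects both backends instead of restoring "down"
rm -f /var/log/pgpool/pgpool_status
/etc/init.d/pgpool start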
Presumably this is normal.
> 2013-06-24 10:19:37 LOG: pid 4786: read_status_file: 0 th backend is set to down status
That is, it is saying there are zero backends in the down state, i.e. shut down.
The standby, in theory, shouldn't be shut down either.
No, that is not normal behavior.
I ran into the same problem when I was reconfiguring my cluster.
I was manually copying over the pgpool.conf and pcp.conf configs; pgpool was supposed to run as Active/Active on top of a Master/Slave database, but after starting, pgpool on the master did not see the slave and, vice versa, pgpool on the slave did not see the master. And the logs showed the same errors.
The problem turned out to be in the pcp settings. When I first set the whole system up, I followed the manual step by step and then promptly forgot that the nodes have to be attached manually with pcp_attach_node, passing the password whose pg_md5-generated hash is stored in pcp.conf.
In short, the sequence is as follows, including generating a new password (a consolidated version of the commands is sketched right after this list):
1. pg_md5 -p
password: enter the new password here
2. Put the resulting hash into pcp.conf on every node as postgres:<generated hash>
3. Start pgpool on all nodes.
4. Log in to each node and attach the backends:
pcp_attach_node 10 node1 9898 postgres PASSWORD 0
pcp_attach_node 10 node2 9898 postgres PASSWORD 1
5. You can check the list of nodes attached to pgpool by connecting to it with psql on the node in question and running
show pool_nodes;
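A consolidated sketch of the same sequence as it would map onto the setup in this thread (pgpool on 192.168.100.4, backends 0 and 1 from the pgpool.conf above; the password and the postgres user for psql are placeholders):
# 1. Generate the md5 hash of the pcp password (prompts for the password):
pg_md5 -p
# 2. On every node running pgpool, add a line to pcp.conf:
#      postgres:<hash printed by pg_md5>
# 3. Start pgpool, then attach each backend through the pcp port (9898):
#    pcp_attach_node <timeout> <pgpool host> <pcp port> <pcp user> <password> <backend id>
pcp_attach_node 10 192.168.100.4 9898 postgres PASSWORD 0   # 192.168.100.2, the master
pcp_attach_node 10 192.168.100.4 9898 postgres PASSWORD 1   # 192.168.100.3, the slave
# 4. Check what pgpool sees (pgpool listens on port 9999 per pgpool.conf):
psql -h 192.168.100.4 -p 9999 -U postgres -c "show pool_nodes;"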