提交 3da75e2c 编写于 作者: A antirez

Sentinel: abort failover when in wait-start if master is back.

When we are a Leader Sentinel in wait-start state, starting with this
commit the failover is aborted if the master returns online.

This improves the way we handle a notable case of net split, that is the
split between Sentinels and Redis servers, that will be a very common
case of split becase Sentinels will often be installed in the client's
network and servers can be in a differnt arm of the network.

When Sentinels and Redis servers are isolated the master is in ODOWN
condition since the Sentinels can agree about this state, however the
failover does not start since there are no good slaves to promote (in
this specific case all the slaves are unreachable).

However when the split is resolved, Sentinels may sense the slave back
a moment before they sense the master is back, so the failover may start
without a good reason (since the master is actually working too).

Now this condition is reversible, so the failover will be aborted
immediately after if the master is detected to be working again, that
is, not in SDOWN nor in ODOWN condition.
上级 e328e41a
......@@ -2400,6 +2400,24 @@ sentinelRedisInstance *sentinelSelectSlave(sentinelRedisInstance *master) {
/* ---------------- Failover state machine implementation ------------------- */
void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
/* If we in "wait start" but the master is no longer in ODOWN nor in
* SDOWN condition we abort the failover. This is important as it
* prevents a useless failover in a a notable case of netsplit, where
* the senitnels are split from the redis instances. In this case
* the failover will not start while there is the split because no
* good slave can be reached. However when the split is resolved, we
* can go to waitstart if the slave is back rechable a few milliseconds
* before the master is. In that case when the master is back online
* we cancel the failover. */
if ((ri->flags & (SRI_S_DOWN|SRI_O_DOWN)) == 0) {
sentinelEvent(REDIS_WARNING,"-failover-abort-master-is-back",
ri,"%@");
sentinelAbortFailover(ri);
return;
}
/* Start the failover going to the next state if enough time has
* elapsed. */
if (mstime() >= ri->failover_start_time) {
ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
ri->failover_state_change_time = mstime();
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册