Commits · e480c279bc28832ca78a33f7f52c0b57ae5a16f1 · xindoo / redis

05 Oct, 2012 1 commit

Force expire all timer events when system clock skew is detected. · e480c279

Committed by Jokea on Aug 30, 2012

When the system time is moved backwards, the timers do not work properly,
hence some core functionality of Redis stops working (e.g. replication,
bgsave, etc). See issue #633 for details.

The patch saves the previous time and when a system clock skew is detected,
it will force expire all timers.

Modified by @antirez: the previous time was moved into the eventLoop
structure to make sure the library is still thread safe as long as you
use different event loops in different threads (otherwise you need
some synchronization). More comments were added about the reasoning behind
the patch, which is worth reporting here:

/* If the system clock is moved to the future, and then set back to the
 * right value, time events may be delayed in a random way. Often this
 * means that scheduled operations will not be performed soon enough.
 *
 * Here we try to detect system clock skews, and force all the time
 * events to be processed ASAP when this happens: the idea is that
 * processing events earlier is less dangerous than delaying them
 * indefinitely, and practice suggests it is. */
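
As a rough illustration of the idea in the comment above, here is a minimal sketch, not the patch itself. The types `time_event` and `event_loop` and the function `force_expire_on_skew` are simplified, hypothetical stand-ins. The only details taken from the message are that the previous time is stored in the event loop structure and that every timer is forced to expire when the clock is seen to go backwards.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the event loop types. */
typedef struct time_event {
    long when_sec;              /* scheduled fire time, in seconds */
    struct time_event *next;
} time_event;

typedef struct event_loop {
    time_t last_time;           /* wall-clock time seen on the previous pass */
    time_event *head;           /* singly linked list of pending timers */
} event_loop;

/* If the clock went backwards since the previous pass, mark every timer as
 * due immediately. Returns the number of timers that were forced. */
int force_expire_on_skew(event_loop *loop, time_t now) {
    int forced = 0;
    if (now < loop->last_time) {
        for (time_event *te = loop->head; te != NULL; te = te->next) {
            te->when_sec = 0;   /* 0 is earlier than any real "now" */
            forced++;
        }
    }
    loop->last_time = now;      /* remember this pass for the next check */
    return forced;
}
```

Keeping `last_time` inside the loop structure rather than in a global is what lets two threads run two separate loops without sharing state.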

04 Oct, 2012 2 commits

"Timeout receiving bulk data" error message modified. · 0c19880c

Committed by antirez on Oct 04, 2012

The new message now contains a hint about modifying the repl-timeout
configuration directive if the problem persists.

This should normally not be needed, because while the master generates
the RDB file it makes sure to send newlines to the replication channel
to prevent timeouts. However there are times when masters running on
very slow systems can completely stop for seconds during the RDB saving
process. In such a case enlarging the timeout value can fix the problem.

See issue #695 for an example of this problem in an EC2 deployment.


"SORT by nosort" (skip sorting) respect sorted set ordering. · 2ba96271

Committed by antirez on Oct 03, 2012

When SORT is called with the BY option set to a string constant not
including the wildcard character "*", there is no way to sort the output,
so any ordering is valid. This allows the SORT internals to optimize their
work and skip sorting the output altogether.

However it was odd that this option was not able to retain the natural
order of a sorted set. This feature was requested by users multiple
times, since calling SORT with GET against sorted sets can be a handy way
to mass-fetch objects.

This commit introduces two things:

1) The ability of SORT to return sorted set elements in their natural
ordering when `BY nosort` is specified, according to the `DESC / ASC` options.
2) The ability of SORT to optimize this case further if LIMIT is passed
as well, avoiding fetching the whole sorted set and instead directly
obtaining the specified range.

Because in this case the sorting is always deterministic, no
post-sorting activity is performed when SORT is called from a Lua
script.
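
The LIMIT optimization can be pictured with a small sketch. This is only an illustration under stated assumptions: `ordered_range` is a hypothetical helper that works on a plain array which is already in ascending order, while the real code operates on the sorted set data structure. The point it shows is that a range can be taken directly by rank, in either direction, with no sorting step.

```c
#include <stddef.h>

/* Copy up to `count` elements starting at rank `offset` from an array that is
 * already in ascending order. When `desc` is nonzero, ranks are counted from
 * the end. Returns how many elements were written to `out`.
 * Hypothetical helper for illustration only. */
size_t ordered_range(const int *sorted, size_t len, size_t offset,
                     size_t count, int desc, int *out) {
    size_t written = 0;
    if (offset >= len) return 0;                    /* range starts past the end */
    if (count > len - offset) count = len - offset; /* clamp to what exists */
    for (size_t i = 0; i < count; i++) {
        size_t rank = offset + i;
        out[written++] = desc ? sorted[len - 1 - rank] : sorted[rank];
    }
    return written;
}
```

With the array `{10, 20, 30, 40, 50}`, offset 1 and count 2, the ascending call yields 20 and 30, and the descending call yields 40 and 30.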

This commit fixes issue #98.

01 Oct, 2012 1 commit

Revert "Scripting: redis.NIL to return nil bulk replies." · 151b606c
Committed by antirez on Oct 01, 2012
```
This reverts commit e061d797.

Conflicts:

	src/scripting.c
```

28 Sep, 2012 2 commits

Scripting: add helper functions redis.error_reply() and redis.status_reply(). · f1466e11

Committed by antirez on Sep 28, 2012

A previous commit introduced redis.NIL. This commit adds similar helper
functions to return tables with a single field set to the specified
string so that instead of using 'return {err="My Error"}' it is possible
to use a more idiomatic form:

    return redis.error_reply("My Error")
    return redis.status_reply("OK")


Scripting: redis.NIL to return nil bulk replies. · e061d797

Committed by antirez on Sep 28, 2012

Lua arrays can't contain nil elements (see
http://www.lua.org/pil/19.1.html for more information), so Lua scripts
were not able to return a multi-bulk reply containing nil bulk
elements inside.

This commit introduces a special conversion: a table with just
a "nilbulk" field set to a boolean value is converted by Redis to a nil
bulk reply, but at the same time for Lua this value is not nil, so it can
be used inside Lua arrays.

This value is also assigned to redis.NIL, so the following two forms
are equivalent and will be able to return a nil bulk reply as the second
element of a three-element array:

    EVAL "return {1,redis.NIL,3}" 0
    EVAL "return {1,{nilbulk=true},3}" 0

The result in redis-cli will be:

    1) (integer) 1
    2) (nil)
    3) (integer) 3

27 Sep, 2012 34 commits

Fixed some spelling errors in the comments · 04779bdf
Committed by Erik Dubbelboer on Apr 07, 2012

Added consts keyword where possible · e04be06e
Committed by Erik Dubbelboer on Mar 30, 2012

Final merge of Sentinel into 2.6. · c4cbffa3

Committed by antirez on Sep 27, 2012

After cherry-picking the Sentinel commits, a few spurious issues remained
involving references to Redis Cluster, which is not present in the 2.6 branch.

Sentinel: Support for AUTH. · dfb7194c
Committed by antirez on Sep 26, 2012

Sentinel: reply -IDONTKNOW to get-master-addr-by-name on lack of info. · b8ce9a84

Committed by antirez on Sep 04, 2012

If we don't have any clue about a master since it never replied to INFO
so far, reply with an -IDONTKNOW error to SENTINEL
get-master-addr-by-name requests.


Sentinel: more easy master redirection if master is a slave. · 1f8bd823

Committed by antirez on Sep 04, 2012

Before this commit Sentinel used to redirect the master ip/addr when the
current instance reported itself to be a slave only if this was the first
INFO output received and the role was found to be slave.

Now we also redirect to the reported master ip/addr if we find that the
runid is different and the reported role is slave.

This unifies the behavior of Sentinel in the case of a reboot (where it
will see the first INFO output with the wrong role and will perform the
redirection), with the behavior of Sentinel in the case of a change in
what it sees in the INFO output of the master.


Sentinel: do not crash against slaves not publishing the runid. · ef792fc9

Committed by antirez on Aug 30, 2012

Older versions of Redis (before 2.4.17) don't publish the runid field in
INFO. This commit makes Sentinel able to handle that without crashing.

Sentinel: INFO command implementation. · de499f7f
Committed by antirez on Aug 29, 2012

Sentinel: add Redis execution mode to INFO output. · b65f3c21

Committed by antirez on Aug 29, 2012

The new "redis_mode" field in the INFO output will show if Redis is
running in standalone mode, cluster, or sentinel mode.


Sentinel: Sentinel-side support for slave priority. · 161e137c

Committed by antirez on Aug 28, 2012

The slave priority that is now published by Redis in INFO output is
now used by Sentinel in order to select the slave with minimum priority
for promotion, and in order to consider slaves with priority set to 0 as
not able to play the role of master (they will never be promoted by
Sentinel).
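
The priority rule described above can be sketched as follows. This is a simplified illustration, not Sentinel's code: `slave_info` and `select_slave` are hypothetical names, and the sketch covers only the priority rule from this message, ignoring any other criteria Sentinel may apply when choosing a slave.

```c
#include <stddef.h>

/* Hypothetical view of a slave as seen by Sentinel. */
typedef struct slave_info {
    const char *name;
    int priority;   /* slave priority read from INFO; 0 means never promote */
} slave_info;

/* Return the promotable slave with the lowest priority value, or NULL when
 * every slave has priority 0 (or the list is empty). */
const slave_info *select_slave(const slave_info *slaves, size_t n) {
    const slave_info *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (slaves[i].priority == 0) continue;      /* excluded from promotion */
        if (best == NULL || slaves[i].priority < best->priority)
            best = &slaves[i];
    }
    return best;
}
```

Returning NULL when nothing is eligible lets the caller tell "no good slave" apart from a valid choice.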

The "slave-priority" field is now one of the fields that Sentinel
publishes when describing an instance via the SENTINEL commands such as
"SENTINEL slaves mastername".

Sentinel: suppress harmless warning by initializing 'table' to NULL. · d480b9ce
Committed by antirez on Aug 28, 2012
```
Note that the assertion guarantees that one of the if branches setting
table is always entered.
```

Sentinel: send SCRIPT KILL on -BUSY reply and SDOWN instance. · fa23fc33

Committed by antirez on Aug 24, 2012

From the point of view of Redis an instance replying -BUSY is down,
since it is effectively not able to reply to user requests. However
a looping script is a recoverable condition in Redis if the script has
not yet performed any write to the dataset. In that case performing a
fail over is not optimal, so Sentinel now tries to restore the normal server
condition by killing the script with a SCRIPT KILL command.

If the script already performed some write before entering an infinite
(or long enough to timeout) loop, SCRIPT KILL will not work and the
fail over will be triggered anyway.


Sentinel: fixed a crash on script execution. · fc0a0d4a

Committed by antirez on Aug 24, 2012

The call to sentinelScheduleScriptExecution() lacked the final NULL
argument to signal the end of arguments. This resulted in a crash.
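
To show why the trailing NULL matters, here is a minimal sketch of a NULL-terminated variadic function. `count_script_args` is a hypothetical helper and does not reproduce the signature of sentinelScheduleScriptExecution(). It only demonstrates the convention: the callee keeps reading arguments until it meets NULL, so a call without the terminator reads past the real arguments, which is undefined behavior.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the string arguments of a NULL-terminated variadic call, including
 * the leading path. Hypothetical helper for illustration only. */
int count_script_args(char *path, ...) {
    va_list ap;
    int argc = 1;                       /* the script path itself */
    (void)path;
    va_start(ap, path);
    /* Stop only at the NULL sentinel: the callee has no other way to know
     * how many arguments were passed. */
    while (va_arg(ap, char *) != NULL) argc++;
    va_end(ap);
    return argc;
}
```

A correct call looks like `count_script_args("/bin/notify", "+sdown", "master", (char *)NULL)`. The cast keeps the sentinel pointer-sized even on platforms where `NULL` is defined as a plain `0`.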


Sentinel: SENTINEL FAILOVER command implemented. · ea9bec50

Committed by antirez on Aug 03, 2012

This command can be used in order to force a Sentinel instance to start
a failover for the specified master, as leader, forcing the failover
even if the master is up.

The commit also adds some minor refactoring and other improvements to
functions already implemented that make them able to work when the
master is not in SDOWN condition. For instance, slave selection
assumed that we ask INFO every second to every slave. This is true
only when the master is in SDOWN condition, so slave selection did not
work when the master was not in SDOWN condition.


Sentinel: client reconfiguration script execution. · 26a34009

Committed by antirez on Aug 02, 2012

This commit adds support to optionally execute a script when one of the
following events happens:

* The failover starts (with a slave already promoted).
* The failover ends.
* The failover is aborted.

The script is called with enough parameters (documented in the example
sentinel.conf file) to provide information about the old and new ip:port
pair of the master, the role of the sentinel (leader or observer) and
the name of the master.

The goal of the script is to inform clients of the configuration change
in a way specific to the environment Sentinel is running in, that can't be
implemented in a general way inside Sentinel itself.


Sentinel: when leader in wait-start, sense another leader as race. · 524b79d2

Committed by antirez on Jul 31, 2012

When we are in wait start, if another leader (or any other external
entity) turns a slave into a master, abort the failover, and detect it
as an observer.

Note that the wait-start state is mainly there for this reason but the
abort was not yet implemented.

This adds a new sentinel event -failover-abort-race.


Sentinel: sentinelRefreshInstanceInfo() comments improved a bit. · 201ed6d4
Committed by antirez on Jul 31, 2012

Sentinel: sentinel.conf self-documentation improved. · 7c9bfe10
Committed by antirez on Jul 31, 2012


Sentinel: abort failover when in wait-start if master is back. · 3da75e2c

Committed by antirez on Jul 31, 2012

When we are a Leader Sentinel in wait-start state, starting with this
commit the failover is aborted if the master returns online.

This improves the way we handle a notable case of net split, that is the
split between Sentinels and Redis servers, that will be a very common
case of split because Sentinels will often be installed in the client's
network and servers can be in a different arm of the network.

When Sentinels and Redis servers are isolated the master is in ODOWN
condition since the Sentinels can agree about this state, however the
failover does not start since there are no good slaves to promote (in
this specific case all the slaves are unreachable).

However when the split is resolved, Sentinels may sense the slave back
a moment before they sense the master is back, so the failover may start
without a good reason (since the master is actually working too).

Now this condition is reversible, so the failover will be aborted
immediately if the master is detected to be working again, that
is, neither in SDOWN nor in ODOWN condition.


Sentinel: scripts execution engine improved. · e328e41a

Committed by antirez on Jul 27, 2012

We no longer use a vanilla fork+execve but keep a queue of script jobs
to execute, with retry on error, timeouts, and so forth.

Currently this is used only for notifications but soon the ability to
also call client reconfiguration scripts will be added.

Include sys/wait.h to avoid compiler warning · 8a8e560b
Committed by Jan-Erik Rediger on Jul 28, 2012
```
gcc warned about an implicit declaration of function 'wait3'.
Including this header fixes this.
```

Sentinel: don't start a failover as leader if there is no good slave. · 0d0975f2
Committed by antirez on Jul 26, 2012

comment fix · af41f6cf
Committed by Jeremy Zawodny on Jul 25, 2012
```
improve English a bit. :-)
```

Sentinel: ability to execute notification scripts. · 999fe0d3
Committed by antirez on Jul 25, 2012

Fix warning in redis.c for sentinel config load · f1057534
Committed by mrb on Jul 25, 2012

Some cleanup in sentinel.conf · fcc8bf99
Committed by mrb on Jul 25, 2012


Sentinel: abort failover if no good slave is available. · 374eed7d

Committed by antirez on Jul 25, 2012

The previous behavior of the state machine was to wait some time and
retry the slave selection, but this is not robust enough against drastic
changes in the conditions of the monitored instances.

What we do now when the slave selection fails is to abort the failover
and return to monitoring the master. If the ODOWN condition is still
present a new failover will be triggered and so forth.

This commit also refactors the code we use to abort a failover.

Sentinel: reset pending_commands in a more generic way. · 2085fdb1
Committed by antirez on Jul 24, 2012


Prevent a spurious +sdown event on switch. · f8a19e32

Committed by antirez on Jul 24, 2012

When we reset the master we should start with clean timestamps for ping
replies otherwise we'll detect a spurious +sdown event, because on
+master-switch event the previous master instance was probably in +sdown
condition. Since we updated the address we should count time from
scratch again.

Also this commit makes sure to explicitly reset the count of pending
commands, now we can do this because of the new way the hiredis link
is closed.

Sentinel: debugging message removed. · 7c39b55d
Committed by antirez on Jul 24, 2012


Sentinel: changes to connection handling and redirection. · e47236d8

Committed by antirez on Jul 24, 2012

We disconnect the Redis instance's hiredis link in a more robust way now.
Also we change the way we perform the redirection for the +switch-master
event, that is not just an instance reset with an address change.

Using the same system we now implement the +redirect-to-master event
that is triggered by an instance that is configured to be master but
found to be a slave at the first INFO reply. In that case we monitor the
master instead, logging the incident as an event.

Sentinel: check that instance still exists in reply callbacks. · 8ab7e998
Committed by antirez on Jul 24, 2012
```
We can't be sure the instance object still exists when the reply
callback is called.
```

Sentinel: more robust failover detection as observer. · e01a415d

Committed by antirez on Jul 24, 2012

Sentinel observers detect a failover by checking whether a slave attached to
the monitored master changes its replication role from slave to master.
However, while this change may in theory only happen after a SLAVEOF NO
ONE command, in practice it is very easy to reboot a slave instance with
a wrong configuration that turns it into a master, especially if it was
a past master before a successful failover.

This commit changes the detection policy so that if an instance goes
from slave to master, but at the same time the runid has changed, we
sense a reboot, and in that case we don't detect a failover at all.
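
The detection policy can be sketched as a small classifier. This is an illustration under stated assumptions, not Sentinel's code: the enums and `classify_role_change` are hypothetical names. In this sketch any runid change is treated as a reboot, which covers the case described above where a slave comes back as a master under a new runid.

```c
#include <string.h>

typedef enum { ROLE_MASTER, ROLE_SLAVE } role_t;
typedef enum { EV_NONE, EV_FAILOVER_DETECTED, EV_REBOOT } event_t;

/* Classify what an observer should conclude from two consecutive INFO
 * snapshots of the same instance. Hypothetical helper for illustration. */
event_t classify_role_change(role_t old_role, role_t new_role,
                             const char *old_runid, const char *new_runid) {
    /* A different runid means the process restarted: report a reboot and
     * do not treat a role change as a failover. */
    if (strcmp(old_runid, new_runid) != 0) return EV_REBOOT;
    /* Same process, slave turned master: this is what a failover looks like. */
    if (old_role == ROLE_SLAVE && new_role == ROLE_MASTER)
        return EV_FAILOVER_DETECTED;
    return EV_NONE;
}
```

Checking the runid first is the design choice that matters here: the role transition alone cannot tell a promotion apart from a restart with a wrong configuration.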

This commit also introduces the "reboot" sentinel event, that is logged
at "warning" level (so this will trigger an admin notification).

The commit also fixes a problem in the disconnect handler that assumed
that the instance object always existed, that is not the case. Now we
no longer assume that redisAsyncFree() will call the disconnection
handler before returning.

Fixed an error in the example sentinel.conf. · d26a8fb4
Committed by antirez on Jul 23, 2012
