1. Sep 27, 2012: 33 commits
    • Sentinel: Support for AUTH. · dfb7194c
      antirez committed
    • Sentinel: reply -IDONTKNOW to get-master-addr-by-name on lack of info. · b8ce9a84
      antirez committed
      If we don't have any clue about a master since it never replied to INFO
      so far, reply with an -IDONTKNOW error to SENTINEL
      get-master-addr-by-name requests.
    • Sentinel: easier master redirection if master is a slave. · 1f8bd823
      antirez committed
      Before this commit, Sentinel redirected to the master ip/addr reported
      by the current instance only if this was the first INFO output
      received and the instance reported the slave role.
      
      Now we also redirect to the reported master ip/addr if we find that the
      runid is different and the reported role is slave.
      
      This unifies the behavior of Sentinel in the case of a reboot (where it
      will see the first INFO output with the wrong role and will perform the
      redirection), with the behavior of Sentinel in the case of a change in
      what it sees in the INFO output of the master.
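      The redirection rule above can be sketched as a predicate. This is a hypothetical Python model, not Sentinel's C code. `should_redirect` and its parameters are illustrative names. The sketch assumes the previous runid is unknown (None) before the first INFO reply.

```python
from typing import Optional


def should_redirect(first_info: bool, previous_runid: Optional[str],
                    reported_runid: str, reported_role: str) -> bool:
    """Return True if Sentinel should follow the master address that an
    instance it monitors as a master reports in its INFO output."""
    if reported_role != "slave":
        return False
    # A changed runid means a different process answered, e.g. after a reboot.
    runid_changed = (previous_runid is not None
                     and previous_runid != reported_runid)
    return first_info or runid_changed
```

      Both triggers (first INFO, changed runid) lead to the same action, which is the unification the commit describes.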
    • Sentinel: do not crash against slaves not publishing the runid. · ef792fc9
      antirez committed
      Older versions of Redis (before 2.4.17) don't publish the runid field in
      INFO. This commit makes Sentinel able to handle that without crashing.
    • Sentinel: INFO command implementation. · de499f7f
      antirez committed
    • Sentinel: add Redis execution mode to INFO output. · b65f3c21
      antirez committed
      The new "redis_mode" field in the INFO output will show if Redis is
      running in standalone mode, cluster, or sentinel mode.
    • Sentinel: Sentinel-side support for slave priority. · 161e137c
      antirez committed
      The slave priority that is now published by Redis in INFO output is
      now used by Sentinel in order to select the slave with minimum priority
      for promotion, and in order to consider slaves with priority set to 0 as
      not able to play the role of master (they will never be promoted by
      Sentinel).
      
      The "slave-priority" field is now one of the fields that Sentinel
      publishes when describing an instance via the SENTINEL commands such as
      "SENTINEL slaves mastername".
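      Here is a minimal sketch of the selection rule described above, written as a hypothetical Python model rather than Sentinel's C code. `Slave` and `select_slave` are illustrative names. The sketch models only the priority field and ignores any other criteria (link state, tie-breakers) the real implementation may apply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    name: str
    priority: int  # slave priority as published in INFO; 0 = never promote


def select_slave(slaves: List[Slave]) -> Optional[Slave]:
    """Pick the promotable slave with the lowest priority value, or None."""
    # Slaves with priority 0 can never play the role of master.
    candidates = [s for s in slaves if s.priority != 0]
    if not candidates:
        return None
    # min() keeps the first of several equal priorities; this sketch does
    # not model any further tie-breaking.
    return min(candidates, key=lambda s: s.priority)
```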
    • Sentinel: suppress harmless warning by initializing 'table' to NULL. · d480b9ce
      antirez committed
      Note that the assertion guarantees that one of the if branches setting
      table is always entered.
    • Sentinel: send SCRIPT KILL on -BUSY reply and SDOWN instance. · fa23fc33
      antirez committed
      From the point of view of Sentinel an instance replying -BUSY is down,
      since it is effectively not able to reply to user requests. However
      a looping script is a recoverable condition in Redis if the script has
      not yet performed any write to the dataset. In that case performing a
      failover is not optimal, so Sentinel now tries to restore the normal
      server condition by killing the script with a SCRIPT KILL command.
      
      If the script already performed some write before entering an infinite
      (or long enough to time out) loop, SCRIPT KILL will not work and the
      failover will be triggered anyway.
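      Here is a hypothetical sketch of the decision this commit adds. The names are illustrative. The rule of sending SCRIPT KILL at most once per busy episode is an assumption of the sketch. Whether the kill succeeds (no writes performed yet) is left to the instance and is not modeled.

```python
from typing import Optional


def busy_action(in_sdown: bool, last_reply: str,
                script_kill_sent: bool) -> Optional[str]:
    """Return the command to send to an instance, or None."""
    # Only instances already in SDOWN that answer with a -BUSY error are
    # candidates; sending SCRIPT KILL only once is an assumption here.
    if in_sdown and last_reply.startswith("-BUSY") and not script_kill_sent:
        return "SCRIPT KILL"
    return None
```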
    • Sentinel: fixed a crash on script execution. · fc0a0d4a
      antirez committed
      The call to sentinelScheduleScriptExecution() lacked the final NULL
      argument that signals the end of the arguments. This resulted in a crash.
    • Sentinel: SENTINEL FAILOVER command implemented. · ea9bec50
      antirez committed
      This command can be used in order to force a Sentinel instance to start
      a failover for the specified master, as leader, forcing the failover
      even if the master is up.
      
      The commit also adds some minor refactoring and other improvements to
      functions already implemented that make them able to work when the
      master is not in SDOWN condition. For instance, slave selection
      assumed that we send INFO every second to every slave, which is true
      only when the master is in SDOWN condition, so slave selection did not
      work when the master was not in SDOWN condition.
    • Sentinel: client reconfiguration script execution. · 26a34009
      antirez committed
      This commit adds support for optionally executing a script when one of
      the following events happens:
      
      * The failover starts (with a slave already promoted).
      * The failover ends.
      * The failover is aborted.
      
      The script is called with enough parameters (documented in the example
      sentinel.conf file) to provide information about the old and new ip:port
      pair of the master, the role of the sentinel (leader or observer) and
      the name of the master.
      
      The goal of the script is to inform clients of the configuration change
      in a way specific to the environment in which Sentinel is running, which
      can't be implemented in a general way inside Sentinel itself.
    • Sentinel: when leader in wait-start, sense another leader as race. · 524b79d2
      antirez committed
      When we are in wait-start, if another leader (or any other external
      entity) turns a slave into a master, abort the failover and detect it
      as an observer.
      
      Note that the wait-start state is mainly there for this reason, but the
      abort was not yet implemented.
      
      This adds a new sentinel event -failover-abort-race.
    • 201ed6d4
    • Sentinel: sentinel.conf self-documentation improved. · 7c9bfe10
      antirez committed
    • Sentinel: abort failover when in wait-start if master is back. · 3da75e2c
      antirez committed
      When we are a Leader Sentinel in wait-start state, starting with this
      commit the failover is aborted if the master returns online.
      
      This improves the way we handle a notable case of net split, that is the
      split between Sentinels and Redis servers, which will be a very common
      case of split because Sentinels will often be installed in the clients'
      network while servers can be in a different arm of the network.
      
      When Sentinels and Redis servers are isolated the master is in ODOWN
      condition since the Sentinels can agree about this state, however the
      failover does not start since there are no good slaves to promote (in
      this specific case all the slaves are unreachable).
      
      However when the split is resolved, Sentinels may sense the slave back
      a moment before they sense the master is back, so the failover may start
      without a good reason (since the master is actually working too).
      
      Now this condition is reversible: the failover will be aborted
      immediately if the master is detected to be working again, that is,
      neither in SDOWN nor in ODOWN condition.
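      The abort condition can be sketched as a predicate. `MasterView` and `wait_start_should_abort` are hypothetical names. The sketch models only the "master is back" check described in this commit, not the other wait-start transitions.

```python
from dataclasses import dataclass


@dataclass
class MasterView:
    sdown: bool  # subjectively down, from this Sentinel's point of view
    odown: bool  # objectively down, agreed by enough Sentinels


def wait_start_should_abort(is_leader: bool, master: MasterView) -> bool:
    """True if a leader in wait-start should abort because the master is back."""
    if not is_leader:
        return False
    # "Back" means neither subjectively nor objectively down.
    return not master.sdown and not master.odown
```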
    • Sentinel: scripts execution engine improved. · e328e41a
      antirez committed
      We no longer use a vanilla fork+execve but keep a queue of script jobs
      to execute, with retry on error, timeouts, and so forth.
      
      Currently this is used only for notifications, but soon the ability to
      also call client reconfiguration scripts will be added.
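      Here is a minimal sketch of a script job queue with retries and a timeout, assuming a fixed retry limit. `ScriptQueue` and its parameters are hypothetical. For brevity it runs each script synchronously with `subprocess.run`, whereas Sentinel forks child processes and reaps them without blocking. It also retries on the next call rather than after any delay.

```python
import subprocess
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class ScriptJob:
    argv: List[str]
    retries: int = 0


class ScriptQueue:
    """Queue of scripts to run, re-queueing failures up to max_retries."""

    def __init__(self, max_retries: int = 3, timeout: float = 10.0) -> None:
        self.jobs: Deque[ScriptJob] = deque()
        self.max_retries = max_retries
        self.timeout = timeout

    def schedule(self, *argv: str) -> None:
        self.jobs.append(ScriptJob(list(argv)))

    def run_pending(self) -> List[ScriptJob]:
        """Run each currently queued job once; return the jobs given up on."""
        dropped: List[ScriptJob] = []
        for _ in range(len(self.jobs)):
            job = self.jobs.popleft()
            try:
                # subprocess.run kills the child if the timeout expires.
                ok = subprocess.run(job.argv, timeout=self.timeout).returncode == 0
            except (OSError, subprocess.TimeoutExpired):
                ok = False
            if ok:
                continue
            job.retries += 1
            if job.retries <= self.max_retries:
                self.jobs.append(job)  # retry on a later call
            else:
                dropped.append(job)
        return dropped
```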
    • Include sys/wait.h to avoid compiler warning · 8a8e560b
      Jan-Erik Rediger committed
      gcc warned about an implicit declaration of function 'wait3'.
      Including this header fixes this.
    • 0d0975f2
    • comment fix · af41f6cf
      Jeremy Zawodny committed
      improve English a bit. :-)
    • Sentinel: ability to execute notification scripts. · 999fe0d3
      antirez committed
    • Fix warning in redis.c for sentinel config load · f1057534
      mrb committed
    • Some cleanup in sentinel.conf · fcc8bf99
      mrb committed
    • Sentinel: abort failover if no good slave is available. · 374eed7d
      antirez committed
      The previous behavior of the state machine was to wait some time and
      retry the slave selection, but this is not robust enough against drastic
      changes in the conditions of the monitored instances.
      
      What we do now when the slave selection fails is to abort the failover
      and go back to monitoring the master. If the ODOWN condition is still
      present, a new failover will be triggered, and so forth.
      
      This commit also refactors the code we use to abort a failover.
    • Sentinel: reset pending_commands in a more generic way. · 2085fdb1
      antirez committed
    • Prevent a spurious +sdown event on switch. · f8a19e32
      antirez committed
      When we reset the master we should start with clean timestamps for ping
      replies, otherwise we'll detect a spurious +sdown event: on a
      +switch-master event the previous master instance was probably in
      +sdown condition. Since we updated the address, we should count time
      from scratch again.
      
      This commit also makes sure to explicitly reset the count of pending
      commands; we can now do this because of the new way the hiredis link
      is closed.
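      A hypothetical model shows why the reset matters. SDOWN is derived from the age of the last valid PING reply, so unless that timestamp is reset together with the address, the new master inherits the old one's silence. The class, fields, and the `down_after` parameter name are illustrative.

```python
from typing import Tuple

Addr = Tuple[str, int]


class MonitoredInstance:
    """Toy SDOWN detector based on the age of the last PING reply."""

    def __init__(self, addr: Addr, down_after: float, now: float) -> None:
        self.addr = addr
        self.down_after = down_after
        self.last_ok_ping_reply = now

    def got_ping_reply(self, now: float) -> None:
        self.last_ok_ping_reply = now

    def is_sdown(self, now: float) -> bool:
        return now - self.last_ok_ping_reply > self.down_after

    def reset(self, new_addr: Addr, now: float) -> None:
        # On +switch-master we point at a new address, so start counting
        # from scratch instead of inheriting the old master's silence.
        self.addr = new_addr
        self.last_ok_ping_reply = now
```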
    • Sentinel: debugging message removed. · 7c39b55d
      antirez committed
    • Sentinel: changes to connection handling and redirection. · e47236d8
      antirez committed
      We now disconnect the hiredis link of Redis instances in a more robust
      way. We also change the way we perform the redirection for the
      +switch-master event, which is now just an instance reset with an
      address change.
      
      Using the same system we now implement the +redirect-to-master event
      that is triggered by an instance that is configured to be master but
      found to be a slave at the first INFO reply. In that case we monitor the
      master instead, logging the incident as an event.
    • Sentinel: check that instance still exists in reply callbacks. · 8ab7e998
      antirez committed
      We can't be sure the instance object still exists when the reply
      callback is called.
    • Sentinel: more robust failover detection as observer. · e01a415d
      antirez committed
      Sentinel observers detect a failover by checking whether a slave
      attached to the monitored master switches its replication role from
      slave to master. However, while this change should in theory only
      happen after a SLAVEOF NO ONE command, in practice it is very easy to
      reboot a slave instance with a wrong configuration that turns it into
      a master, especially if it was a past master before a successful
      failover.
      
      This commit changes the detection policy so that if an instance goes
      from slave to master, but at the same time the runid has changed, we
      sense a reboot, and in that case we don't detect a failover at all.
      
      This commit also introduces the "reboot" sentinel event, that is logged
      at "warning" level (so this will trigger an admin notification).
      
      The commit also fixes a problem in the disconnect handler, which
      assumed that the instance object always existed; that is not the case.
      We no longer assume that redisAsyncFree() will call the disconnection
      handler before returning.
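      The new detection policy can be sketched as a classifier. The function name and the "reboot"/"failover" labels are hypothetical. Treating a missing previous runid (as with instances older than 2.4.17) as no evidence of a reboot is an assumption of this sketch.

```python
from typing import Optional


def classify_role_change(old_role: str, new_role: str,
                         old_runid: Optional[str],
                         new_runid: str) -> Optional[str]:
    """Classify a role transition observed in INFO output."""
    if not (old_role == "slave" and new_role == "master"):
        return None
    if old_runid is not None and old_runid != new_runid:
        # Same address, different process: a restart, possibly with a
        # stale config that makes it a master. Not a failover.
        return "reboot"
    return "failover"
```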
    • Fixed an error in the example sentinel.conf. · d26a8fb4
      antirez committed
    • Typo. · 5b5eb192
      antirez committed
    • First implementation of Redis Sentinel. · 120ba392
      antirez committed
      This commit is the first, beta-quality implementation of Redis
      Sentinel, a distributed monitoring system for Redis with notification
      and automatic failover capabilities.
      
      More info at http://redis.io/topics/sentinel
  2. Sep 21, 2012: 3 commits
    • Test for SRANDMEMBER with <count>. · 2812b945
      antirez committed
    • SRANDMEMBER <count> leak fixed. · 31fe053a
      antirez committed
      For "CASE 4" (see code) we need to free the element if it's already in
      the result dictionary and adding it failed.
    • Added the SRANDMEMBER key <count> variant. · dd947715
      antirez committed
      SRANDMEMBER called with just the key argument can only return a single
      random element from a Redis Set. However, many users need to return
      multiple unique elements from a Set; this is not a trivial problem to
      handle on the client side, and for truly good performance a C
      implementation was required.
      
      After many requests for this feature it was finally implemented.
      
      The problem in implementing this command is the strategy to follow when
      the number of elements the user asks for is close to the number of
      elements already inside the set. In this case, asking the dictionary
      API for random elements and trying to add them to a temporary set may
      result in extremely poor performance, as most add operations will be
      wasted on duplicated elements.
      
      For this reason this implementation uses a different strategy in this
      case: the Set is copied, and random elements are removed from the copy
      until it contains the specified count.
      
      The code actually uses 4 different algorithms optimized for the
      different cases.
      
      If the count is negative, the command changes behavior and allows for
      duplicated elements in the returned subset.
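      Here is a Python sketch of the four cases, loosely modeled on this description rather than taken from the C code. The threshold constant `SUB_STRATEGY_MUL` and its value of 3 are assumptions. Python's `random.choice` stands in for the dictionary's random-key API.

```python
import random
from typing import List, Optional, Set, TypeVar

T = TypeVar("T")

# Assumed tuning constant: use copy-and-remove when the requested count is
# more than a third of the set size.
SUB_STRATEGY_MUL = 3


def srandmember(s: Set[T], count: int,
                rng: Optional[random.Random] = None) -> List[T]:
    rng = rng or random.Random()
    items = list(s)
    if count < 0:
        # Case 1: negative count, duplicates allowed, exactly -count picks.
        return [rng.choice(items) for _ in range(-count)] if items else []
    if count >= len(items):
        # Case 2: the whole set is requested, so return all of it.
        return items
    if count * SUB_STRATEGY_MUL > len(items):
        # Case 3: count is close to the set size, so copy the set and
        # remove random elements until only `count` remain.
        copy = set(items)
        while len(copy) > count:
            copy.remove(rng.choice(tuple(copy)))
        return list(copy)
    # Case 4: count is small, so add random picks to a result set; a
    # duplicate pick just fails to grow it and we pick again.
    result: Set[T] = set()
    while len(result) < count:
        result.add(rng.choice(items))
    return list(result)
```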
  3. Sep 17, 2012: 4 commits
    • Fix compilation on FreeBSD. Thanks to @koobs on twitter. · 8b6b1b27
      antirez committed
    • Redis 2.5.13 (2.6.0 RC7). · 44038626
      antirez committed
    • .gitignore modified to be more general with less entries. · 174518ff
      antirez committed
    • A reimplementation of blocking operation internals. · f444e2af
      antirez committed
      Redis provides support for blocking operations such as BLPOP or BRPOP.
      These operations are identical to normal LPOP and RPOP operations as
      long as there are elements in the target list, but if the list is empty
      they block waiting for new data to arrive to the list.
      
      All the clients blocked waiting for the same list are served in a FIFO
      way, so the first that blocked is the first to be served when more data
      is pushed into the list by another client.
      
      The previous implementation of blocking operations was conceived to
      serve clients in the context of push operations. For instance:
      
      1) There is a client "A" blocked on list "foo".
      2) The client "B" performs `LPUSH foo somevalue`.
      3) The client "A" is served in the context of the "B" LPUSH,
      synchronously.
      
      Processing things in a synchronous way was useful because if "B" pushes
      a value that is served to "A", from the point of view of the database it
      is a NOP (no operation), that is, nothing is replicated, nothing is
      written in the AOF file, and so forth.
      
      However later we implemented two things:
      
      1) Variadic LPUSH that could add multiple values to a list in the
      context of a single call.
      2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH"
      side effect when receiving data.
      
      This forced us to make the synchronous implementation more complex. If
      client "A" is waiting for data and "B" pushes three elements in a
      single call, we needed to propagate an LPUSH with a missing argument
      in the AOF and replication link. We also needed to make sure to
      replicate the LPUSH side of BRPOPLPUSH, but only if it did not in turn
      happen to serve another blocking client on another list ;)
      
      This was complex, but with a few mutually recursive functions
      everything worked as expected... until one day we introduced scripting
      in Redis.
      
      Scripting + synchronous blocking operations = Issue #614.
      
      Basically you can't "rewrite" a script to have just a partial effect on
      the replicas and AOF file if the script happened to serve a few blocked
      clients.
      
      The solution to all these problems, implemented by this commit, is to
      change the way we serve blocked clients. Instead of serving them
      synchronously, in the context of the command performing the PUSH
      operation, we now serve them through an asynchronous and iterative
      process:
      
      1) If a key that has clients blocked waiting for data is the subject of
      a list push operation, we simply mark the key as "ready" and put it into
      a queue.
      2) Every command pushing stuff on lists, such as a variadic LPUSH, a
      script, or whatever it is, is replicated verbatim without any rewriting.
      3) Every time a Redis command, a MULTI/EXEC block, or a script completes
      its execution, we run through the list of keys ready to serve blocked
      clients (as more data arrived), and process this list serving the
      blocked clients.
      4) As a result of "3" maybe more keys are ready again for other clients
      (as a result of BRPOPLPUSH we may have push operations), so we iterate
      back to step "3" if it's needed.
      
      The new code has much simpler semantics and an easier to understand
      implementation, with the disadvantage of not being able to "optimize
      out" a PUSH+BPOP as a no-op.
      
      This commit will be tested with care before the final merge; more
      tests will likely be added.
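      Here is a toy model of the ready-keys flow, written in Python for illustration. `ListServer` and `handle_ready_keys` are hypothetical names, not Redis internals. Step 2 (verbatim replication) is not modeled. A BRPOPLPUSH client is represented as a blocked client with a destination list.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ListServer:
    """Toy model: pushes only mark keys ready; blocked clients are served
    after the whole command (or MULTI/EXEC block, or script) completes."""

    def __init__(self) -> None:
        self.lists: Dict[str, Deque[str]] = {}
        # key -> FIFO of (client, BRPOPLPUSH destination or None)
        self.blocked: Dict[str, Deque[Tuple[str, Optional[str]]]] = {}
        self.ready: List[str] = []
        self.served: List[Tuple[str, str, str]] = []  # (client, key, value)

    def block(self, client: str, key: str, push_to: Optional[str] = None) -> None:
        self.blocked.setdefault(key, deque()).append((client, push_to))

    def lpush(self, key: str, *values: str) -> None:
        lst = self.lists.setdefault(key, deque())
        for value in values:
            lst.appendleft(value)
        # Step 1: just mark the key as ready; nobody is served yet.
        if self.blocked.get(key) and key not in self.ready:
            self.ready.append(key)

    def handle_ready_keys(self) -> None:
        # Steps 3-4: serve blocked clients, iterating while BRPOPLPUSH
        # pushes make further keys ready.
        while self.ready:
            keys, self.ready = self.ready, []
            for key in keys:
                clients = self.blocked.get(key)
                lst = self.lists.get(key)
                while clients and lst:
                    client, push_to = clients.popleft()
                    value = lst.pop()  # BRPOP semantics: take from the tail
                    self.served.append((client, key, value))
                    if push_to is not None:
                        self.lpush(push_to, value)
```

      With "A" blocked in a BRPOPLPUSH from foo to bar and "B" blocked on bar, a single LPUSH on foo serves nobody immediately. One call to handle_ready_keys then serves "A", and on its second iteration serves "B".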