1. 29 May 2018 (3 commits)
    • Don't expire keys while loading RDB from AOF preamble. · 01407a3a
      Committed by antirez
      The AOF tail of a combined RDB+AOF file is based on the premise of
      applying the AOF commands to the exact state the server had while the
      RDB was persisted. By expiring keys while loading the RDB file we
      change that state, so replaying the AOF tail later may produce a
      different, inconsistent state.
      
      Test case:
      
      * Time1: SET a 10
      * Time2: EXPIREAT a $time5
      * Time3: INCR a
      * Time4: PERSIST a. Start BGREWRITEAOF with RDB preamble. The value of a is 11 without an expire time.
      * Time5: Restart Redis from the RDB+AOF: consistency violation.
      
      Thanks to @soloestoy for providing the patch.
      Thanks to @trevor211 for the original issue report and the initial fix.
      
      Check issue #4950 for more info.
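      A minimal sketch of the idea behind the fix (names and signature are
      hypothetical, heavily simplified compared to the actual rdb.c logic):
      the expire check is simply skipped while the RDB being loaded is the
      preamble of an AOF file.

      ```c
      /* Decide whether a key with an expire time should be dropped during
       * RDB loading. When the RDB is the preamble of an AOF file, expired
       * keys are kept, so the AOF tail replays against the exact state
       * present at rewrite time. */
      int rdbShouldDropExpiredKey(long long expiretime, long long now,
                                  int loading_aof_preamble)
      {
          if (expiretime == -1) return 0;     /* Key has no TTL at all. */
          if (loading_aof_preamble) return 0; /* Keep it for the AOF tail. */
          return expiretime < now;            /* Plain RDB load: drop it. */
      }
      ```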
    • Fix rdb save by allowing dumping of expire keys, so that when · fb5408cf
      Committed by WuYunlong
      we add a new slave, and do a failover, either manually or
      automatically, other local slaves will delete the expired keys
      properly.
    • Backport hiredis issue 525 fix to compile on FreeBSD. · 0b8b6df4
      Committed by antirez
      Close #4947.
2. 23 May 2018 (4 commits)
    • Add INIT INFO to the provided init script. · e98627c5
      Committed by antirez
    • Fix ae.c when a timer finalizerProc adds an event. · 17f5de89
      Committed by antirez
      While this feature is not used by Redis, ae.c implements the ability
      for a timer to call a finalizer callback when a timer event is
      deleted. This feature was buggy from the start, and because it was
      never used we never noticed the problem. However Anthony LaTorre was
      using the same library to implement a different system: he found a
      bug, which he describes as follows and fixed with the patch in this
      commit, sent to me by private email:
      
          --- Anthony email ---
      
      I've found one bug in the current implementation of the timed events.
      It's possible to lose track of a timed event if an event is added in
      the finalizerProc of another event.
      
      For example, suppose you start off with three timed events 1, 2, and
      3. Then the linked list looks like:
      
      3 -> 2 -> 1
      
      Then, you run processTimeEvents and events 2 and 3 finish, so now the
      list looks like:
      
      -1 -> -1 -> 2
      
      Now, on the next iteration of processTimeEvents it starts by deleting
      the first event, and suppose this finalizerProc creates a new event,
      so that the list looks like this:
      
      4 -> -1 -> 2
      
      On the next iteration of the while loop, when it gets to the second
      event, the variable prev is still set to NULL, so that the head of the
      event loop after the next event will be set to 2, i.e. after deleting
      the next event the event loop will look like:
      
      2
      
      and the event with id 4 will be lost.
      
      I've attached an example program to illustrate the issue. If you run
      it you will see that it prints:
      
      ```
      foo id = 0
      spam!
      ```
      
      But if you uncomment line 29 and run it again it won't print "spam!".
      
          --- End of email ---
      
      Test.c source code is as follows:
      
          #include "ae.h"
          #include <stdio.h>

          aeEventLoop *el;

          int foo(struct aeEventLoop *el, long long id, void *data)
          {
              printf("foo id = %lld\n", id);

              return AE_NOMORE;
          }

          int spam(struct aeEventLoop *el, long long id, void *data)
          {
              printf("spam!\n");

              return AE_NOMORE;
          }

          void bar(struct aeEventLoop *el, void *data)
          {
              aeCreateTimeEvent(el, 0, spam, NULL, NULL);
          }

          int main(int argc, char **argv)
          {
              el = aeCreateEventLoop(100);

              //aeCreateTimeEvent(el, 0, foo, NULL, NULL);
              aeCreateTimeEvent(el, 0, foo, NULL, bar);

              aeMain(el);

              return 0;
          }
      
      Anthony fixed the problem by using a linked list for the list of timers, and
      sent me back this patch after he tested the code in production for some time.
      The code looks sane to me, so committing it to Redis.
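      The core of the fix can be sketched like this (hypothetical names,
      heavily simplified compared to ae.c's aeTimeEvent): with a prev
      pointer in every node, deletion fixes up the head through the node
      itself rather than through a `prev` variable tracked by the
      iteration, so an event prepended by a finalizerProc is never lost.

      ```c
      #include <stddef.h>

      /* Simplified doubly linked time-event node (the real aeTimeEvent
       * also carries callbacks, clientData and firing times). */
      typedef struct timeEvent {
          long long id;
          struct timeEvent *prev, *next;
      } timeEvent;

      /* Unlink `te` from the list, updating the head when needed. */
      void unlinkTimeEvent(timeEvent **head, timeEvent *te)
      {
          if (te->prev)
              te->prev->next = te->next;
          else
              *head = te->next;
          if (te->next)
              te->next->prev = te->prev;
      }
      ```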
    • Sentinel: fix delay in detecting ODOWN. · 266e6423
      Committed by antirez
      See issue #2819 for details. The gist is that when we want to send an
      INFO command because we are over the time, we used to send only the
      INFO command, no longer sending PING commands. However if a master
      fails exactly when we are about to send an INFO command, the measured
      PING time will be zero because the PONG reply was already received,
      and we'll fail to send more PINGs, since we only try to send INFO
      commands: the failure detector will be delayed until the connection
      is closed and re-opened because of the "long timeout".
      
      This commit changes the logic so that we can send all three kinds of
      messages regardless of whether we already sent another one in the
      same code path. It may happen that we go over the message limit for
      the link by a few messages, but this is not significant. However now
      we'll no longer delay sending a command just because there was
      something else to send at the same time.
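      The shape of the new logic can be sketched as follows (a simplified,
      hypothetical helper, not Sentinel's actual code): each message kind
      is evaluated independently, so a due INFO no longer suppresses a due
      PING in the same code path.

      ```c
      #define MSG_PING  1
      #define MSG_INFO  2
      #define MSG_HELLO 4

      /* Return a bitmask of every message that is due, instead of picking
       * at most one kind per call. */
      int messagesToSend(int ping_due, int info_due, int hello_due)
      {
          int mask = 0;
          if (ping_due)  mask |= MSG_PING;
          if (info_due)  mask |= MSG_INFO;
          if (hello_due) mask |= MSG_HELLO;
          return mask;
      }
      ```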
    • AOF & RDB: be compatible with rdbchecksum no · eafaf172
      Committed by zhaozhao.zz
3. 08 May 2018 (1 commit)
4. 27 March 2018 (1 commit)
5. 26 March 2018 (2 commits)
6. 14 March 2018 (2 commits)
    • Cluster: add test for the nofailover flag. · 8d92885b
      Committed by antirez
    • Cluster: ability to prevent slaves from failing over their masters. · 70597a30
      Committed by antirez
      This commit, in some parts derived from PR #3041 which is no longer
      possible to merge (because the user deleted the original branch),
      implements the ability for slaves to have a special configuration
      preventing them from trying to start a failover when the master is
      failing.

      There are multiple reasons for wanting this, and the feature was
      requested in issue #3021 some time ago.
      
      The differences between this patch and the original PR are the
      following:
      
      1. The flag is saved/loaded on the nodes configuration.
      2. The 'myself' node is now flag-aware, the flag is updated as needed
         when the configuration is changed via CONFIG SET.
      3. The flag name uses NOFAILOVER instead of NO_FAILOVER to be consistent
         with existing NOADDR.
      4. The redis.conf documentation was rewritten.
      
      Thanks to @deep011 for the original patch.
7. 02 March 2018 (2 commits)
8. 01 March 2018 (3 commits)
9. 28 February 2018 (1 commit)
10. 27 February 2018 (5 commits)
    • ae.c: instead of not firing, on AE_BARRIER invert the sequence. · 1e2f0d69
      Committed by antirez
      AE_BARRIER was implemented like this:

          - Fire the readable event.
          - Do not fire the writable event if the readable one fired.

      However this may lead to the writable event never being called if the
      readable event always fires. There is an alternative: we can simply
      invert the sequence of the calls when AE_BARRIER is set. This commit
      does that.
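      The inversion can be sketched as follows (a simplified, hypothetical
      helper; in Redis the real logic lives inside aeProcessEvents()):

      ```c
      #include <string.h>

      #define AE_READABLE 1
      #define AE_WRITABLE 2

      static char fired[4]; /* Records the handler firing order. */

      static void rfileProc(void) { strcat(fired, "r"); }
      static void wfileProc(void) { strcat(fired, "w"); }

      /* Fire the handlers for one ready file descriptor. With the barrier
       * set, the write handler runs first, so it can never fire after the
       * read handler within the same iteration. */
      void fireEvents(int mask, int barrier)
      {
          fired[0] = '\0';
          if (barrier) {
              if (mask & AE_WRITABLE) wfileProc();
              if (mask & AE_READABLE) rfileProc();
          } else {
              if (mask & AE_READABLE) rfileProc();
              if (mask & AE_WRITABLE) wfileProc();
          }
      }
      ```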
    • AOF: fix a bug that may prevent proper fsyncing when fsync=always. · b2e4aad9
      Committed by antirez
      In case the write handler is already installed, it could happen that we
      serve the reply of a query in the same event loop cycle we received it,
      preventing beforeSleep() from guaranteeing that we do the AOF fsync
      before sending the reply to the client.
      
      The AE_BARRIER mechanism, introduced in a previous commit, prevents this
      problem. This commit makes actual use of this new feature to fix the
      bug.
    • Cluster: improve crash-recovery safety after failover auth vote. · 93bad8ae
      Committed by antirez
      Add AE_BARRIER to the writable event so that slaves requesting votes
      can't be served before we re-enter the event loop in the next
      iteration, ensuring that clusterBeforeSleep() fsyncs to disk in time.
      Also add an explicit fsync call, given that we modified the last vote
      epoch variable.
    • ae.c: introduce the concept of read->write barrier. · e32752e8
      Committed by antirez
      AOF fsync=always, and certain Redis Cluster bus operations, require
      fsyncing data to disk before replying with an acknowledgement.
      In such cases, in order to implement group commits, we want to be
      sure that queries read in a given cycle of the event loop are never
      served to clients in the same event loop iteration. This way, by
      using the event loop "before sleep" callback, we can fsync the
      information just one time before returning into the event loop for
      the next cycle. This is much more efficient than calling fsync()
      multiple times.

      Unfortunately, because of a bug, this was not always guaranteed: the
      order in which the read and write handlers fired depended solely on
      the order in which the events happened to be installed. Normally this
      problem is hard to trigger when AOF is enabled with fsync=always,
      because we try to flush the output buffers to the socket directly in
      the beforeSleep() function of Redis. However if the output buffers
      are full, we actually install a write event, and in such a case this
      bug could happen.
      
      This change to ae.c modifies the event loop implementation to make this
      concept explicit. Write events that are registered with:
      
          AE_WRITABLE|AE_BARRIER
      
      are guaranteed to never fire after the readable event has fired for
      the same file descriptor. In this way we are sure that data is
      persisted to disk before the client performing the operation receives
      an acknowledgement.

      However note that these semantics do not provide all the guarantees
      that one may believe are automatically provided. Take the example of
      the blocking list operations in Redis.
      
      With AOF and fsync=always we could have:
      
          Client A doing: BLPOP myqueue 0
          Client B doing: RPUSH myqueue a b c
      
      In this scenario, Client A will get the "a" element as soon as Client
      B's RPUSH is executed, even before the operation is persisted.
      However when Client B receives the acknowledgement, it can be sure
      that "b,c" are already safe on disk inside the list.

      What to note here is that Client A receiving the element cannot be
      taken as a guarantee that the operation succeeded from the point of
      view of Client B.
      
      This is due to the fact that the barrier exists within the same socket,
      and not between different sockets. However in the case above, the
      element "a" was not going to be persisted regardless, so it is a pretty
      synthetic argument.
    • Fix ziplist prevlen encoding description. See #4705. · 262f4039
      Committed by antirez
11. 19 February 2018 (1 commit)
    • Track number of logically expired keys still in memory. · 83923afa
      Committed by antirez
      This commit adds two new fields in the INFO output, stats section:
      
      expired_stale_perc:0.34
      expired_time_cap_reached_count:58
      
      The first field is an estimate of the percentage of keys that are
      still in memory but are already logically expired. The reason why
      those keys are not yet reclaimed is that the active expire cycle
      can't spend more time on the process of reclaiming them, and at the
      same time nobody is accessing them. As the active expire cycle runs
      it collects these stats in order to populate this INFO field, even
      though it eventually has to return to the caller, either because of
      the time limit or because fewer than 25% of the keys in each given
      database are logically expired.

      Note that expired_stale_perc is a running average, where the current
      sample accounts for 5% and the history for 95%, so you'll see it
      changing smoothly over time.
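      The running average can be sketched as a one-liner (the function name
      is hypothetical; the 5%/95% weights are the ones stated above):

      ```c
      /* Blend a new sample into the running average: the current sample
       * accounts for 5% and the history for 95%. */
      double updateStalePerc(double avg, double sample)
      {
          return avg * 0.95 + sample * 0.05;
      }
      ```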
      
      The other field, expired_time_cap_reached_count, counts the number of
      times the expire cycle had to stop because of the time limit, even
      though it was still finding a sizeable number of keys to expire. This
      allows people handling operations to understand whether the Redis
      server, during mass-expiration events, is usually able to collect
      keys fast enough. It is normal for this field to increment during
      mass expires, but otherwise it should very rarely increment. When
      instead it constantly increments, it means that the current workload
      is using a significant percentage of CPU time to expire keys.
      
      This feature was created thanks to the hints of Rashmi Ramesh and
      Bart Robinson from Twitter. In private email exchanges, they noted how
      it was important to improve the observability of this parameter in the
      Redis server. Actually in big deployments, the amount of keys that are
      yet to expire in each server, even if they are logically expired, may
      account for a very big amount of wasted memory.
12. 16 February 2018 (13 commits)
13. 13 February 2018 (2 commits)