1. 29 May 2018 (3 commits)
    • Don't expire keys while loading RDB from AOF preamble. · 01407a3a
      Committed by antirez
      The AOF tail of a combined RDB+AOF file is based on the premise of
      applying the AOF commands to the exact state the server had while the
      RDB was persisted. By expiring keys while loading the RDB file we
      change that state, so replaying the AOF tail later may produce a
      different, inconsistent state.
      
      Test case:
      
      * Time1: SET a 10
      * Time2: EXPIREAT a $time5
      * Time3: INCR a
      * Time4: PERSIST a. Start BGREWRITEAOF with RDB preamble. The value of a is 11 without an expire time.
      * Time5: Restart Redis from the RDB+AOF: consistency violation.
      
      Thanks to @soloestoy for providing the patch.
      Thanks to @trevor211 for the original issue report and the initial fix.
      
      Check issue #4950 for more info.
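      A minimal sketch of the idea behind the fix (names and signature are
      hypothetical, heavily simplified compared to the actual rdb.c logic):
      the expire check is simply skipped while the RDB being loaded is the
      preamble of an AOF file.

      ```c
      /* Decide whether a key with an expire time should be dropped during
       * RDB loading. When the RDB is the preamble of an AOF file, expired
       * keys are kept, so the AOF tail replays against the exact state
       * present at rewrite time. */
      int rdbShouldDropExpiredKey(long long expiretime, long long now,
                                  int loading_aof_preamble)
      {
          if (expiretime == -1) return 0;     /* Key has no TTL at all. */
          if (loading_aof_preamble) return 0; /* Keep it for the AOF tail. */
          return expiretime < now;            /* Plain RDB load: drop it. */
      }
      ```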
    • Fix rdb save by allowing dumping of expire keys, so that when · fb5408cf
      Committed by WuYunlong
      we add a new slave, and do a failover, either manually or
      automatically, other local slaves will delete the expired keys
      properly.
    • Backport hiredis issue 525 fix to compile on FreeBSD. · 0b8b6df4
      Committed by antirez
      Close #4947.
2. 23 May 2018 (4 commits)
    • Add INIT INFO to the provided init script. · e98627c5
      Committed by antirez
    • Fix ae.c when a timer finalizerProc adds an event. · 17f5de89
      Committed by antirez
      While this feature is not used by Redis, ae.c implements the ability
      for a timer to call a finalizer callback when a timer event is
      deleted. This feature was buggy from the start, and because it was
      never used we never noticed the problem. However Anthony LaTorre was
      using the same library to implement a different system: he found a
      bug, which he describes as follows and fixed with the patch in this
      commit, sent to me by private email:
      
          --- Anthony email ---
      
      I've found one bug in the current implementation of the timed events.
      It's possible to lose track of a timed event if an event is added in
      the finalizerProc of another event.
      
      For example, suppose you start off with three timed events 1, 2, and
      3. Then the linked list looks like:
      
      3 -> 2 -> 1
      
      Then, you run processTimeEvents and events 2 and 3 finish, so now the
      list looks like:
      
      -1 -> -1 -> 2
      
      Now, on the next iteration of processTimeEvents it starts by deleting
      the first event, and suppose this finalizerProc creates a new event,
      so that the list looks like this:
      
      4 -> -1 -> 2
      
      On the next iteration of the while loop, when it gets to the second
      event, the variable prev is still set to NULL, so that the head of the
      event loop after the next event will be set to 2, i.e. after deleting
      the next event the event loop will look like:
      
      2
      
      and the event with id 4 will be lost.
      
      I've attached an example program to illustrate the issue. If you run
      it you will see that it prints:
      
      ```
      foo id = 0
      spam!
      ```
      
      But if you uncomment line 29 and run it again it won't print "spam!".
      
          --- End of email ---
      
      Test.c source code is as follows:
      
          #include "ae.h"
          #include <stdio.h>

          aeEventLoop *el;

          int foo(struct aeEventLoop *el, long long id, void *data)
          {
              printf("foo id = %lld\n", id);

              return AE_NOMORE;
          }

          int spam(struct aeEventLoop *el, long long id, void *data)
          {
              printf("spam!\n");

              return AE_NOMORE;
          }

          void bar(struct aeEventLoop *el, void *data)
          {
              aeCreateTimeEvent(el, 0, spam, NULL, NULL);
          }

          int main(int argc, char **argv)
          {
              el = aeCreateEventLoop(100);

              //aeCreateTimeEvent(el, 0, foo, NULL, NULL);
              aeCreateTimeEvent(el, 0, foo, NULL, bar);

              aeMain(el);

              return 0;
          }
      
      Anthony fixed the problem by using a linked list for the list of timers, and
      sent me back this patch after he tested the code in production for some time.
      The code looks sane to me, so committing it to Redis.
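      The core of the fix can be sketched like this (hypothetical names,
      heavily simplified compared to ae.c's aeTimeEvent): with a prev
      pointer in every node, deletion fixes up the head through the node
      itself rather than through a `prev` variable tracked by the
      iteration, so an event prepended by a finalizerProc is never lost.

      ```c
      #include <stddef.h>

      /* Simplified doubly linked time-event node (the real aeTimeEvent
       * also carries callbacks, clientData and firing times). */
      typedef struct timeEvent {
          long long id;
          struct timeEvent *prev, *next;
      } timeEvent;

      /* Unlink `te` from the list, updating the head when needed. */
      void unlinkTimeEvent(timeEvent **head, timeEvent *te)
      {
          if (te->prev)
              te->prev->next = te->next;
          else
              *head = te->next;
          if (te->next)
              te->next->prev = te->prev;
      }
      ```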
    • Sentinel: fix delay in detecting ODOWN. · 266e6423
      Committed by antirez
      See issue #2819 for details. The gist is that when we want to send an
      INFO command because we are over the time, we used to send only the
      INFO command, no longer sending PING commands. However if a master
      fails exactly when we are about to send an INFO command, the measured
      PING time will be zero because the PONG reply was already received,
      and we'll fail to send more PINGs, since we only try to send INFO
      commands: the failure detector will be delayed until the connection
      is closed and re-opened because of the "long timeout".
      
      This commit changes the logic so that we can send all three kinds of
      messages regardless of whether we already sent another one in the
      same code path. It may happen that we go over the message limit for
      the link by a few messages, but this is not significant. However now
      we'll no longer delay sending a command just because there was
      something else to send at the same time.
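      The shape of the new logic can be sketched as follows (a simplified,
      hypothetical helper, not Sentinel's actual code): each message kind
      is evaluated independently, so a due INFO no longer suppresses a due
      PING in the same code path.

      ```c
      #define MSG_PING  1
      #define MSG_INFO  2
      #define MSG_HELLO 4

      /* Return a bitmask of every message that is due, instead of picking
       * at most one kind per call. */
      int messagesToSend(int ping_due, int info_due, int hello_due)
      {
          int mask = 0;
          if (ping_due)  mask |= MSG_PING;
          if (info_due)  mask |= MSG_INFO;
          if (hello_due) mask |= MSG_HELLO;
          return mask;
      }
      ```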
    • AOF & RDB: be compatible with rdbchecksum no · eafaf172
      Committed by zhaozhao.zz
3. 08 May 2018 (1 commit)
4. 27 March 2018 (1 commit)
5. 26 March 2018 (2 commits)
6. 14 March 2018 (2 commits)
    • Cluster: add test for the nofailover flag. · 8d92885b
      Committed by antirez
    • Cluster: ability to prevent slaves from failing over their masters. · 70597a30
      Committed by antirez
      This commit, in some parts derived from PR #3041 which is no longer
      possible to merge (because the user deleted the original branch),
      implements the ability for slaves to have a special configuration
      preventing them from trying to start a failover when the master is
      failing.

      There are multiple reasons for wanting this, and the feature was
      requested in issue #3021 some time ago.
      
      The differences between this patch and the original PR are the
      following:
      
      1. The flag is saved/loaded on the nodes configuration.
      2. The 'myself' node is now flag-aware, the flag is updated as needed
         when the configuration is changed via CONFIG SET.
      3. The flag name uses NOFAILOVER instead of NO_FAILOVER to be consistent
         with existing NOADDR.
      4. The redis.conf documentation was rewritten.
      
      Thanks to @deep011 for the original patch.
7. 02 March 2018 (2 commits)
8. 01 March 2018 (3 commits)
9. 28 February 2018 (1 commit)
10. 27 February 2018 (5 commits)
    • ae.c: instead of not firing, on AE_BARRIER invert the sequence. · 1e2f0d69
      Committed by antirez
      AE_BARRIER was implemented like this:

          - Fire the readable event.
          - Do not fire the writable event if the readable one fired.

      However this may lead to the writable event never being called if the
      readable event always fires. There is an alternative: we can simply
      invert the sequence of the calls when AE_BARRIER is set. This commit
      does that.
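      The inversion can be sketched as follows (a simplified, hypothetical
      helper; in Redis the real logic lives inside aeProcessEvents()):

      ```c
      #include <string.h>

      #define AE_READABLE 1
      #define AE_WRITABLE 2

      static char fired[4]; /* Records the handler firing order. */

      static void rfileProc(void) { strcat(fired, "r"); }
      static void wfileProc(void) { strcat(fired, "w"); }

      /* Fire the handlers for one ready file descriptor. With the barrier
       * set, the write handler runs first, so it can never fire after the
       * read handler within the same iteration. */
      void fireEvents(int mask, int barrier)
      {
          fired[0] = '\0';
          if (barrier) {
              if (mask & AE_WRITABLE) wfileProc();
              if (mask & AE_READABLE) rfileProc();
          } else {
              if (mask & AE_READABLE) rfileProc();
              if (mask & AE_WRITABLE) wfileProc();
          }
      }
      ```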
    • AOF: fix a bug that may prevent proper fsyncing when fsync=always. · b2e4aad9
      Committed by antirez
      In case the write handler is already installed, it could happen that we
      serve the reply of a query in the same event loop cycle we received it,
      preventing beforeSleep() from guaranteeing that we do the AOF fsync
      before sending the reply to the client.
      
      The AE_BARRIER mechanism, introduced in a previous commit, prevents this
      problem. This commit makes actual use of this new feature to fix the
      bug.
    • Cluster: improve crash-recovery safety after failover auth vote. · 93bad8ae
      Committed by antirez
      Add AE_BARRIER to the writable event so that slaves requesting votes
      can't be served before we re-enter the event loop in the next
      iteration, ensuring that clusterBeforeSleep() fsyncs to disk in time.
      Also add an explicit fsync call, given that we modified the last vote
      epoch variable.
    • ae.c: introduce the concept of read->write barrier. · e32752e8
      Committed by antirez
      AOF fsync=always, and certain Redis Cluster bus operations, require
      fsyncing data to disk before replying with an acknowledgement.
      In such cases, in order to implement group commits, we want to be
      sure that queries read in a given cycle of the event loop are never
      served to clients in the same event loop iteration. This way, by
      using the event loop "before sleep" callback, we can fsync the
      information just one time before returning into the event loop for
      the next cycle. This is much more efficient than calling fsync()
      multiple times.

      Unfortunately, because of a bug, this was not always guaranteed: the
      order in which the read and write handlers fired depended solely on
      the order in which the events happened to be installed. Normally this
      problem is hard to trigger when AOF is enabled with fsync=always,
      because we try to flush the output buffers to the socket directly in
      the beforeSleep() function of Redis. However if the output buffers
      are full, we actually install a write event, and in such a case this
      bug could happen.
      
      This change to ae.c modifies the event loop implementation to make this
      concept explicit. Write events that are registered with:
      
          AE_WRITABLE|AE_BARRIER
      
      are guaranteed to never fire after the readable event has fired for
      the same file descriptor. In this way we are sure that data is
      persisted to disk before the client performing the operation receives
      an acknowledgement.

      However note that these semantics do not provide all the guarantees
      that one may believe are automatically provided. Take the example of
      the blocking list operations in Redis.
      
      With AOF and fsync=always we could have:
      
          Client A doing: BLPOP myqueue 0
          Client B doing: RPUSH myqueue a b c
      
      In this scenario, Client A will get the "a" element as soon as Client
      B's RPUSH is executed, even before the operation is persisted.
      However when Client B receives the acknowledgement, it can be sure
      that "b,c" are already safe on disk inside the list.

      What to note here is that Client A receiving the element cannot be
      taken as a guarantee that the operation succeeded from the point of
      view of Client B.
      
      This is due to the fact that the barrier exists within the same socket,
      and not between different sockets. However in the case above, the
      element "a" was not going to be persisted regardless, so it is a pretty
      synthetic argument.
    • Fix ziplist prevlen encoding description. See #4705. · 262f4039
      Committed by antirez
11. 19 February 2018 (1 commit)
    • Track number of logically expired keys still in memory. · 83923afa
      Committed by antirez
      This commit adds two new fields in the INFO output, stats section:
      
      expired_stale_perc:0.34
      expired_time_cap_reached_count:58
      
      The first field is an estimate of the percentage of keys that are
      still in memory but are already logically expired. The reason why
      those keys are not yet reclaimed is that the active expire cycle
      can't spend more time on the process of reclaiming them, and at the
      same time nobody is accessing them. As the active expire cycle runs
      it collects these stats in order to populate this INFO field, even
      though it eventually has to return to the caller, either because of
      the time limit or because fewer than 25% of the keys in each given
      database are logically expired.

      Note that expired_stale_perc is a running average, where the current
      sample accounts for 5% and the history for 95%, so you'll see it
      changing smoothly over time.
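      The running average can be sketched as a one-liner (the function name
      is hypothetical; the 5%/95% weights are the ones stated above):

      ```c
      /* Blend a new sample into the running average: the current sample
       * accounts for 5% and the history for 95%. */
      double updateStalePerc(double avg, double sample)
      {
          return avg * 0.95 + sample * 0.05;
      }
      ```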
      
      The other field, expired_time_cap_reached_count, counts the number of
      times the expire cycle had to stop because of the time limit, even
      though it was still finding a sizeable number of keys to expire. This
      allows people handling operations to understand whether the Redis
      server, during mass-expiration events, is usually able to collect
      keys fast enough. It is normal for this field to increment during
      mass expires, but otherwise it should very rarely increment. When
      instead it constantly increments, it means that the current workload
      is using a significant percentage of CPU time to expire keys.
      
      This feature was created thanks to the hints of Rashmi Ramesh and
      Bart Robinson from Twitter. In private email exchanges, they noted how
      it was important to improve the observability of this parameter in the
      Redis server. Actually in big deployments, the amount of keys that are
      yet to expire in each server, even if they are logically expired, may
      account for a very big amount of wasted memory.
12. 16 February 2018 (13 commits)
13. 13 February 2018 (2 commits)