xindoo / redis
Commit 0ce76798
Authored April 29, 2010 by antirez

Added more information about slave election in Redis Cluster alternative doc

Parent: 5bdb384f
Showing 1 changed file with 62 additions and 0 deletions

design-documents/REDIS-CLUSTER-2 (+62, -0)
@@ -278,4 +278,66 @@ to the same hash slot. In order to guarantee this, key tags can be used,
where when a specific pattern is present in the key name, only that part is
hashed in order to obtain the hash index.
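As a minimal sketch (the tag pattern and the slot function are not specified
here, so the common "{...}" convention and a hypothetical CRC32-based slot
count are assumed), the key tag hashing may look like this in Python:

    import zlib

    NUM_SLOTS = 4096  # hypothetical slot count, not fixed by this document

    def hash_slot(key):
        # If the key contains a "{...}" tag, hash only the tag, so that
        # related keys are guaranteed to map to the same hash slot.
        start = key.find("{")
        if start != -1:
            end = key.find("}", start + 1)
            if end > start + 1:
                key = key[start + 1:end]
        return zlib.crc32(key.encode()) % NUM_SLOTS

    # user:{1000}:profile and user:{1000}:queue share hash slot "1000"
    assert hash_slot("user:{1000}:profile") == hash_slot("user:{1000}:queue")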
Random remarks
==============
- It's still not clear how to perform an atomic election of a slave to master.
- In normal conditions (all the nodes working) this new design is just
  K clients talking to N nodes without intermediate layers and no routing:
  this means it is horizontally scalable with O(1) lookups.
- The cluster should optionally be able to work with manual failover
  for environments where it's desirable to do so. For instance it's possible
  to set up periodic checks on all the nodes, and switch IPs when needed,
  or apply other advanced configurations that cannot be the default as they
  are too environment-dependent.
A few ideas about client-side slave election
============================================
Detecting failures in a collaborative way
-----------------------------------------
In order to make the node failure detection and slave election a distributed
effort, without any "control program" that is in some way a single point
of failure (the cluster will not stop when it stops, but errors will not be
corrected while it is down), it's possible to use a few consensus-like
algorithms.
For instance all the nodes may keep a list of errors detected by clients.
If Client-1 detects some failure accessing Node-3, for instance a connection
refused error or a timeout, it logs what happened with LPUSH commands against
all the other nodes. These "error messages" will have a timestamp and the Node
id. Something like:
LPUSH __cluster__:errors 3:1272545939
So if the error is reported many times in a small amount of time, at some
point a client can have enough hints that a slave election needs to be
performed.
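A minimal sketch of both sides of this idea, using the redis-py client and
the __cluster__:errors key from the example above; the time window and the
report threshold are made up for illustration:

    import time
    import redis

    ERRORS_KEY = "__cluster__:errors"
    WINDOW = 30     # seconds: how recent a report must be to count (assumed)
    THRESHOLD = 5   # hypothetical number of reports before electing a slave

    def report_failure(nodes, failed_node_id):
        # Log "node_id:timestamp" against every reachable node, as above.
        entry = "%d:%d" % (failed_node_id, int(time.time()))
        for node in nodes:
            try:
                node.lpush(ERRORS_KEY, entry)
            except redis.RedisError:
                pass  # an unreachable node simply misses this report

    def should_elect(node, failed_node_id):
        # Count recent reports about the failed node on one node's list.
        now = time.time()
        recent = 0
        for raw in node.lrange(ERRORS_KEY, 0, 99):
            node_id, ts = raw.decode().split(":")
            if int(node_id) == failed_node_id and now - int(ts) <= WINDOW:
                recent += 1
        return recent >= THRESHOLD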
Atomic slave election
---------------------
In order to avoid races when electing a slave to master (that is, in order to
avoid that some client can still contact the old master for that node within
the 10 second timeframe), the client performing the election may write a
hint in the configuration, change the configuration SHA1 accordingly, and
wait for more than 10 seconds, in order to be sure all the clients will
refresh the configuration before their next access.
The config hint may be something like:
"we are switching to a new master, that is x.y.z.k:port, in a few seconds"
When a client updates the config and finds such a flag set, it starts to
continuously refresh the config until a change is noticed (this will take
at most 10-15 seconds).
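A minimal sketch of this refresh loop, assuming a hypothetical fetch_config()
helper that returns the current (sha1, config) pair from the cluster:

    import time

    def wait_for_new_config(fetch_config, current_sha1, timeout=15):
        # Poll until the configuration SHA1 changes, which per the above
        # should happen within 10-15 seconds.
        deadline = time.time() + timeout
        while time.time() < deadline:
            sha1, config = fetch_config()
            if sha1 != current_sha1:
                return config
            time.sleep(0.5)
        raise TimeoutError("configuration did not change in time")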
The client performing the election will wait for that famous 10 second time
frame and finally will update the config in a definitive way, setting the new
slave as master. All the clients at this point are guaranteed to have the new
config, either because they refreshed or because on their next query their
config is already expired and they'll update the configuration.
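Putting it together, the electing client's side may look like the following
sketch, with hypothetical read_config()/write_config() helpers standing in
for however the cluster configuration is actually stored and hashed:

    import time

    SWITCH_WAIT = 10  # seconds: the configuration refresh window

    def elect_slave(read_config, write_config, node_id, slave_addr):
        # 1. Publish the hint so every client starts refreshing the config.
        config = read_config()
        config["switching"] = {"node": node_id, "new_master": slave_addr}
        write_config(config)  # assumed to also update the config SHA1

        # 2. Wait out the refresh window so that no client can still be
        #    talking to the old master.
        time.sleep(SWITCH_WAIT)

        # 3. Finalize: promote the slave to master and clear the hint.
        config = read_config()
        config["masters"][node_id] = slave_addr
        del config["switching"]
        write_config(config)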
EOF