CN110431533A - Method, device, and system for fault recovery - Google Patents
Method, device, and system for fault recovery
- Publication number
- CN110431533A (application number CN201680091858.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- log
- leader
- log entry
- voting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/40—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/30—Decision processes by autonomous network management units using voting and bidding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
- G06F11/1425—Reconfiguring to eliminate the error by reconfiguration of node membership
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1471—Saving, restoring, recovering or retrying involving logging of persistent data for recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
- H04L41/0661—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0895—Configuration of virtualised networks or elements, e.g. virtualised network function or OpenFlow elements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Hardware Redundancy (AREA)
Abstract
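A fault recovery method applied to a distributed cluster system. In such a system, the number of nodes holding the latest log can be small enough that, after one of those nodes fails and restarts, a node that does not hold the latest log may be elected Leader. The distributed cluster system includes at least a first node, a second node, and a third node; the first node and the second node hold the latest log from before the fault, and the third node does not. The method includes: after the first node restarts following the fault, its voting state is set to "voting not allowed", where the voting state indicates whether the first node may vote while the distributed cluster system elects a Leader; the first node then receives a replicate-log-entry message from the second node, which is the Leader, and sets its voting state to "voting allowed". The method helps improve the security of the distributed cluster system.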
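The voting-state rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `Node`, its method names, the assumption that the restarted node has lost its log, and the simplified "candidate log at least as long as mine" vote check are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    log: List[str] = field(default_factory=list)  # log entries, oldest first
    voting_allowed: bool = True                   # the "voting state"

    def restart_after_fault(self) -> None:
        # Assumption: the log was lost in the fault, so the node comes
        # back empty and must not vote until a Leader contacts it.
        self.log = []
        self.voting_allowed = False

    def handle_vote_request(self, candidate_log_len: int) -> bool:
        # Refuse every vote request while voting is not allowed.
        if not self.voting_allowed:
            return False
        # Simplified check (assumption): only vote for a candidate
        # whose log is at least as long as this node's log.
        return candidate_log_len >= len(self.log)

    def handle_replicate_log_entries(self, leader_entries: List[str]) -> None:
        # Simplification: adopt the Leader's entries wholesale, then
        # switch the voting state back to "voting allowed".
        self.log = list(leader_entries)
        self.voting_allowed = True


latest = ["set x=1", "set x=2"]
first = Node("first", list(latest))
second = Node("second", list(latest))  # acts as Leader
third = Node("third", latest[:1])      # lacks the latest entry

first.restart_after_fault()
vote_while_blocked = first.handle_vote_request(len(third.log))

first.handle_replicate_log_entries(second.log)
vote_after_catch_up = first.handle_vote_request(len(third.log))
```

Without the voting-state gate, the restarted first node (with an empty log) would accept the third node's shorter log as sufficiently up to date and could help elect it. With the gate, the first node votes only after the Leader's replication message has restored its log, at which point the stale candidate is rejected by the ordinary log check.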
Description
PCT national-phase application; the description has been published.
Claims (23)
- PCT national-phase application; the claims have been published.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/113848 WO2018120174A1 (zh) | 2016-12-30 | 2016-12-30 | Method, device, and system for fault recovery |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110431533A (zh) | 2019-11-08 |
CN110431533B (zh) | 2021-09-14 |
Family
ID=62706721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680091858.0A Active CN110431533B (zh) | Method, device, and system for fault recovery | 2016-12-30 | 2016-12-30 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11102084B2 (zh) |
EP (1) | EP3553669B1 (zh) |
CN (1) | CN110431533B (zh) |
WO (1) | WO2018120174A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
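CN111538763A (zh) * | 2020-04-24 | 2020-08-14 | MIGU Culture Technology Co., Ltd. | Method for determining the master node in a cluster, electronic device, and storage medium |
CN112601216A (zh) * | 2020-12-10 | 2021-04-02 | Suzhou Inspur Intelligent Technology Co., Ltd. | Zigbee-based trusted platform alarm method and system |
CN112865995A (zh) * | 2019-11-27 | 2021-05-28 | Shanghai Bilibili Technology Co., Ltd. | Distributed master-slave system |
CN113014634A (zh) * | 2021-02-20 | 2021-06-22 | Chengdu New Hope Finance Information Co., Ltd. | Cluster election processing method, apparatus, device, and storage medium |
CN113742254A (zh) * | 2021-01-19 | 2021-12-03 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Memory fragmentation management method, apparatus, and system |
CN114299655A (zh) * | 2020-09-23 | 2022-04-08 | Chengdu Zhongke Information Technology Co., Ltd. | Electronic voting system and working method thereof |
CN114518973A (zh) * | 2022-02-18 | 2022-05-20 | Chengdu Southwest Information Control Research Institute Co., Ltd. | Method for restart recovery after downtime of a distributed cluster node |
CN115794478A (zh) * | 2023-02-06 | 2023-03-14 | Tianyi Cloud Technology Co., Ltd. | System configuration method, apparatus, electronic device, and storage medium |
CN116028250A (zh) * | 2021-10-26 | 2023-04-28 | Hewlett Packard Enterprise Development LP | Disaggregated storage with multiple cluster levels |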
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10595363B2 (en) * | 2018-05-11 | 2020-03-17 | At&T Intellectual Property I, L.P. | Autonomous topology management for wireless radio user equipment |
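CN114189421B (zh) * | 2022-02-17 | 2022-05-31 | Jiangxi Agricultural University | Leader node election method, system, storage medium, and device |
CN114448996B (zh) * | 2022-03-08 | 2022-11-11 | Nanjing University | Consensus method and system based on redundant storage resources under a compute-storage separation framework |
CN114406409B (zh) * | 2022-03-30 | 2022-07-12 | China Classification Society | Method, apparatus, and device for determining the fault state of a welding machine |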
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050132154A1 (en) * | 2003-10-03 | 2005-06-16 | International Business Machines Corporation | Reliable leader election in storage area network |
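CN103763155A (zh) * | 2014-01-24 | 2014-04-30 | State Grid Corporation of China | Multi-service heartbeat monitoring method for a distributed cloud storage system |
CN103793517A (zh) * | 2014-02-12 | 2014-05-14 | Inspur Electronic Information Industry Co., Ltd. | Monitoring-mechanism-based dynamic capacity expansion method for file system log dumps |
CN104994168A (zh) * | 2015-07-14 | 2015-10-21 | Suzhou Keda Technology Co., Ltd. | Distributed storage method and distributed storage system |
CN105512266A (zh) * | 2015-12-03 | 2016-04-20 | Dawning Information Industry (Beijing) Co., Ltd. | Method and apparatus for achieving operation consistency in a distributed database |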
US9507843B1 (en) * | 2013-09-20 | 2016-11-29 | Amazon Technologies, Inc. | Efficient replication of distributed storage changes for read-only nodes of a distributed database |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7558883B1 (en) * | 2002-06-28 | 2009-07-07 | Microsoft Corporation | Fast transaction commit |
CN103152434A (zh) * | 2013-03-27 | 2013-06-12 | Jiangsu Chenyun Information Technology Co., Ltd. | Leader node replacement method in a distributed cloud system |
US9047246B1 (en) * | 2014-07-31 | 2015-06-02 | Splunk Inc. | High availability scheduler |
CN105511987A (zh) * | 2015-12-08 | 2016-04-20 | Shanghai Eisoo Information Technology Co., Ltd. | Strongly consistent and highly available distributed task management system |
US10503427B2 (en) * | 2017-03-10 | 2019-12-10 | Pure Storage, Inc. | Synchronously replicating datasets and other managed objects to cloud-based storage systems |
-
2016
- 2016-12-30 WO PCT/CN2016/113848 patent/WO2018120174A1/zh unknown
- 2016-12-30 CN CN201680091858.0A patent/CN110431533B/zh active Active
- 2016-12-30 EP EP16925925.6A patent/EP3553669B1/en active Active
-
2019
- 2019-06-28 US US16/456,679 patent/US11102084B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050132154A1 (en) * | 2003-10-03 | 2005-06-16 | International Business Machines Corporation | Reliable leader election in storage area network |
US9507843B1 (en) * | 2013-09-20 | 2016-11-29 | Amazon Technologies, Inc. | Efficient replication of distributed storage changes for read-only nodes of a distributed database |
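CN103763155A (zh) * | 2014-01-24 | 2014-04-30 | State Grid Corporation of China | Multi-service heartbeat monitoring method for a distributed cloud storage system |
CN103793517A (zh) * | 2014-02-12 | 2014-05-14 | Inspur Electronic Information Industry Co., Ltd. | Monitoring-mechanism-based dynamic capacity expansion method for file system log dumps |
CN104994168A (zh) * | 2015-07-14 | 2015-10-21 | Suzhou Keda Technology Co., Ltd. | Distributed storage method and distributed storage system |
CN105512266A (zh) * | 2015-12-03 | 2016-04-20 | Dawning Information Industry (Beijing) Co., Ltd. | Method and apparatus for achieving operation consistency in a distributed database |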
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
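CN112865995A (zh) * | 2019-11-27 | 2021-05-28 | Shanghai Bilibili Technology Co., Ltd. | Distributed master-slave system |
CN112865995B (zh) * | 2019-11-27 | 2022-10-14 | Shanghai Bilibili Technology Co., Ltd. | Distributed master-slave system |
CN111538763A (zh) * | 2020-04-24 | 2020-08-14 | MIGU Culture Technology Co., Ltd. | Method for determining the master node in a cluster, electronic device, and storage medium |
CN111538763B (zh) * | 2020-04-24 | 2023-08-15 | MIGU Culture Technology Co., Ltd. | Method for determining the master node in a cluster, electronic device, and storage medium |
CN114299655B (zh) * | 2020-09-23 | 2023-09-05 | Chengdu Zhongke Information Technology Co., Ltd. | Electronic voting system and working method thereof |
CN114299655A (zh) * | 2020-09-23 | 2022-04-08 | Chengdu Zhongke Information Technology Co., Ltd. | Electronic voting system and working method thereof |
CN112601216A (zh) * | 2020-12-10 | 2021-04-02 | Suzhou Inspur Intelligent Technology Co., Ltd. | Zigbee-based trusted platform alarm method and system |
CN112601216B (zh) * | 2020-12-10 | 2022-06-21 | Suzhou Inspur Intelligent Technology Co., Ltd. | Zigbee-based trusted platform alarm method and system |
CN113742254A (zh) * | 2021-01-19 | 2021-12-03 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Memory fragmentation management method, apparatus, and system |
CN113014634A (zh) * | 2021-02-20 | 2021-06-22 | Chengdu New Hope Finance Information Co., Ltd. | Cluster election processing method, apparatus, device, and storage medium |
CN116028250B (zh) * | 2021-10-26 | 2024-06-11 | Hewlett Packard Enterprise Development LP | Disaggregated storage with multiple cluster levels |
CN116028250A (zh) * | 2021-10-26 | 2023-04-28 | Hewlett Packard Enterprise Development LP | Disaggregated storage with multiple cluster levels |
CN114518973A (zh) * | 2022-02-18 | 2022-05-20 | Chengdu Southwest Information Control Research Institute Co., Ltd. | Method for restart recovery after downtime of a distributed cluster node |
CN114518973B (zh) * | 2022-02-18 | 2024-07-30 | Chengdu Southwest Information Control Research Institute Co., Ltd. | Method for restart recovery after downtime of a distributed cluster node |
CN115794478A (zh) * | 2023-02-06 | 2023-03-14 | Tianyi Cloud Technology Co., Ltd. | System configuration method, apparatus, electronic device, and storage medium |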
Also Published As
Publication number | Publication date |
---|---|
CN110431533B (zh) | 2021-09-14 |
US11102084B2 (en) | 2021-08-24 |
EP3553669A4 (en) | 2019-10-16 |
WO2018120174A1 (zh) | 2018-07-05 |
EP3553669A1 (en) | 2019-10-16 |
US20190386893A1 (en) | 2019-12-19 |
EP3553669B1 (en) | 2024-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110431533B (zh) | | Method, device, and system for fault recovery |
CN113014634B (zh) | | Cluster election processing method, apparatus, device, and storage medium |
EP3928208B1 (en) | | System and method for self-healing in decentralized model building for machine learning using blockchain |
US7249280B2 (en) | | Cheap paxos |
US7856502B2 (en) | | Cheap paxos |
US7711825B2 (en) | | Simplified Paxos |
US9465650B2 (en) | | Executing distributed globally-ordered transactional workloads in replicated state machines |
EP2434729A2 (en) | | Method for providing access to data items from a distributed storage system |
WO2014197963A1 (en) | | Failover system and method |
CN110865907B (zh) | | Method and system for providing service redundancy between a master server and a slave server |
EP4191429B1 (en) | | Techniques to achieve cache coherency across distributed storage clusters |
CN114554593A (zh) | | Data processing method and apparatus |
US11010086B2 (en) | | Data synchronization method and out-of-band management device |
US11522966B2 (en) | | Methods, devices and systems for non-disruptive upgrades to a replicated state machine in a distributed computing environment |
EP3140735A1 (en) | | System and method for running application processes |
CN110781039B (zh) | | Sentinel process election method and apparatus |
CN100442248C (zh) | | Computer system synchronization unit for avoiding contention |
CN113157494B (zh) | | Method and apparatus for data backup in a blockchain system |
CN117215833A (zh) | | Distributed data backup method, system, device, and storage medium |
US20240111747A1 (en) | | Optimizing the operation of a microservice cluster |
CN118331644A (zh) | | Interaction control method, apparatus, device, and medium |
Zhu | | Shaft: Serializable, highly available and fault tolerant concurrency control in the cloud |
CN114721869A (zh) | | Account balance processing method and system |
CN115344424A (zh) | | Synchronization state recovery method, apparatus, device, and storage medium |
CN113568710A (zh) | | Method, apparatus, and device for implementing virtual machine high availability |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||