WO2019175624A1 - Chained replication service - Google Patents
Chained replication service
- Publication number
- WO2019175624A1 (PCT/IB2018/051628)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cluster
- nodes
- replica
- task
- client
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/40—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/18—Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
- G06F11/182—Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits based on mutual exchange of the output between redundant processing components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2094—Redundant storage or storage space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45562—Creating, deleting, cloning virtual machine instances
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45575—Starting, stopping, suspending or resuming virtual machine instances
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45591—Monitoring or debugging support
Definitions
- A chained replication service is implemented by n replica or virtual nodes in a chain or cluster that execute operations requested by clients.
- Replica nodes and clients run in different clusters in the cloud and are connected by a network.
- In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach a consensus on which node has failed.
- The term Byzantine Failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable.
- Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic; that is, replica nodes must produce the same sequence of results when they process the same sequence of operations.
- Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster.
- The client publishes a sequence of operations, consisting of a single task or assignment, to the cluster of n replica nodes together with a task identifier or task sequence number.
- Each replica or virtual node in the chain or cluster is registered as a subscriber with the client and hence picks up that sequence of operations along with the task identifier or sequence number.
- Each replica node in the cluster performs the sequence of operations, or task, and returns the result to the client along with the task sequence number or identifier.
- The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster and again notifies each replica node of the previously uncompleted task. A snapshot of the replica chain or cluster is taken after each completed task, once the result has been verified by the client.
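The reply-verification rule above can be pictured with a short Python sketch. This is a minimal illustration, not part of the application: the function name `verify_replies` and the assumption that replies arrive as (node_id, result) pairs are invented for the example.

```python
from collections import Counter

def verify_replies(replies, n):
    """Verify the replies for one task from a cluster of n replica nodes.

    replies: list of (node_id, result) pairs, one per responding replica.
    Returns (result, faulty_ids) when at least (2n+1)//3 identical results
    were received; returns (None, None) when too many replicas disagreed,
    signalling that the chain must be rebuilt from the latest snapshot.
    """
    quorum = (2 * n + 1) // 3                      # minimum matching replies
    counts = Counter(result for _, result in replies)
    best, votes = counts.most_common(1)[0]         # most frequent result
    if votes >= quorum:
        # Any replica that answered differently is treated as faulty.
        faulty = [nid for nid, result in replies if result != best]
        return best, faulty
    return None, None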
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Hardware Redundancy (AREA)
Abstract
Here we have a chained replication service, using the Byzantine Fault Tolerance algorithm, which is implemented by n replica or virtual nodes in a chain or cluster that execute operations requested by clients. The client publishes a sequence of operations, consisting of a single task or assignment, to the cluster of n replica nodes together with a task identifier or task sequence number. Each replica node in the cluster performs the sequence of operations, or task, and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster.
Description
Chained Replication Service
In this invention we have a chained replication service which is implemented by n replica or virtual nodes in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach a consensus on which node has failed. The term Byzantine Failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use the Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information about whether a node in the cluster has failed. Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic; that is, replica nodes must produce the same sequence of results when they process the same sequence of operations.
Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, consisting of a single task or assignment, to the cluster of n replica nodes together with a task identifier or task sequence number. Each replica or virtual node in the chain or cluster is registered as a subscriber with the client and hence picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations, or task, and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster and again notifies each replica node of the previously uncompleted task. A snapshot of the replica chain or cluster is taken after each completed task, once the result has been verified by the client.
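The full task cycle described above (publish with a sequence number, count identical replies against the quorum, trim faulty replicas and snapshot, or rebuild from the last snapshot and retry) can be sketched in Python. Everything here is an illustrative assumption rather than a detail from the application: the `ChainClient` class, the modelling of a replica as a callable `node(task_id, ops)`, and the in-memory snapshot are all invented for the example.

```python
from collections import Counter

class ChainClient:
    """Minimal sketch of the client side of the chained replication service."""

    def __init__(self, nodes):
        self.nodes = list(nodes)       # current chain of replica nodes
        self.snapshot = list(nodes)    # chain state after the last verified task
        self.seq = 0                   # task sequence number

    def publish(self, ops):
        """Publish one task to every subscribed replica and verify the replies."""
        self.seq += 1
        # Every registered replica picks up the task, executes it, and
        # replies with a result tagged by the task identifier.
        replies = [(i, node(self.seq, ops)) for i, node in enumerate(self.nodes)]
        n = len(self.nodes)
        quorum = (2 * n + 1) // 3
        result, votes = Counter(r for _, r in replies).most_common(1)[0]
        if votes >= quorum:
            # No more than (n-1)/3 faults: drop the faulty replicas and
            # snapshot the verified chain.
            self.nodes = [self.nodes[i] for i, r in replies if r == result]
            self.snapshot = list(self.nodes)
            return result
        # More than (n-1)/3 faults: rebuild the chain from the latest
        # snapshot; the caller then re-publishes the uncompleted task.
        self.nodes = list(self.snapshot)
        return None
```

For example, with three honest arithmetic replicas and one that always lies, a single `publish` call returns the majority result and shrinks the chain to the honest replicas; with two liars out of four, the quorum fails and the chain is restored from the snapshot.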
Claims
1. In this invention we have a chained replication service which is implemented by n replica or virtual nodes in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach a consensus on which node has failed. The term Byzantine Failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use the Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information about whether a node in the cluster has failed.
Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic; that is, replica nodes must produce the same sequence of results when they process the same sequence of operations. Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, consisting of a single task or assignment, to the cluster of n replica nodes together with a task identifier or task sequence number. Each replica or virtual node in the chain or cluster is registered as a subscriber with the client and hence picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations, or task, and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster and again notifies each replica node of the previously uncompleted task. A snapshot of the replica chain or cluster is taken after each completed task, once the result has been verified by the client. The above novel technique of providing a Chained Replication Service with the help of n virtual machine nodes in a chain or cluster is the claim for this invention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2018/051628 WO2019175624A1 (en) | 2018-03-12 | 2018-03-12 | Chained replication service |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2018/051628 WO2019175624A1 (en) | 2018-03-12 | 2018-03-12 | Chained replication service |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019175624A1 (en) | 2019-09-19 |
Family
ID=67907469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2018/051628 WO2019175624A1 (en) | 2018-03-12 | 2018-03-12 | Chained replication service |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019175624A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130036323A1 (en) * | 2011-03-28 | 2013-02-07 | Siemens Corporation | Fault-tolerant replication architecture |
US9753792B2 (en) * | 2013-03-20 | 2017-09-05 | Nec Europe Ltd. | Method and system for byzantine fault tolerant data replication |
-
2018
- 2018-03-12 WO PCT/IB2018/051628 patent/WO2019175624A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130036323A1 (en) * | 2011-03-28 | 2013-02-07 | Siemens Corporation | Fault-tolerant replication architecture |
US9753792B2 (en) * | 2013-03-20 | 2017-09-05 | Nec Europe Ltd. | Method and system for byzantine fault tolerant data replication |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2019203861B2 (en) | System and method for ending view change protocol | |
CN111181715B (en) | Multi-party cross-linking method based on consistent Hash consensus protocol | |
- JP6968166B2 (en) | Byzantine fault tolerant replication method and system | |
US20080052327A1 (en) | Secondary Backup Replication Technique for Clusters | |
EP3433759A1 (en) | Method and apparatus for expanding high-availability server cluster | |
US9633100B2 (en) | System and method for data structure synchronization | |
WO2019072294A3 (en) | Achieving consensus among network nodes in a distributed system | |
WO2017067484A1 (en) | Virtualization data center scheduling system and method | |
WO2003039071A1 (en) | Method to manage high availability equipments | |
CN106547643B (en) | Recovery method and device of abnormal data | |
CN105406980A (en) | Multi-node backup method and multi-node backup device | |
CN109845192B (en) | Computer system and method for dynamically adapting a network and computer readable medium | |
CN111460039A (en) | Relational database processing system, client, server and method | |
WO2012069091A1 (en) | Real time database system | |
Mohan et al. | Primary-backup controller mapping for byzantine fault tolerance in software defined networks | |
CN109039748B (en) | Method for dynamically adding and deleting nodes by PBFT protocol | |
EP2874377A1 (en) | Method for controlling operations of server cluster | |
Cowling et al. | Census: Location-aware membership management for large-scale distributed systems | |
Bezerra et al. | Ridge: high-throughput, low-latency atomic multicast | |
WO2019175624A1 (en) | Chained replication service | |
CN112565314B (en) | Computing cluster and computing nodes in computing cluster | |
Ma et al. | Scheme for optical network recovery schedule to restore virtual networks after a disaster | |
Rodrigues et al. | From spontaneous total order to uniform total order: different degrees of optimistic delivery | |
CN105141445A (en) | Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system | |
WO2017000845A1 (en) | Traffic control method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18909867 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 18909867 Country of ref document: EP Kind code of ref document: A1 |