WO2019175624A1 - Chained replication service - Google Patents

Chained replication service

Info

Publication number
WO2019175624A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
nodes
replica
task
client
Prior art date
Application number
PCT/IB2018/051628
Other languages
French (fr)
Inventor
Pratik Sharma
Original Assignee
Pratik Sharma
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pratik Sharma
Priority to PCT/IB2018/051628
Publication of WO2019175624A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/182 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits based on mutual exchange of the output between redundant processing components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support

Definitions

  • A chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients.
  • Replica nodes and clients run in different clusters in the cloud and are connected by a network.
  • In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed.
  • The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable.
  • Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations.
  • Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster.
  • The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number.
  • Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number.
  • Each replica node in the cluster performs the sequence of operations, i.e. the task, and returns the result to the client along with the task sequence number or identifier.
  • The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client. (A sketch of these thresholds appears after this list.)
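The thresholds above are the standard Byzantine Fault Tolerance bounds: a cluster of n replicas tolerates at most ⌊(n-1)/3⌋ faulty nodes, and the client must collect at least ⌈(2n+1)/3⌉ identical replies to verify a result. The following Python sketch of that arithmetic is purely illustrative and not part of the filed specification; the function name is our own.

```python
import math

def bft_thresholds(n: int) -> tuple[int, int]:
    """Return (max_faulty, required_matching_replies) for a cluster of n replicas.

    max_faulty = floor((n - 1) / 3): the most replicas that may be Byzantine.
    required   = ceil((2n + 1) / 3): identical replies the client must collect.
    """
    if n < 1:
        raise ValueError("the cluster must contain at least one replica")
    max_faulty = (n - 1) // 3
    required = math.ceil((2 * n + 1) / 3)
    return max_faulty, required

if __name__ == "__main__":
    for n in (4, 7, 10):
        f, q = bft_thresholds(n)
        print(f"n={n}: tolerates {f} faulty replicas, needs {q} matching replies")
```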

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

Disclosed is a chained replication service, using a Byzantine Fault Tolerance algorithm, implemented by n replicas or virtual nodes in a chain or cluster that execute operations requested by clients. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster.

Description

Chained Replication Service
In this invention we have a chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed. The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use a Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information on whether a node in the cluster has failed. Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations.
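To make the determinism requirement concrete, the sketch below models a replica as a small deterministic state machine: given the same task identifier and the same ordered operations, every replica produces the same reply. The class, its operation format, and its method names are illustrative assumptions, not taken from the filing.

```python
class Replica:
    """A deterministic replica: the same operation sequence yields the same reply."""

    def __init__(self, replica_id: int):
        self.replica_id = replica_id
        self.state: dict[str, int] = {}

    def execute(self, task_id: int, operations: list[tuple[str, str, int]]) -> tuple[int, dict[str, int]]:
        """Apply a task's operations in order and reply with (task_id, resulting state).

        Each operation is a ("set" | "add", key, value) triple; no clocks, randomness,
        or other non-deterministic inputs are consulted, so replicas stay in lock-step.
        """
        for op, key, value in operations:
            if op == "set":
                self.state[key] = value
            elif op == "add":
                self.state[key] = self.state.get(key, 0) + value
            else:
                raise ValueError(f"unknown operation: {op}")
        return task_id, dict(self.state)

if __name__ == "__main__":
    a, b = Replica(0), Replica(1)
    ops = [("set", "x", 7), ("add", "x", 3)]
    assert a.execute(1, ops) == b.execute(1, ops)  # identical inputs -> identical replies
```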
Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster of replica nodes and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client.
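The client-side flow just described (publish the task, collect replies, check the quorum, prune faulty replicas, or rebuild from the latest verified snapshot) can be sketched as follows. This is a toy, self-contained illustration under our own assumptions: replicas are modelled as in-process callables, `make_replica` and `process_task` are hypothetical names, and a real deployment would exchange these messages over a network.

```python
from collections import Counter
import copy

def make_replica(replica_id: int, faulty: bool = False):
    """Toy replica: honest replicas sum the published values; a faulty one corrupts the result."""
    def execute(task_id: int, values: list[int]) -> tuple[int, int]:
        result = sum(values)
        return task_id, (result + 1) if faulty else result
    return {"id": replica_id, "execute": execute}

def process_task(replicas, task_id, values, snapshot):
    """One client round: publish the task, collect replies, verify the quorum, prune or rebuild."""
    for attempt in range(2):  # the second pass runs against a cluster rebuilt from the snapshot
        n = len(replicas)
        required = n - (n - 1) // 3  # i.e. at least ceil((2n + 1) / 3) matching replies
        replies = {r["id"]: r["execute"](task_id, values) for r in replicas}
        best, votes = Counter(replies.values()).most_common(1)[0]
        if votes >= required:
            # Result verified: remove replicas whose reply disagreed, then snapshot the cluster.
            replicas = [r for r in replicas if replies[r["id"]] == best]
            return best, replicas, copy.deepcopy(replicas)
        # More than floor((n - 1) / 3) replicas were faulty: rebuild the cluster from the
        # latest verified snapshot and re-issue the uncompleted task.
        replicas = copy.deepcopy(snapshot)
    raise RuntimeError("no quorum even after rebuilding the cluster from the snapshot")

if __name__ == "__main__":
    replicas = [make_replica(0), make_replica(1), make_replica(2), make_replica(3, faulty=True)]
    snapshot = copy.deepcopy(replicas)
    result, replicas, snapshot = process_task(replicas, task_id=1, values=[3, 4], snapshot=snapshot)
    print(result, [r["id"] for r in replicas])  # (1, 7) [0, 1, 2] -- the faulty replica was removed
```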

Claims

The following is the claim for this invention:
1. In this invention we have a chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed. The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use a Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information on whether a node in the cluster has failed.
Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations. Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster of replica nodes and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client. The above novel technique of providing a chained replication service with the help of n virtual machine nodes in a chain or cluster is the claim for this invention.

Priority Applications (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Applications Claiming Priority (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Publications (1)

Publication Number: WO2019175624A1 (en)
Publication Date: 2019-09-19

Family

ID=67907469

Family Applications (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Country Status (1)

Country Link
WO (1) WO2019175624A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036323A1 (en) * 2011-03-28 2013-02-07 Siemens Corporation Fault-tolerant replication architecture
US9753792B2 (en) * 2013-03-20 2017-09-05 Nec Europe Ltd. Method and system for byzantine fault tolerant data replication

Similar Documents

Publication Publication Date Title
AU2019203861B2 (en) System and method for ending view change protocol
CN111181715B (en) Multi-party cross-linking method based on consistent Hash consensus protocol
JP6968166B2 Byzantine fault-tolerant replication methods and systems
US20080052327A1 (en) Secondary Backup Replication Technique for Clusters
EP3433759A1 (en) Method and apparatus for expanding high-availability server cluster
US9633100B2 (en) System and method for data structure synchronization
WO2019072294A3 (en) Achieving consensus among network nodes in a distributed system
WO2017067484A1 (en) Virtualization data center scheduling system and method
WO2003039071A1 (en) Method to manage high availability equipments
CN106547643B (en) Recovery method and device of abnormal data
CN105406980A (en) Multi-node backup method and multi-node backup device
CN109845192B (en) Computer system and method for dynamically adapting a network and computer readable medium
CN111460039A (en) Relational database processing system, client, server and method
WO2012069091A1 (en) Real time database system
Mohan et al. Primary-backup controller mapping for byzantine fault tolerance in software defined networks
CN109039748B (en) Method for dynamically adding and deleting nodes by PBFT protocol
EP2874377A1 (en) Method for controlling operations of server cluster
Cowling et al. Census: Location-aware membership management for large-scale distributed systems
Bezerra et al. Ridge: high-throughput, low-latency atomic multicast
WO2019175624A1 (en) Chained replication service
CN112565314B (en) Computing cluster and computing nodes in computing cluster
Ma et al. Scheme for optical network recovery schedule to restore virtual networks after a disaster
Rodrigues et al. From spontaneous total order to uniform total order: different degrees of optimistic delivery
CN105141445A (en) Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system
WO2017000845A1 (en) Traffic control method and apparatus

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18909867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in the European phase

Ref document number: 18909867

Country of ref document: EP

Kind code of ref document: A1