WO2019175624A1 - Chained replication service - Google Patents

Chained replication service

Info

Publication number
WO2019175624A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
nodes
replica
task
client
Prior art date
Application number
PCT/IB2018/051628
Other languages
French (fr)
Inventor
Pratik Sharma
Original Assignee
Pratik Sharma
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pratik Sharma
Priority to PCT/IB2018/051628
Publication of WO2019175624A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/182 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits based on mutual exchange of the output between redundant processing components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support

Definitions

  • A chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients.
  • Replica nodes and clients run in different clusters in the cloud and are connected by a network.
  • In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed.
  • The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable.
  • Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations.
  • Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster.
  • The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number.
  • Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number.
  • Each replica node in the cluster performs the sequence of operations, i.e. the task, and returns the result to the client along with the task sequence number or identifier.
  • The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client. (A sketch of these thresholds appears after this list.)
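The thresholds above are the standard Byzantine Fault Tolerance bounds: a cluster of n replicas tolerates at most ⌊(n-1)/3⌋ faulty nodes, and the client must collect at least ⌈(2n+1)/3⌉ identical replies to verify a result. The following Python sketch of that arithmetic is purely illustrative and not part of the filed specification; the function name is our own.

```python
import math

def bft_thresholds(n: int) -> tuple[int, int]:
    """Return (max_faulty, required_matching_replies) for a cluster of n replicas.

    max_faulty = floor((n - 1) / 3): the most replicas that may be Byzantine.
    required   = ceil((2n + 1) / 3): identical replies the client must collect.
    """
    if n < 1:
        raise ValueError("the cluster must contain at least one replica")
    max_faulty = (n - 1) // 3
    required = math.ceil((2 * n + 1) / 3)
    return max_faulty, required

if __name__ == "__main__":
    for n in (4, 7, 10):
        f, q = bft_thresholds(n)
        print(f"n={n}: tolerates {f} faulty replicas, needs {q} matching replies")
```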

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

Disclosed is a chained replication service, using a Byzantine Fault Tolerance algorithm, implemented by n replicas or virtual nodes in a chain or cluster that execute operations requested by clients. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster.

Description

Chained Replication Service
In this invention we have a chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed. The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use a Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information on whether a node in the cluster has failed. Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations.
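To make the determinism requirement concrete, the sketch below models a replica as a small deterministic state machine: given the same task identifier and the same ordered operations, every replica produces the same reply. The class, its operation format, and its method names are illustrative assumptions, not taken from the filing.

```python
class Replica:
    """A deterministic replica: the same operation sequence yields the same reply."""

    def __init__(self, replica_id: int):
        self.replica_id = replica_id
        self.state: dict[str, int] = {}

    def execute(self, task_id: int, operations: list[tuple[str, str, int]]) -> tuple[int, dict[str, int]]:
        """Apply a task's operations in order and reply with (task_id, resulting state).

        Each operation is a ("set" | "add", key, value) triple; no clocks, randomness,
        or other non-deterministic inputs are consulted, so replicas stay in lock-step.
        """
        for op, key, value in operations:
            if op == "set":
                self.state[key] = value
            elif op == "add":
                self.state[key] = self.state.get(key, 0) + value
            else:
                raise ValueError(f"unknown operation: {op}")
        return task_id, dict(self.state)

if __name__ == "__main__":
    a, b = Replica(0), Replica(1)
    ops = [("set", "x", 7), ("add", "x", 3)]
    assert a.execute(1, ops) == b.execute(1, ops)  # identical inputs -> identical replies
```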
Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster of replica nodes and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client.
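The client-side flow just described (publish the task, collect replies, check the quorum, prune faulty replicas, or rebuild from the latest verified snapshot) can be sketched as follows. This is a toy, self-contained illustration under our own assumptions: replicas are modelled as in-process callables, `make_replica` and `process_task` are hypothetical names, and a real deployment would exchange these messages over a network.

```python
from collections import Counter
import copy

def make_replica(replica_id: int, faulty: bool = False):
    """Toy replica: honest replicas sum the published values; a faulty one corrupts the result."""
    def execute(task_id: int, values: list[int]) -> tuple[int, int]:
        result = sum(values)
        return task_id, (result + 1) if faulty else result
    return {"id": replica_id, "execute": execute}

def process_task(replicas, task_id, values, snapshot):
    """One client round: publish the task, collect replies, verify the quorum, prune or rebuild."""
    for attempt in range(2):  # the second pass runs against a cluster rebuilt from the snapshot
        n = len(replicas)
        required = n - (n - 1) // 3  # i.e. at least ceil((2n + 1) / 3) matching replies
        replies = {r["id"]: r["execute"](task_id, values) for r in replicas}
        best, votes = Counter(replies.values()).most_common(1)[0]
        if votes >= required:
            # Result verified: remove replicas whose reply disagreed, then snapshot the cluster.
            replicas = [r for r in replicas if replies[r["id"]] == best]
            return best, replicas, copy.deepcopy(replicas)
        # More than floor((n - 1) / 3) replicas were faulty: rebuild the cluster from the
        # latest verified snapshot and re-issue the uncompleted task.
        replicas = copy.deepcopy(snapshot)
    raise RuntimeError("no quorum even after rebuilding the cluster from the snapshot")

if __name__ == "__main__":
    replicas = [make_replica(0), make_replica(1), make_replica(2), make_replica(3, faulty=True)]
    snapshot = copy.deepcopy(replicas)
    result, replicas, snapshot = process_task(replicas, task_id=1, values=[3, 4], snapshot=snapshot)
    print(result, [r["id"] for r in replicas])  # (1, 7) [0, 1, 2] -- the faulty replica was removed
```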

Claims

The following is the claim for this invention:
1. In this invention we have a chained replication service implemented by n replicas, or virtual nodes, in a chain or cluster that execute operations requested by clients. Replica nodes and clients run in different clusters in the cloud and are connected by a network. In a Byzantine failure, a node in a cluster can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other nodes to declare it failed and shut it out of the network, because they must first reach consensus on which node has failed. The term Byzantine failure is derived from the Byzantine Generals' Problem, in which actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable. Here we use a Byzantine Fault Tolerance algorithm to build a chain or cluster of virtual machine nodes so that we can handle faulty nodes and imperfect information on whether a node in the cluster has failed.
Byzantine Fault Tolerance implements a form of state machine replication that allows replication of services performing arbitrary computations, provided they are deterministic: replica nodes must produce the same sequence of results when they process the same sequence of operations. Byzantine Fault Tolerance provides both safety and liveness properties assuming no more than [(n-1)/3] replica nodes are faulty over the lifetime of the replica cluster. The client publishes a sequence of operations, constituting a single task or assignment, to the cluster of n replica nodes along with a task identifier or task sequence number. Each replica node, or virtual node, in the chain or cluster is registered as a subscriber with the client and therefore picks up that sequence of operations along with the task identifier or sequence number. Each replica node in the cluster performs the sequence of operations and returns the result to the client along with the task sequence number or identifier. The client must receive at least [(2n+1)/3] identical replies to assume that no more than [(n-1)/3] replica nodes were faulty, and it removes any faulty replica nodes from the replica chain or cluster. If more than [(n-1)/3] replica nodes are faulty, the client creates a new chain or cluster of replica nodes from the latest snapshot it has of the chain or cluster of replica nodes and again notifies each replica node of the previous uncompleted task. A snapshot of the replica chain or cluster is taken on each completed task, after the result has been verified by the client. The above novel technique of providing a chained replication service with the help of n virtual machine nodes in a chain or cluster is the claim for this invention.

Priority Applications (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Applications Claiming Priority (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Publications (1)

Publication Number: WO2019175624A1 (en)
Publication Date: 2019-09-19

Family

ID=67907469

Family Applications (1)

Application Number: PCT/IB2018/051628 (published as WO2019175624A1)
Priority Date: 2018-03-12
Filing Date: 2018-03-12
Title: Chained replication service

Country Status (1)

Country Link
WO (1) WO2019175624A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036323A1 (en) * 2011-03-28 2013-02-07 Siemens Corporation Fault-tolerant replication architecture
US9753792B2 (en) * 2013-03-20 2017-09-05 Nec Europe Ltd. Method and system for byzantine fault tolerant data replication

Similar Documents

Publication Publication Date Title
AU2019203861B2 (en) System and method for ending view change protocol
CN111181715B (en) Multi-party cross-linking method based on consistent Hash consensus protocol
JP6968166B2 Byzantine fault-tolerant replication methods and systems
US20080052327A1 (en) Secondary Backup Replication Technique for Clusters
EP3433759A1 (en) Method and apparatus for expanding high-availability server cluster
US9633100B2 (en) System and method for data structure synchronization
WO2019072294A3 (en) Achieving consensus among network nodes in a distributed system
WO2017067484A1 (en) Virtualization data center scheduling system and method
WO2003039071A1 (en) Method to manage high availability equipments
CN106547643B (en) Recovery method and device of abnormal data
CN105406980A (en) Multi-node backup method and multi-node backup device
CN109845192B (en) Computer system and method for dynamically adapting a network and computer readable medium
CN111460039A (en) Relational database processing system, client, server and method
WO2012069091A1 (en) Real time database system
Mohan et al. Primary-backup controller mapping for byzantine fault tolerance in software defined networks
CN109039748B (en) Method for dynamically adding and deleting nodes by PBFT protocol
EP2874377A1 (en) Method for controlling operations of server cluster
Cowling et al. Census: Location-aware membership management for large-scale distributed systems
Bezerra et al. Ridge: high-throughput, low-latency atomic multicast
WO2019175624A1 (en) Chained replication service
CN112565314B (en) Computing cluster and computing nodes in computing cluster
Ma et al. Scheme for optical network recovery schedule to restore virtual networks after a disaster
Rodrigues et al. From spontaneous total order to uniform total order: different degrees of optimistic delivery
CN105141445A (en) Method and device for realizing multiple backups of multiple flow groups in high-availability cluster system
WO2017000845A1 (en) Traffic control method and apparatus

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18909867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in the European phase

Ref document number: 18909867

Country of ref document: EP

Kind code of ref document: A1