WO2019180477A1 - Distributed group membership service - Google Patents

Distributed group membership service

Info

Publication number
WO2019180477A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual machine
processes
running
cluster
different virtual
Prior art date
Application number
PCT/IB2018/051785
Other languages
French (fr)
Inventor
Pratik Sharma
Original Assignee
Pratik Sharma
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pratik Sharma
Priority to PCT/IB2018/051785
Publication of WO2019180477A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0712Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a virtual computing platform, e.g. logically partitioned systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a system implementing multitasking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45591Monitoring or debugging support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/815Virtual

Definitions

  • local failure detectors, i.e. programs which register with the kernel for failure events, such as a process crashing and halting, or a process being suspended because another process crashed.
  • local failure detectors corresponding to each process in the group, running on different virtual machine nodes in the cluster, can communicate with each other.
  • a process communicates with its local failure detector through a special receive-only channel on which the local failure detector may place a new list of identifiers of the processes that are not suspected to have crashed, together with the identifiers or IP addresses of the virtual machines on which they run. We call this list the adjacency view of the process (see the sketch after this list).
  • the local failure detector can share the adjacency view of the process, along with the current state of the process itself (whether it has failed or not), with all other failure detectors running on different virtual machine nodes that it can reach. This way all processes in the group running on different virtual machine nodes in the cluster have a consistent view of the up and running processes, and those processes will agree, by consensus, to revoke the membership of the failed process and either distribute its pending tasks among themselves or create a new process on a virtual machine to achieve the same.
  • Health Check Service which periodically checks the health of each virtual machine node in the cluster; if a virtual machine node does not respond for a fixed number of consecutive cycles, the Health Check Service assumes that the virtual machine is down and notifies the local failure detectors of all other virtual machines in the cluster.
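
As an illustration only, the following Go sketch shows one possible shape for the process identifiers, the adjacency view, and the special receive-only channel described above. The patent does not prescribe any implementation; all type and function names here are hypothetical.

    // Hypothetical Go sketch: a process is identified by a unique identifier plus
    // the IP address (or identifier) of the virtual machine node hosting it, and
    // its local failure detector publishes adjacency views on a receive-only
    // channel. Names are illustrative, not taken from the patent.
    package membership

    // ProcessID identifies a process in the group.
    type ProcessID struct {
        ID     string // unique process identifier
        NodeIP string // IP address or identifier of the hosting virtual machine node
    }

    // AdjacencyView is the list of processes not currently suspected to have crashed.
    type AdjacencyView []ProcessID

    // LocalFailureDetector monitors one process and publishes its adjacency views.
    type LocalFailureDetector struct {
        views chan AdjacencyView
    }

    // Views returns the special receive-only channel on which the detector
    // places each new adjacency view for the monitored process.
    func (d *LocalFailureDetector) Views() <-chan AdjacencyView {
        return d.views
    }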

Abstract

Here we have a group of processes, each of which runs on a different virtual machine node in order to complete a specific set of tasks. We consider here an asynchronous distributed system, where processes communicate by exchanging messages. Processes running on different virtual machine nodes are identified by their unique identifiers along with the IP address of the virtual machine node on which they run. Every pair of processes is connected by a communication channel. To track failures on the same virtual machine node we use local failure detectors, i.e. programs which register with the kernel for failure events of the process. In addition, the local failure detectors corresponding to each process in the group, running on different virtual machine nodes in the cluster, can communicate with each other, which gives the processes a view of the adjacent processes that are up and running and of those that have failed.

Description

Distributed Group Membership Service
In this invention we have a group of processes, each of which runs on a different virtual machine node in order to complete a specific set of tasks. We consider here an asynchronous distributed system, where processes communicate by exchanging messages. Processes running on different virtual machine nodes are identified by their unique identifiers along with the Internet Protocol (IP) address of the virtual machine node (or its identifier) on which they run. Every pair of processes is connected by a communication channel; that is, every process can send messages to and receive messages from any other. The failure model we assume allows processes to crash, silently halting their execution. To track such failures on the same virtual machine node we use local failure detectors, i.e. programs which register with the kernel for failure events, such as a process crashing and halting, or a process being suspended because another process crashed. In addition, the local failure detectors corresponding to each process in the group, running on different virtual machine nodes in the cluster, can communicate with each other. We assume that a process communicates with its local failure detector through a special receive-only channel on which the local failure detector may place a new list of identifiers of the processes that are not suspected to have crashed, together with the identifiers or IP addresses of the virtual machines on which they run. We call this list the adjacency view of the process. The local failure detector can also share the adjacency view of the process, along with the current state of the process itself (whether it has failed or not), with all other failure detectors running on different virtual machine nodes that it can reach. This way all processes in the group running on different virtual machine nodes in the cluster have a consistent view of the up and running processes, and those processes will agree, by consensus, to revoke the membership of the failed process and either distribute its pending tasks among themselves or create a new process on a virtual machine to achieve the same. To handle virtual machine node failures, we also have a Health Check Service which periodically checks the health of each virtual machine node in the cluster; if a virtual machine node does not respond for a fixed number of consecutive cycles, the Health Check Service assumes that the virtual machine is down and notifies the local failure detectors of all other virtual machines in the cluster.
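
For illustration, the hedged Go sketch below (reusing the ProcessID and AdjacencyView types from the sketch in the Definitions section above) shows one way a local failure detector could broadcast its process's adjacency view and state to the detectors it can reach, and how, once the group has agreed that a process failed, its membership could be revoked and its pending tasks redistributed. The consensus step and the peer transport are deliberately left abstract, and every name is hypothetical rather than taken from the patent.

    // Hypothetical sketch of adjacency-view sharing and membership revocation.
    package membership

    import "sync"

    // ProcessState reports whether the monitored process has failed.
    type ProcessState struct {
        Process ProcessID
        Failed  bool
    }

    // ViewUpdate is what one detector sends to all reachable peer detectors.
    type ViewUpdate struct {
        State ProcessState
        View  AdjacencyView
    }

    // PeerLink abstracts the channel to a failure detector on another node.
    type PeerLink interface {
        Send(ViewUpdate) error
    }

    // Detector keeps the locally known membership and pending tasks per process.
    type Detector struct {
        mu      sync.Mutex
        members map[ProcessID]bool
        tasks   map[ProcessID][]string // pending task identifiers per process
        peers   []PeerLink
    }

    // Broadcast shares the local adjacency view and process state with every
    // reachable peer detector, so that all nodes converge on a consistent view.
    func (d *Detector) Broadcast(u ViewUpdate) {
        for _, p := range d.peers {
            _ = p.Send(u) // unreachable peers are simply skipped in this sketch
        }
    }

    // OnAgreedFailure is invoked once the group has agreed (by a consensus step
    // not modelled here) that a process failed: its membership is revoked and
    // its pending tasks are redistributed among the surviving members.
    func (d *Detector) OnAgreedFailure(failed ProcessID) {
        d.mu.Lock()
        defer d.mu.Unlock()
        delete(d.members, failed)
        pending := d.tasks[failed]
        delete(d.tasks, failed)
        survivors := make([]ProcessID, 0, len(d.members))
        for m := range d.members {
            survivors = append(survivors, m)
        }
        for i, t := range pending {
            if len(survivors) == 0 {
                break // alternatively, a new process could be created to take over
            }
            owner := survivors[i%len(survivors)]
            d.tasks[owner] = append(d.tasks[owner], t)
        }
    }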

Claims

Following is the claim for this invention:
1. In this invention we have a group of processes, each of which runs on a different virtual machine node in order to complete a specific set of tasks. We consider here an asynchronous distributed system, where processes communicate by exchanging messages. Processes running on different virtual machine nodes are identified by their unique identifiers along with the Internet Protocol (IP) address of the virtual machine node (or its identifier) on which they run. Every pair of processes is connected by a communication channel; that is, every process can send messages to and receive messages from any other. The failure model we assume allows processes to crash, silently halting their execution. To track such failures on the same virtual machine node we use local failure detectors, i.e. programs which register with the kernel for failure events, such as a process crashing and halting, or a process being suspended because another process crashed. In addition, the local failure detectors corresponding to each process in the group, running on different virtual machine nodes in the cluster, can communicate with each other. We assume that a process communicates with its local failure detector through a special receive-only channel on which the local failure detector may place a new list of identifiers of the processes that are not suspected to have crashed, together with the identifiers or IP addresses of the virtual machines on which they run. We call this list the adjacency view of the process. The local failure detector can also share the adjacency view of the process, along with the current state of the process itself (whether it has failed or not), with all other failure detectors running on different virtual machine nodes that it can reach. This way all processes in the group running on different virtual machine nodes in the cluster have a consistent view of the up and running processes, and those processes will agree, by consensus, to revoke the membership of the failed process and either distribute its pending tasks among themselves or create a new process on a virtual machine to achieve the same. To handle virtual machine node failures, we also have a Health Check Service which periodically checks the health of each virtual machine node in the cluster; if a virtual machine node does not respond for a fixed number of consecutive cycles, the Health Check Service assumes that the virtual machine is down and notifies the local failure detectors of all other virtual machines in the cluster. The above novel technique of providing a membership service for a group of processes running on different virtual machine nodes in a cluster is the claim for this invention.
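
As a non-authoritative illustration of the Health Check Service described above, the Go sketch below probes each virtual machine node once per cycle and, after a fixed number of consecutive missed responses, reports the node as down to the local failure detectors of all other nodes. The probing and notification mechanisms are abstracted behind interfaces, and every name is hypothetical.

    // Hypothetical sketch of the Health Check Service for virtual machine nodes.
    package membership

    import "time"

    // NodeProber answers whether a virtual machine node responded to a probe.
    type NodeProber interface {
        Ping(nodeIP string) bool
    }

    // Notifier informs the local failure detectors of all other nodes that a
    // virtual machine node is considered down.
    type Notifier interface {
        NodeDown(nodeIP string)
    }

    // HealthCheckService periodically checks every node in the cluster.
    type HealthCheckService struct {
        Nodes     []string      // IP addresses of the virtual machine nodes
        Prober    NodeProber
        Notifier  Notifier
        Interval  time.Duration // length of one check cycle
        Threshold int           // consecutive missed cycles before a node is declared down
        misses    map[string]int
    }

    // Run executes check cycles until the stop channel is closed.
    func (h *HealthCheckService) Run(stop <-chan struct{}) {
        h.misses = make(map[string]int)
        ticker := time.NewTicker(h.Interval)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case <-ticker.C:
                for _, node := range h.Nodes {
                    if h.Prober.Ping(node) {
                        h.misses[node] = 0
                        continue
                    }
                    h.misses[node]++
                    if h.misses[node] == h.Threshold {
                        // Assume the virtual machine is down and tell every
                        // other node's local failure detector about it.
                        h.Notifier.NodeDown(node)
                    }
                }
            }
        }
    }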

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/051785 WO2019180477A1 (en) 2018-03-17 2018-03-17 Distributed group membership service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/051785 WO2019180477A1 (en) 2018-03-17 2018-03-17 Distributed group membership service

Publications (1)

Publication Number Publication Date
WO2019180477A1 (en) 2019-09-26

Family

ID=67986638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2018/051785 WO2019180477A1 (en) 2018-03-17 2018-03-17 Distributed group membership service

Country Status (1)

Country Link
WO (1) WO2019180477A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8776050B2 (en) * 2003-08-20 2014-07-08 Oracle International Corporation Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes

Similar Documents

Publication Publication Date Title
US8032780B2 (en) Virtualization based high availability cluster system and method for managing failure in virtualization based high availability cluster system
KR100645733B1 (en) Automatic configuration of network for monitoring
CN106294713A (en) The method of data synchronization resolved based on Incremental Log and data synchronization unit
CN103051470B (en) The control method of a kind of cluster and magnetic disk heartbeat thereof
JP6269250B2 (en) Data transfer control device, data transfer control method, and program
JP2018522471A (en) Software-defined data center and service cluster placement method there
JP6079426B2 (en) Information processing system, method, apparatus, and program
CN103444256A (en) Self-organization of a satellite grid
WO2009079177A3 (en) Systems and methods of high availability cluster environment failover protection
WO2003039071A1 (en) Method to manage high availability equipments
CN105993161A (en) Scalable address resolution
EP1117038A3 (en) Method and apparatus for providing fault-tolerant addresses for nodes in a clustered system
CN104618521A (en) Node de-duplication in a network monitoring system
CN103501355B (en) Internet protocol address collision detection method, device and gateway device
CN103036702A (en) Network segment crossing N+1 backup method and network segment crossing N+1 backup device
US10530634B1 (en) Two-channel-based high-availability
CN110771097A (en) Connectivity monitoring for data tunneling between network device and application server
US10742493B1 (en) Remote network interface card management
WO2019180477A1 (en) Distributed group membership service
US20150039929A1 (en) Method and Apparatus for Forming Software Fault Containment Units (SWFCUS) in a Distributed Real-Time System
CN105991678A (en) Distributed equipment service processing method, distributed equipment service processing device and distributed equipment
CN106603330A (en) Cloud platform virtual machine connection state checking method
Hou et al. Design and implementation of heartbeat in multi-machine environment
KR20180104639A (en) Database-based redundancy in the network
JP5157685B2 (en) COMMUNICATION SYSTEM, NETWORK DEVICE, COMMUNICATION RECOVERY METHOD USED FOR THEM, AND PROGRAM THEREOF

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18910815

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18910815

Country of ref document: EP

Kind code of ref document: A1