WO2014171810A2 - A system and method of fault tolerant for distributed applications in a virtualized environment - Google Patents

A system and method of fault tolerant for distributed applications in a virtualized environment

Info

Publication number
WO2014171810A2
Authority
WO
WIPO (PCT)
Prior art keywords
task
peer
application agent
hash table
virtual machines
Prior art date
Application number
PCT/MY2014/000035
Other languages
French (fr)
Other versions
WO2014171810A3 (en)
Inventor
Mohd Amril Nurman MOHD NAZIR
Hong Hoe ONG
Ettikan Kandasamy Karuppiah
Yaszrina MOHAMAD YASSIN
Original Assignee
Mimos Berhad
Priority date
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2014171810A2
Publication of WO2014171810A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F11/1484 Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1492 Generic software techniques for error detection or fault masking by run-time replication performed by the application software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant

Definitions

  • the present invention relates to a system and method of fault tolerant for distributed applications in a virtualized environment.
  • the invention relates to systems and methods that utilize Application Agent (AA) of the application Peer-to-Peer (P2P) overlay network.
  • AA Application Agent
  • P2P Peer-to-Peer
  • Fault-tolerance is particularly sought-after in critical distributed applications that require interactive visualization and remote computational steering (e.g., finance, video security, simulations etc.).
  • the lack of capability of fault-tolerance for such critical applications may cause performance problems and may stop execution of the application in a virtualized environment.
  • Presently, there are no fault tolerance strategies being employed for distributed applications running in a virtualized environment, e.g., cloud systems.
  • Existing mechanisms to provide solutions for lack of fault tolerance strategies are based on advance reservation and workflow. However, these approaches are impractical as users are required to specify exactly how long each task of the job runs. If a task runs longer than its prescheduled time, it has to be rescheduled by the Resource Manager. Further, fault-tolerance from machine and/or network crash is possible only at the expense of re-scheduling resources for completion of the job.
  • One example of resource allocation management in interactive grid computing systems was proposed in United States Patent Publication No. US 2005/0027863 A1, hereby denoted as the US 863 Publication, which provides a method to deploy application tasks over multiple machines while ensuring fault tolerance.
  • Deployment of application tasks over multiple machines is based on a distributed resource management node, a contract generation engine, and a contract repository.
  • a contract is generated for an interactive session specifying resource allocations and authorizations, and resources are allocated in accordance with the contract.
  • the present invention involves replication of computational tasks and data items in a DHT (Distributed Hash Table) based peer to peer (P2P) overlay network with small overhead by utilizing an Application Agent (AA) in the peer to peer (P2P) overlay network.
  • DHT Distributed Hash Table
  • Another mechanism was proposed in the United States Patent Publication No. US 2010/0262882 A1, hereby denoted as the US 882 Publication. It relates generally to network layer optimization through the use of network accelerator devices (NAD), and particularly to methods, systems and computer program products for enabling reliable packet transmission in a network using a set of network accelerator devices attached to a switch port.
  • the network accelerator device is used for identifying a data transmission, copying data packets from the data transmission into the memory buffer and in response to at least one of a missing data packet and a corrupt data packet identified during the data transmission, sending a copied data packet corresponding to the at least one of the missing data packet and the corrupt data packet.
  • a computer program product is utilized for managing a memory buffer in a network device.
  • the US 882 Publication does not utilize DHTs (Distributed Hash Tables) and a peer to peer (P2P) overlay network to provide long term fault tolerance, as compared to the present invention which involves replication of computational tasks and data items in the DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead.
  • a peer to peer (P2P) replication based on a DHT was proposed in an IEEE paper entitled "A Peer-to-Peer Replica Location Service Based on A Distributed Hash Table" authored by Min Chai, Ann Chervenak and Martin Frank.
  • peer to peer (P2P) replication based on a DHT is based on Replica Location Service (RLS) which allows registration and discovery of data replicas.
  • RLS Replica Location Service
  • the Chord algorithm, which uses consistent hashing and virtual nodes, is utilized to balance the number of keys stored on each node; wherein the number of routing hops required to search for a key is O(log N).
  • the said paper utilizes a Peer-to-Peer Replica Location Service (P-RLS) design that uses the overlay network of the Chord peer-to-peer system to self-organize Peer-to-Peer Replica Location Service (P-RLS) servers.
  • P-RLS Peer-to-Peer Replica Location Service
  • the present invention extends the Application Agent (AA) which initiates deployment of virtual machines (VMs) based on task requirement to provide degrees of fault tolerance through replication of computational task and datasets during runtime.
  • One aspect of the present invention provides a system (200) of fault tolerant for distributed applications in a virtualized environment.
  • the system comprising at least one application Peer-to-Peer (P2P) overlay network; said application Peer-to-Peer (P2P) overlay network comprises at least one Application Agent (AA) (210); and a plurality of Virtual Machines (VM) (214, 216).
  • VM Virtual Machines
  • the at least one Application Agent (AA) (210) having means for configuring new overlay of application Peer-to-Peer (P2P) overlay network; configuring at least one sub-overlay of Registration Task DHT (Distributed Hash Table) into Application Agent's (AA's) overlay network, registering and inserting completed task to DHT (Distributed Hash Table); replicating output and distributing completed task to said plurality of Virtual Machines; and removing completed task from Task Registration DHT (Distributed Hash Table) table.
  • Another aspect of the invention provides a method of fault tolerant for distributed applications in a virtualized environment.
  • the method comprising steps of pre-deployment of Virtual Machine (VM) images (402); spawning tasks during execution of application (404); allocating computational task to Virtual Machines (VM) (406); registering completed task (408); and retrieving output data (410).
  • the step of pre-deployment of Virtual Machine (VM) images further comprising steps of executing application by User by invoking at least one Application Agent (502); contacting nearest front end node by Application Agent (AA) (504); requesting deployment of Virtual Machines by Application Agent (AA) based on task requirement upon receipt of response from front end node (506); forming a structured overlay network based on Virtual Machines allocated by said front end node (508); and tracking status of Virtual Machines by Application Agent (AA) (510).
  • step for spawning tasks during execution of application which further comprises replicating computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead.
  • step for replicating computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead is provided.
  • the said step further comprises steps of reporting increase in processing at runtime by spawning new tasks by Application Agent (602); identifying resource requirements by Application Agent (AA) (604); determining if peer to peer (P2P) overlay network has been created by Application Agent (AA) (606); creating new Registration Task DHT overlay network by Application Agent (AA) if peer to peer (P2P) overlay network has not been created by Application Agent (AA) (608); inserting spawned task to Registration Task DHT (610) upon confirmation that peer to peer (P2P) overlay network has been created by Application Agent (AA); and upon creation of new Registration Task DHT overlay network by Application Agent (AA); replicating computational task and distributing said task to a plurality of Virtual Machines (612); and allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent (AA) (614).
  • the said step further comprises steps of determining if primary task is available (704); running task on primary Virtual Machine (VM) if task is available (710); utilizing DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table to look up alternative task from overlay of peer to peer (P2P) network (706); determining if task is located (708); running task on alternative Virtual Machine (VM) if task is located (712); and deploying new Virtual Machine (VM) and scheduling of new task for execution by Application Agent (AA) (714).
  • the step for registering completed task further comprises steps of identifying task completion by Application Agent (AA) (802); determining if output data of DHT (Distributed Hash Table) has been created for Application Agent (AA) (804); creating output data DHT (Distributed Hash Table) overlay network by Application Agent (AA) (806); inserting completed task to output data DHT (Distributed Hash Table) (808) upon confirmation that output data DHT (Distributed Hash Table) has been created for Application Agent (AA) and upon creation of output data DHT (Distributed Hash Table) overlay network by Application Agent (AA); replicating output and distributing said completed task to a plurality of Virtual Machines (810); and removing completed task from Task Registration DHT (Distributed Hash Table) table (812).
  • the step for retrieving output data further comprises steps of determining if primary data output is available (904); collecting data from primary Virtual Machine if primary output data is available (906); utilizing DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table to lookup alternative task from output data DHT (Distributed Hash Table) (908); determining if data output is located (910); collecting output data from Virtual Machine if data output is located (912); and retrieving computational task from Registration Task DHT and scheduling new task by Application Agent (AA) (914).
  • FIG. 1.0 illustrates the architecture of the present invention.
  • FIG. 2.0 illustrates the architecture of the present invention against present solution of a prior art.
  • FIG. 3.0 illustrates the block diagram of the components of the present invention which includes an application-specific structured DHT-based P2P overlay routing.
  • FIG. 4.0 is a flowchart illustrating the methodology of fault tolerant for distributed applications in a virtualized environment of the present invention.
  • FIG. 5.0 is a flowchart illustrating the steps of pre-deployment of Virtual Machine (VM) images of the present invention.
  • FIG. 6.0 is a flowchart illustrating the steps of replicating computational tasks and data items in a DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network of the present invention.
  • FIG. 7.0 is a flowchart illustrating the steps of allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent (AA) of the present invention.
  • FIG. 8.0 is a flowchart illustrating the steps of registering completed task of the present invention.
  • FIG. 9.0 is a flowchart illustrating the steps for retrieving output data of the present invention.
  • the present invention provides a system and method of fault tolerant for distributed applications in a virtualized environment.
  • the system (100, 200, 300) provides for an application Peer-to-Peer (P2P) overlay network wherein said application Peer-to-Peer (P2P) overlay network comprises one Application Agent (AA) (210); and Virtual Machines (VM) (214, 216).
  • the Application Agent (AA) (210) having means for configuring new overlay of application Peer-to-Peer (P2P) overlay network; configuring at least one sub-overlay of Registration Task DHT (Distributed Hash Table) into Application Agent's (AA's) overlay network, registering and inserting completed task to DHT (Distributed Hash Table); replicating output and distributing completed task to said plurality of Virtual Machines; and removing completed task from Task Registration DHT (Distributed Hash Table) table.
  • the Application Agent (AA) provides fault tolerance during execution of application through computational replication and data replication.
  • Each new task spawned by master is replicated in accordance with a DHT (Distributed Hash Table)-based structured peer to peer (P2P) network with small overhead during computational replication.
  • each output data is stored on virtual machines (VMs) together with the processing in parallel to ensure successful execution of the application with small overhead.
  • the invention includes the steps of pre-deployment of Virtual Machine (VM) images, in which a User executes the application by invoking the Application Agent (502) and thereafter the Application Agent (AA) contacts the nearest front end node (504).
  • the Application Agent (AA) requests deployment of virtual machines (VMs) based on task requirement upon receipt of response from front end node (506).
  • a structured overlay network is formed based on virtual machines allocated by front end node (508).
  • the Application Agent (AA) tracks the status of virtual machines (VMs).
  • tasks are spawned during execution of application (404) by replicating computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead. Thereafter computational tasks are allocated to virtual machines (VMs) (406). Completed tasks are registered accordingly (408) upon successful allocation of said tasks. Further, the Application Agent (AA) retrieves output data of each completed task.
  • A more detailed description of replication of computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead is illustrated in FIG. 6.0.
  • Upon creating new Registration Task DHT (Distributed Hash Table) overlay network, spawned task is inserted to Registration Task DHT (Distributed Hash Table) (610).
  • If the peer to peer (P2P) overlay network has already been created, spawned task is inserted directly to Registration Task DHT (Distributed Hash Table) (610).
  • computational tasks and data items are replicated and distributed to virtual machines (VMs) (612).
  • the said computational tasks are allocated to virtual machines (VMs) and the status of said virtual machines (VMs) is monitored accordingly (614). Tasks are accordingly scheduled on virtual machines (VMs) by the Application Agent (AA) (616).
  • the method for allocating computational tasks to virtual machines involves task assignment from the Application Agent (AA) to the actual virtual machine (VM).
  • A more detailed description for allocation of computational tasks to virtual machines (VMs) is illustrated in FIG. 7.0.
  • the Application Agent (AA) schedules computational tasks (702) by first determining if primary task is available (704). Upon availability of primary task, task is run on primary virtual machine (VM) (710). If primary task is not available, DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table is utilized to lookup alternative task from overlay of peer to peer (P2P) network (706) to determine if task is located (708).
  • If the task is located, it is run on an alternative virtual machine (VM) (712).
  • Otherwise, the Application Agent (AA) deploys a new virtual machine (VM) and schedules a new task for execution (714).
  • the method to register completed task is further illustrated in FIG. 8.0.
  • the method involves replication of result and output data to peer to peer (P2P) structured overlay network upon task completion to ensure that Application Agent (AA) is able to retrieve results during network failure and machine crash.
  • Application Agent (AA) identifies task completion (802) and determines if output data DHT (Distributed Hash Table) has been created for the said Application Agent (AA).
  • If the output data DHT (Distributed Hash Table) has been created, the completed task is inserted to output data DHT (Distributed Hash Table) (808).
  • Otherwise, Application Agent (AA) first creates output data DHT (Distributed Hash Table) overlay network (806) before proceeding to insert completed task to output data DHT (Distributed Hash Table) (808). Thereafter, output data is replicated and distributed to multiple virtual machines (810). Upon replication of output data, completed tasks are removed from Task Registration DHT (Distributed Hash Table) table (812).
  • As illustrated in FIG. 9.0, Application Agent (AA) retrieves output data for application (902) by first determining if primary data output is available (904). Data is directly collected from primary virtual machine (VM) if primary data output is available (906).
  • If primary data output is not available, a DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table is utilized to look up alternative task from output data DHT (Distributed Hash Table) (908). It is further determined if data output is located (910). Upon locating data output, said data output is collected from virtual machine (VM) (912). Thereafter computational task is retrieved from Registration Task DHT (Distributed Hash Table) and new task is scheduled by Application Agent (AA) (914).
  • the present invention provides for pre-deployment of virtual machine (VM) images which enables Application Agent (AA) to initiate deployment of virtual machines (VMs) based on task requirement and tracking the deployment of the status of virtual machine (VM).

Abstract

A system and method of fault tolerant for distributed applications in a virtualized environment is provided by utilizing an Application Agent (AA) of the application Peer-to-Peer (P2P) overlay network. The system and method of the present invention include the steps of pre-deployment of Virtual Machine (VM) images, in which a User executes the application by invoking the Application Agent (502) and the Application Agent (AA) contacts the nearest front end node (504). The Application Agent (AA) requests deployment of virtual machines (VMs) based on task requirement upon receipt of a response from the front end node (506). Thereafter, a structured overlay network is formed based on the virtual machines (VMs) allocated by the front end node (508), and the Application Agent (AA) further tracks the status of the virtual machines (VMs). Upon successful deployment of virtual machine (VM) images, tasks are spawned during execution of the application (404) by replicating computational tasks and data items in a DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead. Computational tasks are allocated to virtual machines (VMs) (406) and completed tasks are registered accordingly (408) upon successful allocation of said tasks. Further, the Application Agent (AA) retrieves output data of each completed task. Pre-deployment of virtual machine (VM) images enables the Application Agent (AA) to initiate deployment of virtual machines (VMs) based on task requirement and to track the deployment status of the VMs. Further, Distributed Hash Tables (DHTs) are leveraged to provide long-term fault tolerance, which enables remote computational steering without advance reservation.

Description

A SYSTEM AND METHOD OF FAULT TOLERANT FOR DISTRIBUTED APPLICATIONS IN A VIRTUALIZED ENVIRONMENT
FIELD OF INVENTION
The present invention relates to a system and method of fault tolerant for distributed applications in a virtualized environment. In particular, the invention relates to systems and methods that utilize Application Agent (AA) of the application Peer-to-Peer (P2P) overlay network.
BACKGROUND ART
Fault-tolerance is particularly sought-after in critical distributed applications that require interactive visualization and remote computational steering (e.g., finance, video security, simulations etc.). The lack of capability of fault-tolerance for such critical applications may cause performance problems and may stop execution of the application in a virtualized environment. Presently, there are no fault tolerance strategies being employed for distributed applications running in a virtualized environment, e.g., cloud systems. Existing mechanisms to provide solutions for lack of fault tolerance strategies are based on advance reservation and workflow. However, these approaches are impractical as users are required to specify exactly how long each task of the job runs. If a task runs longer than its prescheduled time, it has to be rescheduled by the Resource Manager. Further, fault-tolerance from machine and/or network crash is possible only at the expense of re-scheduling resources for completion of the job.
One example of resource allocation management in interactive grid computing systems was proposed in United States Patent Publication No. US 2005/0027863 A1 hereby denoted as the US 863 Publication, which provides a method to deploy application tasks over multiple machines while ensuring fault tolerance. Deployment of application tasks over multiple machines is based on distributed resource management node, a contract generation engine, and a contract repository. A contract is generated for interactive session specifying resource allocations and authorizations and resources are allocated in accordance to the contract. In contrast, the present invention involves replication of computational tasks and data items in a DHT (Distributed Hash Table) based peer to peer (P2P) overlay network with small overhead by utilizing an Application Agent (AA) in the peer to peer (P2P) overlay network.
Another mechanism was proposed in the United States Patent Publication No. US 2010/0262882 A1, hereby denoted as the US 882 Publication. It relates generally to network layer optimization through the use of network accelerator devices (NAD), and particularly to methods, systems and computer program products for enabling reliable packet transmission in a network using a set of network accelerator devices attached to a switch port. The network accelerator device is used for identifying a data transmission, copying data packets from the data transmission into the memory buffer and, in response to at least one of a missing data packet and a corrupt data packet identified during the data transmission, sending a copied data packet corresponding to the at least one of the missing data packet and the corrupt data packet. Further, a computer program product is utilized for managing a memory buffer in a network device. The US 882 Publication does not utilize DHTs (Distributed Hash Tables) and a peer to peer (P2P) overlay network to provide long term fault tolerance, as compared to the present invention which involves replication of computational tasks and data items in the DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead.
A peer to peer (P2P) replication based on a DHT (Distributed Hash Table) was proposed in an IEEE paper entitled "A Peer-to-Peer Replica Location Service Based on A Distributed Hash Table" authored by Min Chai, Ann Chervenak and Martin Frank. In the said paper, peer to peer (P2P) replication based on a DHT (Distributed Hash Table) is based on Replica Location Service (RLS) which allows registration and discovery of data replicas. Further, the Chord algorithm, which uses consistent hashing and virtual nodes, is utilized to balance the number of keys stored on each node; wherein the number of routing hops required to search for a key is O(log N). In brief, the said paper utilizes a Peer-to-Peer Replica Location Service (P-RLS) design that uses the overlay network of the Chord peer-to-peer system to self-organize Peer-to-Peer Replica Location Service (P-RLS) servers. In contrast, the present invention extends the Application Agent (AA) which initiates deployment of virtual machines (VMs) based on task requirement to provide degrees of fault tolerance through replication of computational task and datasets during runtime.
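The consistent hashing with virtual nodes mentioned above can be illustrated with a minimal sketch. This is not the implementation of the cited paper or of the present invention: the class name, the node names, the choice of 64 virtual nodes per machine and the use of SHA-1 are assumptions made for illustration, and the lookup here is a local binary search rather than Chord's distributed routing with O(log N) hops.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string onto the identifier ring (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    """Consistent hashing with virtual nodes (hypothetical helper class).

    Each physical node is placed on the ring many times, which evens out
    the number of keys each node is responsible for.
    """

    def __init__(self, nodes, virtual_nodes=64):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> physical node name
        for node in nodes:
            for i in range(virtual_nodes):
                point = ring_hash(f"{node}#{i}")
                self._owner[point] = node
                bisect.insort(self._points, point)

    def successor(self, key: str) -> str:
        """Return the node owning the first ring position at or after hash(key)."""
        idx = bisect.bisect_left(self._points, ring_hash(key))
        # Wrap around to the first position when the key hashes past the last one.
        return self._owner[self._points[idx % len(self._points)]]


ring = ConsistentHashRing(["vm-a", "vm-b", "vm-c"])
owners = {f"replica-{n}": ring.successor(f"replica-{n}") for n in range(1000)}
```

A useful property of this scheme is that removing one node only reassigns the keys that node owned; keys held by the remaining nodes stay where they are.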
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
SUMMARY OF INVENTION
The present invention relates to a system and method of fault tolerant for distributed applications in a virtualized environment. In particular, the invention relates to systems and methods that utilize Application Agent (AA) of the application Peer-to-Peer (P2P) overlay network.
One aspect of the present invention provides a system (200) of fault tolerant for distributed applications in a virtualized environment. The system comprises at least one application Peer-to-Peer (P2P) overlay network; said application Peer-to-Peer (P2P) overlay network comprises at least one Application Agent (AA) (210); and a plurality of Virtual Machines (VM) (214, 216). The at least one Application Agent (AA) (210) has means for configuring new overlay of application Peer-to-Peer (P2P) overlay network; configuring at least one sub-overlay of Registration Task DHT (Distributed Hash Table) into Application Agent's (AA's) overlay network, registering and inserting completed task to DHT (Distributed Hash Table); replicating output and distributing completed task to said plurality of Virtual Machines; and removing completed task from Task Registration DHT (Distributed Hash Table) table.
Another aspect of the invention provides a method of fault tolerant for distributed applications in a virtualized environment. The method comprises steps of pre-deployment of Virtual Machine (VM) images (402); spawning tasks during execution of application (404); allocating computational task to Virtual Machines (VM) (406); registering completed task (408); and retrieving output data (410). The step of pre-deployment of Virtual Machine (VM) images further comprises steps of executing application by User by invoking at least one Application Agent (502); contacting nearest front end node by Application Agent (AA) (504); requesting deployment of Virtual Machines by Application Agent (AA) based on task requirement upon receipt of response from front end node (506); forming a structured overlay network based on Virtual Machines allocated by said front end node (508); and tracking status of Virtual Machines by Application Agent (AA) (510).
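The pre-deployment steps (504) to (510) described above can be sketched as follows. This is only an illustrative model under stated assumptions: the `FrontEndNode` class, the `distance` proximity metric, the VM names and the "running" status label are all hypothetical, and sorting VM names stands in for forming the structured overlay.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEndNode:
    """Hypothetical front end node holding pre-deployed VM images."""
    name: str
    distance: float                      # assumed proximity metric, e.g. round-trip time
    free_vms: list = field(default_factory=list)

    def deploy(self, count):
        """Hand out up to `count` VMs and remove them from the free pool."""
        granted, self.free_vms = self.free_vms[:count], self.free_vms[count:]
        return granted


def pre_deploy(front_ends, vms_needed):
    """Model of steps 504-510 as performed by the Application Agent."""
    nearest = min(front_ends, key=lambda fe: fe.distance)   # 504: contact nearest front end
    vms = nearest.deploy(vms_needed)                        # 506: request VMs per task requirement
    overlay = sorted(vms)                                   # 508: stand-in for the structured overlay
    status = {vm: "running" for vm in overlay}              # 510: track VM status
    return overlay, status


front_ends = [
    FrontEndNode("fe-far", 40.0, ["vm-1"]),
    FrontEndNode("fe-near", 5.0, ["vm-216", "vm-214", "vm-220"]),
]
overlay, status = pre_deploy(front_ends, 2)
```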
In another aspect of the invention there is provided the step for spawning tasks during execution of application, which further comprises replicating computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead. In yet another aspect of the invention there is provided the step for replicating computational tasks and data items in DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead. The said step further comprises steps of reporting increase in processing at runtime by spawning new tasks by Application Agent (602); identifying resource requirements by Application Agent (AA) (604); determining if peer to peer (P2P) overlay network has been created by Application Agent (AA) (606); creating new Registration Task DHT overlay network by Application Agent (AA) if peer to peer (P2P) overlay network has not been created by Application Agent (AA) (608); inserting spawned task to Registration Task DHT (610), either upon confirmation that peer to peer (P2P) overlay network has been created by Application Agent (AA) or upon creation of new Registration Task DHT overlay network by Application Agent (AA); replicating computational task and distributing said task to a plurality of Virtual Machines (612); and allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent (AA) (614).
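Steps (606) to (612) can be modelled in memory as a rough sketch. The `ApplicationAgent` class below, its replica count of two, the VM names and the rule of placing copies on successive VMs of a hashed ring are assumptions for illustration; the source does not specify how replica holders are chosen.

```python
import hashlib


def key_id(name: str) -> int:
    """Identifier of a VM or task on the ring (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ApplicationAgent:
    """Hypothetical in-memory model of steps 606-612."""

    def __init__(self, vms, replicas=2):
        self.vms = sorted(vms, key=key_id)   # VMs ordered by ring identifier
        self.replicas = replicas
        self.registration_dht = None         # created lazily on first spawn
        self.placement = {}                  # task id -> VMs holding a copy

    def spawn(self, task_id, payload):
        if self.registration_dht is None:                      # 606: overlay not yet created
            self.registration_dht = {}                         # 608: create Registration Task DHT
        self.registration_dht[task_id] = payload               # 610: insert spawned task
        self.placement[task_id] = self._replica_set(task_id)   # 612: replicate and distribute
        return self.placement[task_id]

    def _replica_set(self, task_id):
        """Successor VM of hash(task_id) plus the following VMs on the ring."""
        ids = [key_id(vm) for vm in self.vms]
        target = key_id(task_id)
        start = next((i for i, v in enumerate(ids) if v >= target), 0)
        count = min(self.replicas, len(self.vms))
        return [self.vms[(start + k) % len(self.vms)] for k in range(count)]


agent = ApplicationAgent(["vm-214", "vm-216", "vm-218"])
holders = agent.spawn("task-1", {"cmd": "render frame 1"})
```

Because each task is held by more than one VM, the loss of a single holder leaves at least one copy reachable for rescheduling.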
In still another aspect of the invention, there is provided the step of allocating computational tasks to Virtual Machines, monitoring the status of said Virtual Machines and scheduling tasks on Virtual Machines by the Application Agent (AA). The said step further comprises the steps of determining if the primary task is available (704); running the task on the primary Virtual Machine (VM) if the task is available (710); utilizing a DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table to look up an alternative task from the overlay of the peer to peer (P2P) network (706); determining if the task is located (708); running the task on an alternative Virtual Machine (VM) if the task is located (712); and deploying a new Virtual Machine (VM) and scheduling a new task for execution by the Application Agent (AA) (714).
In a further aspect of the invention, there is provided the step of registering completed tasks. The said step further comprises the steps of identifying task completion by the Application Agent (AA) (802); determining if an output data DHT (Distributed Hash Table) has been created for the Application Agent (AA) (804); creating an output data DHT (Distributed Hash Table) overlay network by the Application Agent (AA) (806); inserting the completed task to the output data DHT (Distributed Hash Table) (808), either upon confirmation that the output data DHT (Distributed Hash Table) has been created for the Application Agent (AA) or upon creation of the output data DHT (Distributed Hash Table) overlay network by the Application Agent (AA); replicating output and distributing said completed task to a plurality of Virtual Machines (810); and removing the completed task from the Task Registration DHT (Distributed Hash Table) (812).
In another aspect of the invention, there is provided the step of retrieving output data. The said step further comprises the steps of determining if the primary data output is available (904); collecting data from the primary Virtual Machine if the primary output data is available (906); utilizing a DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table to look up an alternative task from the output data DHT (Distributed Hash Table) (908); determining if the data output is located (910); collecting output data from the Virtual Machine if the data output is located (912); and retrieving the computational task from the Registration Task DHT and scheduling a new task by the Application Agent (AA) (914).
The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:
FIG. 1.0 illustrates the architecture of the present invention.
FIG. 2.0 illustrates the architecture of the present invention against a prior art solution.
FIG. 3.0 illustrates the block diagram of the components of the present invention, which includes an application-specific structured DHT-based P2P overlay routing.
FIG. 4.0 is a flowchart illustrating the method of fault tolerance for distributed applications in a virtualized environment of the present invention.
FIG. 5.0 is a flowchart illustrating the steps of pre-deployment of Virtual Machine (VM) images of the present invention.
FIG. 6.0 is a flowchart illustrating the steps of replicating computational tasks and data items in a DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network of the present invention.
FIG. 7.0 is a flowchart illustrating the steps of allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent (AA) of the present invention.
FIG. 8.0 is a flowchart illustrating the steps of registering completed task of the present invention.
FIG. 9.0 is a flowchart illustrating the steps for retrieving output data of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a system and method of fault tolerance for distributed applications in a virtualized environment. In particular, the invention relates to systems and methods that utilize an Application Agent (AA) of the application Peer-to-Peer (P2P) overlay network.
Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention, and it is envisioned that modifications may be made without departing from the scope of the appended claims.
Referring to FIGS. 1.0, 2.0 and 3.0, the system (100, 200, 300) according to the present invention is illustrated. The system (100, 200, 300) provides for an application Peer-to-Peer (P2P) overlay network wherein said application Peer-to-Peer (P2P) overlay network comprises one Application Agent (AA) (210) and a plurality of Virtual Machines (VM) (214, 216). The Application Agent (AA) (210) has means for configuring a new overlay of the application Peer-to-Peer (P2P) overlay network; configuring at least one sub-overlay of Registration Task DHT (Distributed Hash Table) into the Application Agent's (AA's) overlay network, registering and inserting completed tasks to the DHT (Distributed Hash Table); replicating output and distributing completed tasks to said plurality of Virtual Machines; and removing completed tasks from the Task Registration DHT (Distributed Hash Table). The Application Agent (AA) provides fault tolerance during execution of the application through computational replication and data replication. During computational replication, each new task spawned by the master is replicated in accordance with a DHT (Distributed Hash Table)-based structured peer to peer (P2P) network with small overhead. As for data replication of results, each output data item is stored on virtual machines (VMs) in parallel with the processing, to ensure successful execution of the application with small overhead.
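By way of illustration only, the following Python sketch shows one common way a DHT-based structured overlay can decide which virtual machines hold the replicas of a task. The specification does not name a particular hashing scheme; the hash ring, the use of SHA-1, and the names `ring_position` and `replica_vms` are assumptions introduced for this example.

```python
import hashlib
from bisect import bisect_right


def ring_position(name):
    """Map a name to a position on a hash ring (SHA-1 is only an example choice)."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


def replica_vms(task_id, vm_names, replicas=2):
    """Return the VMs assumed to hold a task: its ring successor and the VMs after it."""
    # Order the VMs by their position on the ring.
    ring = sorted(vm_names, key=ring_position)
    positions = [ring_position(vm) for vm in ring]
    # The first VM clockwise from the task's position acts as the primary holder.
    start = bisect_right(positions, ring_position(task_id)) % len(ring)
    # Never ask for more replicas than there are VMs.
    count = min(replicas, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

In this sketch the first VM returned plays the role of the primary and the remaining VMs hold the replicas, so a lookup for the same task identifier always reaches the same set of machines.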
Referring to FIGS. 4.0 and 5.0, an embodiment of the method (400, 500) of the invention is illustrated. Generally, the invention includes the step of pre-deployment of Virtual Machine (VM) images (402), in which the User executes the application by invoking the Application Agent (502) and the Application Agent (AA) thereafter contacts the nearest front end node (504). The Application Agent (AA) requests deployment of virtual machines (VMs) based on task requirement upon receipt of a response from the front end node (506). Thereafter, a structured overlay network is formed based on the virtual machines allocated by the front end node (508). Further, the Application Agent (AA) tracks the status of the virtual machines (VMs) (510). Upon successful deployment of virtual machine (VM) images, tasks are spawned during execution of the application (404) by replicating computational tasks and data items in a DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead. Thereafter, computational tasks are allocated to virtual machines (VMs) (406). Completed tasks are registered accordingly (408) upon successful allocation of said tasks. Further, the Application Agent (AA) retrieves the output data of each completed task (410).
A more detailed description of replication of computational tasks and data items in the DHT (Distributed Hash Table)-based peer to peer (P2P) overlay network with small overhead is illustrated in FIG. 6.0. In order to replicate computational tasks during execution of the application, an increase in processing at runtime is reported by spawning new tasks by the Application Agent (602). Thereafter, resource requirements are identified by the Application Agent (AA) (604), and it is determined if a peer to peer (P2P) overlay network has been created by the Application Agent (AA) (606). The Application Agent (AA) proceeds to create a new Registration Task DHT (Distributed Hash Table) overlay network upon confirmation that no peer to peer (P2P) overlay network has been created by the Application Agent (AA) (608). Upon creating the new Registration Task DHT (Distributed Hash Table) overlay network, the spawned task is inserted to the Registration Task DHT (Distributed Hash Table) (610). In a scenario where the peer to peer (P2P) overlay network is confirmed to have been created by the Application Agent (AA), the spawned task is inserted directly to the Registration Task DHT (Distributed Hash Table) (610). Upon insertion of the spawned task to the Registration Task DHT (Distributed Hash Table), computational tasks and data items are replicated and distributed to virtual machines (VMs) (612). The said computational tasks are allocated to virtual machines (VMs) and the status of said virtual machines (VMs) is monitored accordingly (614). Tasks are accordingly scheduled on virtual machines (VMs) by the Application Agent (AA) (616).
The method for allocating computational tasks to virtual machines (VMs) involves task assignment from the Application Agent (AA) to the actual virtual machine (VM). A more detailed description of the allocation of computational tasks to virtual machines (VMs) is illustrated in FIG. 7.0.
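The FIG. 6.0 flow described above can be sketched as follows. This is a minimal in-memory model and not the implementation of the invention: the Registration Task DHT is stood in for by a plain dictionary, and the class name `ApplicationAgent`, the method `spawn_task`, and the simple round-robin replica choice in `_pick_vms` are hypothetical names and simplifications chosen for the example.

```python
class ApplicationAgent:
    """Toy model of the Application Agent's task registration flow (FIG. 6.0)."""

    def __init__(self, vm_names, replicas=2):
        self.vm_names = list(vm_names)
        self.replicas = replicas
        # The Registration Task DHT is created lazily (steps 606 and 608).
        self.registration_dht = None
        # Tasks currently held by each VM.
        self.vm_tasks = {vm: set() for vm in self.vm_names}

    def spawn_task(self, task_id, payload):
        # Step 606: has the overlay already been created?
        if self.registration_dht is None:
            # Step 608: create the Registration Task DHT overlay.
            self.registration_dht = {}
        # Step 610: insert the spawned task.
        self.registration_dht[task_id] = payload
        # Step 612: replicate the task and distribute it to several VMs.
        targets = self._pick_vms(task_id)
        for vm in targets:
            self.vm_tasks[vm].add(task_id)
        return targets

    def _pick_vms(self, task_id):
        # Assumed placement rule: a deterministic start index derived from the task id.
        start = sum(task_id.encode("utf-8")) % len(self.vm_names)
        count = min(self.replicas, len(self.vm_names))
        return [self.vm_names[(start + i) % len(self.vm_names)] for i in range(count)]
```

A caller would create the agent with the VMs obtained during pre-deployment and call `spawn_task` once per new task; the returned list names the VMs that now hold a copy of that task.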
Upon allocation of computational tasks to virtual machines (VMs), the Application Agent (AA) schedules computational tasks (702) by first determining if the primary task is available (704). If the primary task is available, the task is run on the primary virtual machine (VM) (710). If the primary task is not available, the DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table is utilized to look up an alternative task from the overlay of the peer to peer (P2P) network (706) and to determine if the task is located (708). Upon locating an available task, the task is run on an alternative virtual machine (VM) (712). If the task is still not found, the Application Agent (AA) deploys a new virtual machine (VM) and schedules a new task for execution (714).
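The decision sequence of FIG. 7.0 can be sketched as a single function. The sketch assumes that "primary task is available" means the primary virtual machine holding the task is reachable; the function name `schedule_task`, the `alive` set, and the `deploy_new_vm` callback are hypothetical names introduced for this example.

```python
def schedule_task(task_id, primary_vm, replica_vms, alive, deploy_new_vm):
    """Pick the VM that runs a task, following the fallback order of FIG. 7.0."""
    # Steps 704 and 710: run on the primary VM when it is available.
    if primary_vm in alive:
        return primary_vm
    # Steps 706 and 708: look for an alternative copy among the replicas.
    for vm in replica_vms:
        if vm in alive:
            # Step 712: run the task on the alternative VM.
            return vm
    # Step 714: no copy was found, so deploy a new VM and schedule the task there.
    return deploy_new_vm(task_id)
```

For example, with `alive = {"vm-2"}`, a primary of `"vm-1"` and replicas `["vm-2", "vm-3"]`, the function returns `"vm-2"`; with an empty `alive` set it falls through to `deploy_new_vm`.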
The method to register completed tasks is further illustrated in FIG. 8.0. The method involves replication of result and output data to the peer to peer (P2P) structured overlay network upon task completion, to ensure that the Application Agent (AA) is able to retrieve results during network failure and machine crash. As illustrated in FIG. 8.0, the Application Agent (AA) identifies task completion (802) and determines if an output data DHT (Distributed Hash Table) has been created for the said Application Agent (AA) (804). Upon confirmation that the output data DHT (Distributed Hash Table) has been created for the said Application Agent (AA), the completed task is inserted to the output data DHT (Distributed Hash Table) (808). However, if the output data DHT (Distributed Hash Table) has not been created for the Application Agent (AA), the Application Agent (AA) first creates the output data DHT (Distributed Hash Table) overlay network (806) before proceeding to insert the completed task to the output data DHT (Distributed Hash Table) (808). Thereafter, output data is replicated and distributed to multiple virtual machines (810). Upon replication of output data, completed tasks are removed from the Task Registration DHT (Distributed Hash Table) (812).
A more detailed description of the retrieval of output data is illustrated in FIG. 9.0, wherein the Application Agent (AA) retrieves output data for the application (902) by first determining if the primary data output is available (904). Data is directly collected from the primary virtual machine (VM) if the primary data output is available (906). In a scenario where the primary output data is not available, the DHT (Distributed Hash Table)-based peer to peer (P2P) lookup table is utilized to look up an alternative task from the output data DHT (Distributed Hash Table) (908). It is further determined if the data output is located (910). Upon locating the data output, said data output is collected from the virtual machine (VM) (912). Thereafter, the computational task is retrieved from the Registration Task DHT (Distributed Hash Table) and a new task is scheduled by the Application Agent (AA) (914).
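The registration and retrieval flows of FIG. 8.0 and FIG. 9.0 can be sketched together as a small in-memory model. Both DHTs are stood in for by dictionaries, and the class name `OutputStore` and its methods are hypothetical. Returning `None` when no copy of the output can be found is an assumption made for this example; it marks the point at which the Application Agent would schedule a new task (914).

```python
class OutputStore:
    """Toy model of output registration (FIG. 8.0) and retrieval (FIG. 9.0)."""

    def __init__(self):
        self.registration_dht = {}   # task_id -> task payload
        self.output_dht = None       # created lazily: task_id -> VMs holding the output
        self.vm_outputs = {}         # vm -> {task_id: output}

    def register_completed(self, task_id, output, vms):
        # Steps 804 and 806: create the output data DHT if it does not exist yet.
        if self.output_dht is None:
            self.output_dht = {}
        # Step 808: record the completed task; the first VM is treated as the primary.
        self.output_dht[task_id] = list(vms)
        # Step 810: replicate the output to every listed VM.
        for vm in vms:
            self.vm_outputs.setdefault(vm, {})[task_id] = output
        # Step 812: remove the completed task from the registration table.
        self.registration_dht.pop(task_id, None)

    def retrieve(self, task_id, alive):
        holders = (self.output_dht or {}).get(task_id, [])
        # Steps 904 to 912: try the primary first, then the alternative holders.
        for vm in holders:
            if vm in alive:
                return self.vm_outputs[vm][task_id]
        # Step 914 (assumed handling): nothing found, so the caller reschedules the task.
        return None
```

With two replicas, the output remains retrievable as long as at least one of the holding VMs is still in the `alive` set.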
In short, the present invention provides for pre-deployment of virtual machine (VM) images, which enables the Application Agent (AA) to initiate deployment of virtual machines (VMs) based on task requirement and to track the deployment status of the virtual machines (VMs). Further, Distributed Hash Tables (DHTs) are leveraged to provide long-term fault tolerance, which enables remote computational steering without advance reservation.
Unless the context requires otherwise or specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.
Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely".
It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A system (200) of fault tolerant for distributed applications in a virtualized environment, the system comprising at least one application Peer-to-Peer overlay network; said application Peer-to-Peer overlay network comprises:
at least one Application Agent (210); and
a plurality of Virtual Machines (214, 216)
characterized in that
the at least one Application Agent (210) having means for:
configuring new overlay of application Peer-to-Peer overlay network;
configuring at least one sub-overlay of Registration Task Distributed Hash Table into Application Agent's overlay network, registering and inserting completed task to Distributed Hash Table;
replicating output and distributing completed task to said plurality of Virtual Machines; and
removing completed task from Task Registration Distributed Hash Table.
2. A method (400) of fault tolerant for distributed applications in a virtualized environment, the method comprising steps of:
pre-deployment of Virtual Machine images (402);
spawning tasks during execution of application (404);
allocating computational task to Virtual Machines (406);
registering completed task (408); and
retrieving output data (410)
characterized in that
pre-deployment of Virtual Machine images further comprising steps of:
executing application by User by invoking at least one Application Agent (502);
contacting nearest front end node by Application Agent (504);
requesting deployment of Virtual Machines by Application Agent based on task requirement upon receipt of response from front end node (506);
forming a structured overlay network based on Virtual Machines allocated by said front end node (508); and
tracking status of Virtual Machines by Application Agent (510).
3. A method according to Claim 2, wherein spawning tasks during execution of application further comprising replicating computational tasks and data items in Distributed Hash Table based peer to peer (P2P) overlay network with small overhead.
4. A method (600) according to Claim 3, wherein replicating computational tasks and data items in Distributed Hash Table based peer to peer overlay network with small overhead further comprises steps of:
reporting increase in processing at runtime by spawning new tasks by Application Agent (602);
identifying resource requirements by Application Agent (604);
determining if peer to peer overlay network has been created by Application Agent (606);
creating new Registration Task Distributed Hash Table overlay network by Application Agent if peer to peer overlay network has not been created by Application Agent (608);
inserting spawned task to Registration Task Distributed Hash Table (610) upon confirmation that peer to peer overlay network has been created by Application Agent; and upon creation of new Registration Task Distributed Hash Table overlay network by Application Agent;
replicating computational task and distributing said task to a plurality of Virtual Machines (612); and
allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent (614).
5. A method (700) according to Claim 4, wherein allocating computational tasks to Virtual Machines, monitoring status of said Virtual Machines and scheduling task on Virtual Machines by Application Agent further comprises steps of:
determining if primary task is available (704);
running task on primary Virtual Machine if task is available (710);
utilizing Distributed Hash Table based peer to peer lookup table to lookup alternative task from overlay of peer to peer network (706);
determining if task is located (708);
running task on alternative Virtual Machine if task is located (712); and
deploying new Virtual Machine and scheduling of new task for execution by Application Agent (714).
6. A method (800) according to Claim 2, wherein registering completed task further comprises steps of:
identifying task completion by Application Agent (802);
determining if output data of Distributed Hash Table has been created for Application Agent (804);
creating output data Distributed Hash Table overlay network by Application Agent (806);
inserting completed task to output data Distributed Hash Table (808) upon confirmation that output data Distributed Hash Table has been created for Application Agent; and upon creation of output data Distributed Hash Table overlay network by Application Agent;
replicating output and distributing said completed task to a plurality of Virtual Machines (810); and
removing completed task from Task Registration Distributed Hash Table table (812).
7. A method (900) according to Claim 2, wherein retrieving output data further comprises steps of:
determining if primary data output is available (904);
collecting data from primary Virtual Machine if primary output data is available (906);
utilizing Distributed Hash Table based peer to peer lookup table to lookup alternative task from output data Distributed Hash Table (908);
determining if data output is located (910);
collecting output data from Virtual Machine if data output is located (912); and
retrieving computational task from Registration Task Distributed Hash Table and scheduling new task by Application Agent (914).
PCT/MY2014/000035 2013-04-16 2014-03-18 A system and method of fault tolerant for distributed applications in a virtualized environment WO2014171810A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2013001351A MY184239A (en) 2013-04-16 2013-04-16 A system and method of fault tolerance for distributed applications in a virtualized environment
MYPI2013001351 2013-04-16

Publications (2)

Publication Number Publication Date
WO2014171810A2 WO2014171810A2 (en) 2014-10-23
WO2014171810A3 WO2014171810A3 (en) 2015-07-16

Family

ID=50729750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2014/000035 WO2014171810A2 (en) 2013-04-16 2014-03-18 A system and method of fault tolerant for distributed applications in a virtualized environment

Country Status (2)

Country Link
MY (1) MY184239A (en)
WO (1) WO2014171810A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027863A1 (en) 2003-07-31 2005-02-03 Vanish Talwar Resource allocation management in interactive grid computing systems
US20100262882A1 (en) 2009-04-13 2010-10-14 International Business Machines Corporation Protocols for high performance computing visualization, computational steering and forward progress

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016195562A1 (en) * 2015-06-03 2016-12-08 Telefonaktiebolaget Lm Ericsson (Publ) Allocating or announcing availability of a software container
US10528379B2 (en) 2015-06-03 2020-01-07 Telefonaktiebolaget Lm Ericsson (Publ) Allocating or announcing availability of a software container
CN105005506A (en) * 2015-07-17 2015-10-28 中国人民解放军国防科学技术大学 Virtual cloud fault-tolerant resource supply method
CN105005506B (en) * 2015-07-17 2017-11-10 中国人民解放军国防科学技术大学 Fault-tolerant resource provision method in one kind virtualization cloud
CN109885533A (en) * 2019-02-22 2019-06-14 深圳市网心科技有限公司 A kind of data deployment method based on DHT network, node device, data deployment system and storage medium

Also Published As

Publication number Publication date
WO2014171810A3 (en) 2015-07-16
MY184239A (en) 2021-03-29

Similar Documents

Publication Publication Date Title
EP3356937B1 (en) Distributed stream-based database triggers
Almeida et al. ChainReaction: a causal+ consistent datastore based on chain replication
US8381015B2 (en) Fault tolerance for map/reduce computing
Abbes et al. A decentralized and fault‐tolerant Desktop Grid system for distributed applications
EP3539261B1 (en) System and method for network-scale reliable parallel computing
Sun et al. Key Technologies for Big Data Stream Computing.
Rouzaud-Cornabas A distributed and collaborative dynamic load balancer for virtual machine
US20140245077A1 (en) Providing high availability for state-aware applications
Birman et al. The Virtual Synchrony Execution Model
US9921878B1 (en) Singleton coordination in an actor-based system
Mohamed et al. MidCloud: an agent‐based middleware for effective utilization of replicated Cloud services
Meroufel et al. Optimization of checkpointing/recovery strategy in cloud computing with adaptive storage management
WO2014171810A2 (en) A system and method of fault tolerant for distributed applications in a virtualized environment
Costa et al. Large-scale volunteer computing over the Internet
e Silva et al. Application execution management on the InteGrade opportunistic grid middleware
US9348672B1 (en) Singleton coordination in an actor-based system
Bouabache et al. Hierarchical replication techniques to ensure checkpoint storage reliability in grid environment
Gankevich et al. Factory: master node high-availability for big data applications and beyond
Limam et al. A self-adaptive conflict resolution with flexible consistency guarantee in the cloud computing
Sumangali et al. Advanced cloud fault tolerance system
Caban et al. Dependability Analysis of Systems Based on the Microservice Architecture
Cogorno et al. Fault tolerance in Hadoop MapReduce implementation
Meling et al. A distributed approach to autonomous fault treatment in spread
Meroufel et al. Adaptive checkpointing with reliable storage in cloud environment
Saini et al. A Load Balancing Based Cost-Effective Multi-tenant Fault Tolerant System

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14724168

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 14724168

Country of ref document: EP

Kind code of ref document: A2