AU2021104045A4 - Failure nodes detection and recovery system in cloud computing to improve resources reliability - Google Patents

Failure nodes detection and recovery system in cloud computing to improve resources reliability

Info

Publication number
AU2021104045A4
Authority
AU
Australia
Prior art keywords
nodes
node
failure
cloud computing
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021104045A
Inventor
G. Radha Devi
Aditya Kumar GUPTA
Sushma Jaiswal
Tarun JAISWAL
Jaylakshmi Jiddu
B. Hari Krishna
D. Sai Kumar
Pilli Lalitha Kumari
Sakuntala Mahapatra
Ashwini S.
Rabinarayan Satpathy
G. Vani
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Devi GRadha
Krishna BHari
Kumar DSai
Original Assignee
Devi GRadha
Krishna BHari
Kumar DSai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Devi GRadha, Krishna BHari, Kumar DSai
Priority to AU2021104045A
Application granted
Publication of AU2021104045A4
Ceased
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Intelligently connected machines such as servers, virtual machines and load balancers provide different computing resources to users in cloud computing. The cloud responds to user requests by providing resources. When many requests arrive from users, the cloud may be unable to respond, either because of heavy load on the cloud nodes or because of node failure. The major challenge in cloud computing is to schedule user tasks as fast as users require by detecting and recovering failed nodes, while retaining a high quality of service (QoS) and Resources Reliability. Because cloud computing relies on a large number of nodes to execute high-performance applications, it is critical to distribute resources across the network's nodes and to provide a system and method for detecting and recovering failed nodes. The present invention disclosed herein is a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in cloud computing. The invention uses a combined network of Long Short-Term Memory and Random Forest (LSTM-RF) to analyze the Node Data, and Greedy Search Firefly Optimization (GSFO) to provide a globally optimal ranking of nodes and to facilitate failure node detection and recovery.
The performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes. Of the 100 dead nodes, only 4 remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, and 100 tasks were executed in 2.109 seconds.

Description

DRAWINGS
Figure 1: Task Level Scheduling in Cloud Computing. (Blocks shown: User, Tasks, Task Manager, Task Scheduler, Task Assign, VM.)
Figure 2: Block Diagram of Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability. (Blocks shown: User Tasks, Scheduling Policies, Task Scheduler, Task Assignment System, Node Data, LSTM-RF, Vector Matrix, Greedy Search Firefly Optimization, Failure Node Detection and Recovery.)
FIELD OF INVENTION
[0001] The present invention relates to the technical field of Computer Science Engineering.
[0002] Particularly, the present invention relates to a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, within the broader field of Cloud Computing in Computer Science Engineering.
[0003] More particularly, the present invention relates to a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability that provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in cloud computing. Resources Reliable Routing through active nodes is provided by recovering the failed nodes in the cloud computing environment.
BACKGROUND OF INVENTION
[0004] In cloud computing, intelligently connected machines such as servers, virtual machines, and load balancers provide users with a variety of computational resources. When a user requests resources, the cloud responds and provides them. When many requests arrive from users, the cloud may be unable to respond because of heavy load on the cloud nodes or because of node failure. The main difficulty in cloud computing is to schedule user tasks as quickly as users require by identifying and recovering failed nodes while maintaining high quality of service (QoS) and resource reliability. A node is a worker machine, either a VM or a physical machine, which contains the services needed to run pods.
[0005] When a large number of nodes is involved, conventional methods cannot ensure stable end-to-end communication. As a result, metaheuristic algorithms have become the backbone of such methods. The major application of metaheuristic algorithms is optimization, which is difficult because the problems of interest are complex and nonlinear. Existing metaheuristic algorithms take a long time to run and have large computing costs. Search algorithms help to reach optimality for the problem at hand. The algorithms currently available are mostly deterministic or stochastic.
[0006] A stochastic algorithm may introduce randomness at any point in the search, and it is regarded as an efficient global search method. The choice of optimization algorithm depends on the type and nature of the problem, the desired quality of solutions, the available computing resources, time restrictions, and method availability. Existing inventions were unable to strike a balance between the necessary quality and the available computational resources. The aim of the invention is therefore to develop algorithms that strike a balance between quality and resources and that obtain globally best solutions. Special techniques would have to be added to the optimization algorithms in order to apply an approximation in the optimization process and reach an ideal design at a lower computational cost. Metaheuristic algorithms are the most extensively used algorithms for optimization, and they have a number of advantages over standard algorithms.
[0007] Cloud service systems continue to have problems and fail to meet client demands in practice. Computational node failures are a common source of these problems. A cloud service system is made up of multiple computational nodes to which virtual machines are allocated. There is a need for a method that can learn node behaviour and the nature of failures, and then use machine learning algorithms to recover failed nodes. The nodes are controlled by a master that coordinates between all of them. Each node carries identifying information such as its address, that is, the host name and the IP address of the node. The system should be able to allocate Virtual Machines to active nodes based on their probability of failure, reducing the frequency of node failures and the duration of VM downtime. In addition, if a node is expected to fail, the system should be able to repair it and allot VMs to improve cloud computing resource reliability. The current invention may be capable of predicting failing nodes in a cloud computing environment and subsequently recovering the nodes.
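The allocation requirement described in paragraph [0007] can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` record, the `allocate_vms` helper, the 0.5 probability cut-off, and the fewest-VMs-first tie-breaking are hypothetical and are not taken from the specification, which does not state how the failure probability is turned into an allocation decision.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Address information named in the text: host name and IP address.
    hostname: str
    ip: str
    # Assumed to be supplied by some trained failure predictor.
    failure_probability: float
    active: bool = True
    vms: list = field(default_factory=list)


def allocate_vms(vm_ids, nodes, max_failure_probability=0.5):
    """Place each VM on an active node whose failure probability is acceptable.

    Among eligible nodes, the one holding the fewest VMs is chosen; ties are
    broken in favour of the lower failure probability. Both rules are
    illustrative assumptions.
    """
    eligible = [
        n for n in nodes
        if n.active and n.failure_probability <= max_failure_probability
    ]
    if not eligible:
        raise RuntimeError("no healthy node available")
    for vm in vm_ids:
        target = min(eligible, key=lambda n: (len(n.vms), n.failure_probability))
        target.vms.append(vm)
    return {n.hostname: list(n.vms) for n in nodes}
```

In this sketch, inactive nodes and nodes above the cut-off never receive a VM, which mirrors the stated goal of reducing VM downtime by avoiding nodes that are likely to fail.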
SUMMARY OF INVENTION
[0008] Intelligently connected machines such as servers, virtual machines and load balancers provide different computing resources to users in cloud computing. The cloud responds to user requests by providing resources. When many requests arrive from users, the cloud may be unable to respond, either because of heavy load on the cloud nodes or because of node failure. The major challenge in cloud computing is to schedule user tasks as fast as users require by detecting and recovering failed nodes, while retaining a high quality of service (QoS) and Resources Reliability. Because cloud computing relies on a large number of nodes to execute high-performance applications, it is critical to distribute resources across the network's nodes and to provide a system and method for detecting and recovering failed nodes.
[0009] The present invention, and the main embodiment of the current disclosure, is the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in cloud computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs).
Different tasks are assigned to the VMs. The VMs are considered as Nodes (205), and the Node Data (205) contains temporal and spatial features from which the status of each node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known. Not all features are suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF). The combined model LSTM-RF (206) selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node for better Resources Reliability; to obtain this reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks the nodes based on the features of each node and calculates the silence probability to identify the failed nodes.
The Greedy Search Firefly Optimization (208) selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, it automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node and its spatial location is restored automatically to turn the failed node into a healthy node; VMs are then allocated to it to improve the Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes failed and recovered. The performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes. Of the 100 dead nodes, only 4 remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, and 100 tasks were executed in 2.109 seconds.
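The top-K selection described in paragraph [0009] can be sketched in Python under stated assumptions. The specification does not define the silence probability, so this sketch assumes it is the fraction of polling intervals in which a node gave no response. A plain sort stands in for the Greedy Search Firefly Optimization ranking, and the names `silence_probability` and `top_k_healthy` are hypothetical.

```python
def silence_probability(heartbeats):
    """Fraction of polling intervals with no response (assumed definition).

    `heartbeats` is a list of booleans, True when the node answered.
    A node with no history is treated as fully silent.
    """
    if not heartbeats:
        return 1.0
    return heartbeats.count(False) / len(heartbeats)


def top_k_healthy(history, k):
    """Split nodes into the K least silent (healthy) and the rest (suspect).

    `history` maps a node name to its heartbeat list. Sorting by silence
    probability is a stand-in for the optimization-based ranking; ties are
    broken by node name so the result is deterministic.
    """
    ranked = sorted(
        history,
        key=lambda node: (silence_probability(history[node]), node),
    )
    return ranked[:k], ranked[k:]
```

A fuller version would combine the silence probability with the 128x1 feature vector of each node; that weighting is left out here because the specification does not describe it.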
[0010] The present invention is described in various levels of detail in the Summary of the Invention, as well as the attached sketches and the Detailed Description of the Invention, and the inclusion or omission of components, sections, or other things in this Summary of the Invention is not intended to limit the scope of the present disclosure. For a better understanding of the current disclosure, read the summary of the invention with the detailed description.
BRIEF DESCRIPTION OF DRAWINGS
[0011] The accompanying drawings are incorporated into this specification to aid understanding of the invention. The drawings show exemplary embodiments of the current disclosure and, viewed in conjunction with the description, help to explain its principles. The drawings are for illustrative purposes only and do not in any way limit the scope of the disclosure. Elements that use the same reference numerals are comparable but not necessarily identical. Different reference numerals may, on the other hand, be used to identify related components. Some embodiments may lack certain elements and/or components, while others may make use of elements or components not shown in the drawings.
[0012] Figure 1 illustrates Task Level Scheduling in Cloud Computing comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to explain the steps for scheduling tasks to the virtual machines (VMs) in the cloud. This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.
[0013] Figure 2 illustrates the present invention and main embodiment of the current disclosure, that is, the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). The system provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in cloud computing. The figure is in accordance with an exemplary embodiment of the present disclosure and explains the method of detecting failed nodes and recovering them. It shall be understood that the invention is not limited to this drawing or to all of the components of the proposed method; the illustration is provided to aid understanding and should not be understood to limit the scope or the application of the disclosure. Some aspects and/or components may not be present in certain embodiments, and others may be used in forms different from those shown in the drawings. Depending on the context, singular language used to describe a component or element may include a plurality of such components or elements, and vice versa.
[0014] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO) comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure, to explain the Greedy Search Firefly Optimization (GSFO). This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.
[0015] Figure 4 illustrates the Plot of Node Failure Prediction, in accordance with another exemplary embodiment of the present disclosure, to explain the detection of the node failure percentage. The invention is not limited to this drawing; the illustration is provided to assist comprehension of the disclosure and should not be construed as restricting the depth, nature, or applicability of the disclosure.
DETAILED DESCRIPTION OF INVENTION
[0016] The invention will be better understood from the following detailed description, and objects other than those stated above will become apparent. The description refers to the appended drawings. In order to offer a complete understanding of embodiments of the current disclosure, certain specifics relating to various components and processes are provided. The information provided in the embodiments should not be construed as limiting the scope of this disclosure, as those skilled in the art will understand. The order of steps disclosed in the process and technique of this invention should not be interpreted as requiring the order defined or represented. Alternative or additional steps should also be considered. While the present invention is described herein using embodiments and illustrative drawings as examples, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described, and that the drawings are not intended to represent the scale of the various components.
[0017] Figure 1 illustrates Task Level Scheduling in Cloud Computing comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to explain the steps for scheduling tasks to the virtual machines (VMs) in the cloud. Users (101) submit requests (102) to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests (102). Users (101) can submit their tasks (102) from anywhere and at any time using various applications. Based on the Service Level Agreements (SLAs), the task manager (103) generates hosts. The task manager (103) provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (104) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the constraints of the task manager (103). The Task Scheduler (104) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned (105) to the VMs.
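The Figure 1 flow, in which prioritized tasks are handed to VMs, can be sketched as follows. This is only an illustration under assumptions: the specification does not say how priorities are encoded or how VMs are chosen, so the sketch assumes that a lower number means higher priority and that VMs are filled in round-robin order. The function name `schedule` is hypothetical.

```python
import heapq
from itertools import cycle


def schedule(tasks, vm_names):
    """Assign tasks to VMs in priority order, round-robin across VMs.

    `tasks` is a list of (priority, task_id) pairs; a lower priority number
    is served first (assumed convention). Returns a mapping from VM name to
    the ordered list of task ids placed on it.
    """
    if not vm_names:
        raise ValueError("at least one VM is required")
    queue = list(tasks)
    heapq.heapify(queue)  # priority queue standing in for the task manager
    assignment = {vm: [] for vm in vm_names}
    for vm in cycle(vm_names):
        if not queue:
            break
        _, task_id = heapq.heappop(queue)
        assignment[vm].append(task_id)
    return assignment
```

A scheduler driven by SLA constraints would replace the round-robin step with a resource-aware choice; round-robin is used here only because it is the simplest policy that keeps the example self-contained.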
[0018] Figure 2 illustrates the present invention and main embodiment of the current disclosure, that is, the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). The system provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in cloud computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned to the VMs. The VMs are considered as Nodes (205), and the Node Data (205) contains temporal and spatial features from which the status of each node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known.
Not all features are suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF). The combined model LSTM-RF (206) selects the features separately, with a feature vector size of 128x1.
The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node for better Resources Reliability; to obtain this reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks the nodes based on the features of each node and calculates the silence probability to identify the failed nodes. The Greedy Search Firefly Optimization (208) selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, it automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node and its spatial location is restored automatically to turn the failed node into a healthy node; VMs are then allocated to it to improve the Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes failed and recovered. The following Table 1 gives the Failure Node Detection and Recovery rate of the present invention.
TABLE 1
Parameters obtained in the Present Invention
Parameters            Present Invention with Greedy Search Firefly Optimization
Number of Tasks       100
Dead Nodes            100
Dead Nodes Remains    04
Node Recovery Rate    96%
Energy Consumption    0.45 Joules
Execution Time        2.109 Seconds
The performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes. Of the 100 dead nodes, only 4 remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, and 100 tasks were executed in 2.109 seconds.
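The node recovery rate reported in Table 1 follows directly from the two node counts. The short Python check below reproduces that arithmetic; the function name `recovery_rate` is a hypothetical helper, and the formula (recovered nodes divided by initially dead nodes) is the natural reading of the reported figures rather than a definition given in the specification.

```python
def recovery_rate(dead_nodes, still_dead):
    """Percentage of initially dead nodes that were recovered."""
    if dead_nodes <= 0:
        raise ValueError("dead_nodes must be positive")
    if not 0 <= still_dead <= dead_nodes:
        raise ValueError("still_dead must lie between 0 and dead_nodes")
    return 100.0 * (dead_nodes - still_dead) / dead_nodes


# Values from Table 1: 100 dead nodes, 4 remaining dead.
print(recovery_rate(100, 4))  # 96.0
```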
[0019] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO) comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure. Initially, the Greedy Search Firefly Optimization (GSFO) is started (301) to find and recover the failed nodes. It searches for node data (302) in the form of features and feature vectors. At each node it determines the probability of visiting the next node (303). The failure probability of the present node is stored in the Routing Table (304), and the algorithm then moves to the next node to calculate its failure probability. Once all nodes have been visited (305), the distances among the nodes are recorded (306) and the optimal path with the shortest node distances (307) is determined. When the maximum number of iterations has been performed (308) at each node, the optimization algorithm stops (309). During the iterations, the spatial location of each node is recorded in the table; if the location of a node changes, the node is treated as a failed node and is recovered to its original spatial location based on the extracted spatial features.
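The control flow of Figure 3 can be approximated by a greedy nearest-neighbour traversal, sketched below in Python. This is not the GSFO algorithm itself: the specification gives no firefly attractiveness or movement rule, so none is implemented, and the next node is simply the nearest unvisited one. The function names, the use of 2-D coordinates as the spatial location, and the choice of one traversal per starting node as an "iteration" are all assumptions for illustration.

```python
import math


def greedy_route(positions, failure_prob, start):
    """Visit every node once, always moving to the nearest unvisited node.

    Records each visited node's failure probability in a routing table and
    returns (path, total_distance, routing_table).
    """
    routing_table = {start: failure_prob[start]}
    path, total, current = [start], 0.0, start
    unvisited = set(positions) - {start}
    while unvisited:  # loop ends when all nodes are completed
        nxt = min(
            unvisited,
            key=lambda n: (math.dist(positions[current], positions[n]), n),
        )
        total += math.dist(positions[current], positions[nxt])
        routing_table[nxt] = failure_prob[nxt]
        path.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return path, total, routing_table


def best_route(positions, failure_prob, max_iterations):
    """Run the traversal from up to `max_iterations` starts; keep the shortest."""
    best = None
    for start in sorted(positions)[:max_iterations]:
        candidate = greedy_route(positions, failure_prob, start)
        if best is None or candidate[1] < best[1]:
            best = candidate
    return best


def recover_moved_nodes(recorded, observed, tolerance=1e-9):
    """Flag nodes whose location changed and restore the recorded location."""
    failed = sorted(
        n for n in recorded if math.dist(recorded[n], observed[n]) > tolerance
    )
    restored = dict(observed)
    for n in failed:
        restored[n] = recorded[n]
    return failed, restored
```

The last helper mirrors the recovery rule in the text, where a node whose spatial location differs from the recorded one is treated as failed and reset to its original location.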
[0020] Figure 4 illustrates the plot of node failure prediction, in accordance with another exemplary embodiment of the present disclosure, to aid understanding of the node failure percentage detection. The plot shows that the number of predicted failure nodes increases as the number of nodes increases; that is, the present invention can predict more failure nodes as the number of nodes grows. The prediction rate stays high even when the number of dead nodes in the experiment is increased.
[0021] In order to provide a more detailed understanding of embodiments of the invention, some specific details are set out in the above exemplary description. A person of ordinary skill in the art, however, will recognize that the present invention can be implemented without any of the specific details presented here. The major embodiments of the present disclosure concern the detection and recovery of node failures. The description details how the nodes are recovered by the Greedy Search Firefly Optimization technique. The method of the present embodiment for predicting and recovering failed nodes in the cloud environment is provided in the above layout, and it shall not limit the scope of the present disclosure.

Claims (5)

FAILURE NODES DETECTION AND RECOVERY SYSTEM IN CLOUD COMPUTING TO IMPROVE RESOURCES RELIABILITY CLAIMS We claim:
1. A Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209), which provides an efficient method and system for detecting and recovering failure nodes to improve Resources Reliable Routing, performance, and Quality of Service (QoS) in cloud computing.
2. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Node Data contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known. Not all features may be suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase.
3. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein LSTM-RF is the combination of Long Short-Term Memory (LSTM) and Random Forest (RF) used to analyze the Node Data of each node present in the cloud computing environment; the LSTM-RF is trained on both the temporal and spatial data of the nodes; the temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data; the LSTM-RF comprises fully connected and dense layers; the spatial features are selected by the Random Forest (RF); and the combined LSTM-RF model selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix.
4. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Greedy Search Firefly Optimization ranks the nodes based on the features of each node and calculates the silence probability to identify the failure nodes; automatically learns the behavioural history and the silence probability of each node; and routes the optimal path from the nodes to the server; if a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failure node and its spatial location is restored automatically to turn the failure node into a healthy node; and VMs are then allocated to improve the Resources Reliability.
5. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes; of these 100 dead nodes, only 4 remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for the dead nodes. Resources Reliable Routing through active nodes improves task scheduling, and the 100 tasks were executed in 2.109 seconds.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021104045A AU2021104045A4 (en) 2021-07-11 2021-07-11 Failure nodes detection and recovery system in cloud computing to improve resources reliability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021104045A AU2021104045A4 (en) 2021-07-11 2021-07-11 Failure nodes detection and recovery system in cloud computing to improve resources reliability

Publications (1)

Publication Number Publication Date
AU2021104045A4 true AU2021104045A4 (en) 2021-09-09

Family

ID=77563661

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021104045A Ceased AU2021104045A4 (en) 2021-07-11 2021-07-11 Failure nodes detection and recovery system in cloud computing to improve resources reliability

Country Status (1)

Country Link
AU (1) AU2021104045A4 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11886283B2 (en) 2022-03-30 2024-01-30 International Business Machines Corporation Automatic node crash detection and remediation in distributed computing systems
CN117014214A (en) * 2023-08-21 2023-11-07 中山市智牛电子有限公司 Intelligent control system and control method for LED display screen
CN117014214B (en) * 2023-08-21 2024-04-02 中山市智牛电子有限公司 Intelligent control system and control method for LED display screen

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry