AU2021104045A4

AU2021104045A4 - Failure nodes detection and recovery system in cloud computing to improve resources reliability

Info

Publication number: AU2021104045A4
Application number: AU2021104045A
Authority: AU
Inventors: G. Radha Devi; Aditya Kumar GUPTA; Sushma Jaiswal; Tarun JAISWAL; Jaylakshmi Jiddu; B. Hari Krishna; D. Sai Kumar; Pilli Lalitha Kumari; Sakuntala Mahapatra; Ashwini S.; Rabinarayan Satpathy; G. Vani
Original assignee: Devi GRadha; Krishna BHari; Kumar DSai
Current assignee: Devi GRadha; Krishna BHari; Kumar DSai
Priority date: 2021-07-11
Filing date: 2021-07-11
Publication date: 2021-09-09
Anticipated expiration: 2029-07-11

Abstract

Intelligently connected machines such as servers, virtual machines and load balancers provide different computing resources to users in cloud computing. The cloud responds to user requests by providing resources. When many requests arrive from users, the cloud may be unable to respond, either because of heavy load on the cloud nodes or because of node failure. The major challenge of cloud computing is to schedule user tasks as fast as users require by detecting and recovering failed nodes, while retaining a high Quality of Service (QoS) and Resources Reliability. Because cloud computing relies on a large number of nodes to execute high-performance applications, it is critical to distribute resources across the network's nodes and to provide a system and method for detecting and recovering failed nodes. The present invention disclosed herein is a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The present invention uses a combined network of Long Short-Term Memory and Random Forest (LSTM-RF) to analyze the Node Data, and Greedy Search Firefly Optimization (GSFO) to provide a globally optimal ranking of nodes and to facilitate failure node detection and recovery. The performance parameters of the present invention are validated experimentally on CloudSim with 100 dead nodes: only 4 of the 100 nodes remain dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 Seconds.

Description

DRAWINGS

Figure 1: Task Level Scheduling in Cloud Computing. Labelled blocks: User (101), Tasks (102), Task Manager (103), Task Scheduler, Task Assign (105), VM.

Figure 2: Block Diagram of Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability. Labelled blocks: User Tasks (201), Scheduling Policies (202), Task Scheduler, Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization, Failure Node Detection and Recovery (209).

FAILURE NODES DETECTION AND RECOVERY SYSTEM IN CLOUD COMPUTING TO IMPROVE RESOURCES RELIABILITY

FIELD OF INVENTION

[0001] The present invention relates to the technical field of Computer Science Engineering.

[0002] Particularly, the present invention relates to a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, within the broader field of Cloud Computing in Computer Science Engineering.

[0003] More particularly, the present invention relates to a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability that provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. Resources Reliable Routing through active nodes is provided by recovering the failed nodes in the Cloud Computing environment.

BACKGROUND OF INVENTION

[0004] In cloud computing, intelligently connected machines such as servers, virtual machines, and load balancers provide users with a variety of computational resources. When a user requests resources, the cloud responds and provides them. When multiple requests are received from users, the cloud may be unable to respond, either because of heavy load on the cloud nodes or because of node failure. The main difficulty of Cloud Computing is to schedule user tasks as quickly as users require by identifying and recovering failed nodes while maintaining high Quality of Service (QoS) and resource reliability. A node is a worker machine, either a VM or a physical machine, which contains the services needed to run pods.

[0005] When a large number of nodes is involved, conventional methods are unable to ensure stable end-to-end communication. As a result, metaheuristic algorithms have become the backbone of such systems. The major application of metaheuristic algorithms is optimization, which is difficult because of the complexity and nonlinearity of the problem of interest. Existing metaheuristic algorithms take a long time to run and have large computing costs. Search algorithms aid in reaching optimality for the problem under consideration. The algorithms that are currently available are mostly either deterministic or stochastic.

[0006] In a stochastic algorithm, randomness can be introduced at any stage, and such algorithms are considered efficient global search methods. The type and nature of the problem, the desired quality of the solutions, the available computing resources, the time restrictions, and the availability of methods all influence the choice of optimization algorithm. Existing inventions were unable to strike a balance between the required quality and the available computational resources. The aim of the invention is therefore to develop algorithms that balance quality against resources and that obtain globally best solutions. Special techniques have to be added to the optimization algorithms in order to apply an approximation in the optimization process and reach an ideal design at a lower computational cost. Metaheuristic algorithms are the most extensively used algorithms for optimization, and they have a number of advantages over standard algorithms.

[0007] In practice, cloud service systems continue to have problems and fail to meet client demands. Computational node failures are a common source of these problems. A cloud service system is made up of multiple computational nodes to which virtual machines are allocated. There is a need for a method that can learn node behaviour and the nature of node failures, and then use machine learning algorithms to recover the failed nodes. Each node is controlled by a master that coordinates between all the nodes. A node contains the following information: Address: the host name and the IP address of the node. The system should be able to allocate Virtual Machines to the active nodes based on their probability of failure, reducing the frequency of node failures and the duration of VM downtime. In addition, if a node is expected to fail, the system should be able to repair it and allot VMs to it to improve cloud computing resource reliability. The current invention may be capable of predicting failing nodes in a cloud computing environment and subsequently recovering the nodes.
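The allocation requirement stated in this paragraph (place VMs only on active nodes, guided by their probability of failure) can be illustrated with a minimal Python sketch. It is not the method of the disclosure: the `Node` fields beyond host name and IP address, the `max_failure_probability` threshold of 0.5, and the tie-breaking rule are all assumptions made for illustration, and the failure probabilities are taken as given inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    hostname: str                 # node address: host name
    ip: str                       # node address: IP address
    failure_probability: float    # assumed to be supplied by a predictor, in [0, 1]
    active: bool = True
    vms: List[str] = field(default_factory=list)


def allocate_vms(nodes: List[Node], vm_ids: List[str],
                 max_failure_probability: float = 0.5) -> Dict[str, List[str]]:
    """Place each VM on an active, low-risk node.

    Hypothetical policy: among eligible nodes, pick the one hosting the
    fewest VMs, breaking ties by the lower failure probability.
    """
    candidates = [n for n in nodes
                  if n.active and n.failure_probability <= max_failure_probability]
    if not candidates:
        raise RuntimeError("no healthy node available")
    for vm in vm_ids:
        target = min(candidates, key=lambda n: (len(n.vms), n.failure_probability))
        target.vms.append(vm)
    return {n.hostname: list(n.vms) for n in nodes}
```

With four nodes, of which one is inactive and one has a failure probability of 0.9, three VMs would be spread over the two remaining nodes only.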

SUMMARY OF INVENTION

[0008] Intelligently connected machines such as servers, virtual machines and load balancers provide different computing resources to users in cloud computing. The cloud responds to user requests by providing resources. When many requests arrive from users, the cloud may be unable to respond, either because of heavy load on the cloud nodes or because of node failure. The major challenge of cloud computing is to schedule user tasks as fast as users require by detecting and recovering failed nodes, while retaining a high Quality of Service (QoS) and Resources Reliability. Because cloud computing relies on a large number of nodes to execute high-performance applications, it is critical to distribute resources across the network's nodes and to provide a system and method for detecting and recovering failed nodes.

[0009] The present invention, and the main embodiment of the current disclosure shown in the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprises User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and a user can send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs).

Different tasks are assigned to the VMs. The VMs are considered as nodes, and the Node Data (205) contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known. Not all features are in a form suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node present in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF). The combined model LSTM-RF (206) selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node, and to obtain better Resources Reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks the nodes based on the features of each node and calculates the silence probability to identify the failed nodes. It selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, the Greedy Search Firefly Optimization (208) automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node, and its spatial location is restored automatically to turn the failed node into a healthy node; VMs are then allocated to it to improve Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes failed and recovered. The performance parameters of the present invention are validated experimentally on CloudSim with 100 dead nodes: only 4 of the 100 nodes remain dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 Seconds.
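The top-K ranking step described in this paragraph can be pictured with a short Python sketch. The disclosure does not define how the silence probability is computed, so the estimator below (the fraction of monitoring intervals in which a node did not respond) is purely an assumption, as are the function names. The sketch ranks on that single score and leaves out both the 128x1 feature vectors and the firefly optimization itself.

```python
from typing import Dict, List, Tuple


def silence_probability(heartbeats: List[bool]) -> float:
    """Hypothetical estimator: share of intervals with no response.

    `heartbeats` holds one flag per monitoring interval (True = responded).
    A node with no history is treated as fully silent.
    """
    if not heartbeats:
        return 1.0
    return heartbeats.count(False) / len(heartbeats)


def rank_top_k(history: Dict[str, List[bool]], k: int) -> Tuple[List[str], List[str]]:
    """Return (healthy, suspect): the k least-silent nodes and the rest.

    Ties are broken by node name so the ranking is deterministic.
    """
    ranked = sorted(history, key=lambda name: (silence_probability(history[name]), name))
    return ranked[:k], ranked[k:]
```

Nodes outside the top K are only candidates for failure handling in this sketch; how they are confirmed as failed and recovered is described separately in the disclosure.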

[0010] The present invention is described in various levels of detail in the Summary of the Invention, the attached drawings and the Detailed Description of the Invention. The inclusion or omission of components, sections, or other items in this Summary of the Invention is not intended to limit the scope of the present disclosure. For a better understanding of the current disclosure, the Summary of the Invention should be read together with the Detailed Description.

BRIEF DESCRIPTION OF DRAWINGS

[0011] To better understand the invention, the accompanying drawings are incorporated into this specification. The drawings show the exemplary extent of the current disclosure and help to explain its principles when viewed in conjunction with the description. The drawings are for illustrative purposes only and do not in any way limit the extent of the disclosure. Elements that use the same reference numbers are comparable but not necessarily identical. Conversely, different reference numerals may be used to identify similar components. Some embodiments may lack certain parts or components, while others may make use of elements or components not shown in the drawings.

[0012] Figure 1 illustrates Task Level Scheduling in Cloud Computing comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to explain the steps for scheduling tasks to the virtual machines (VMs) in the cloud. This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.

[0013] Figure 2 illustrates the present invention and main embodiment of the current disclosure, that is, the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). The system provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The figure is given in accordance with an exemplary embodiment of the present disclosure to explain the method of detecting and recovering the failed nodes. It shall be understood that the invention is not limited to this drawing or to all of the components it shows, and that the illustration is provided to aid understanding of the disclosure and should not be understood to limit its scope or application. Some aspects or components may not be present in some embodiments, and others may be used in forms different from those shown in the drawings. Depending on the context, singular language used to describe a component or element may encompass a plurality of such components or elements, and vice versa.

[0014] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO) comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure, to explain the Greedy Search Firefly Optimization (GSFO). This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.

[0015] Figure 4 illustrates the Plot of Node Failure Prediction, in accordance with another exemplary embodiment of the present disclosure, to explain the detection of the node failure percentage. The invention is not limited to this drawing, and the illustration is provided to assist comprehension of the disclosure and should not be construed as restricting the depth, nature, or applicability of the disclosure.

DETAILED DESCRIPTION OF INVENTION

[0016] The invention will be better understood from the following detailed description, and objects other than those stated above will become apparent. This description refers to the accompanying drawings. In order to offer a complete understanding of embodiments of the current disclosure, certain specifics relating to various components and processes are provided. As those skilled in the art will understand, the information provided in the embodiments should not be construed as limiting the scope of this disclosure. The order of steps disclosed in the processes and techniques of this invention should not be interpreted as requiring the order defined or represented, and alternative or additional steps should also be considered. While the present invention is described herein using embodiments and illustrative drawings as examples, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described, and that the drawings are not intended to represent the scale of the various components.

[0017] Figure 1 illustrates Task Level Scheduling in Cloud Computing comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to explain the steps for scheduling tasks to the virtual machines (VMs) in the cloud. Users (101) submit requests (102) to the cloud server in order to gain access to data stored in the cloud, and a user (101) can send multiple requests (102). Users (101) can submit their tasks (102) from anywhere and at any time using various applications. Based on the Service Level Agreements (SLAs), the task manager (103) generates hosts. The task manager (103) provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (104) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the constraints of the task manager (103). The Task Scheduler (104) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned (105) to the VMs.
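The assignment of tasks to VMs in Figure 1 can be sketched in Python as follows. The disclosure does not state which scheduling policy is used, so the longest-task-first, least-loaded-VM rule below is an assumption chosen only to make the flow concrete; `assign_tasks` and its parameters are hypothetical names.

```python
import heapq
from typing import Dict, List


def assign_tasks(task_lengths: Dict[str, float], vm_ids: List[str]) -> Dict[str, List[str]]:
    """Assign each task to the VM with the smallest accumulated load.

    Assumed policy (not taken from the disclosure): tasks are handled
    longest first, and each one goes to the currently least-loaded VM.
    """
    # Min-heap of (accumulated load, VM id); the VM id breaks ties.
    heap = [(0.0, vm) for vm in vm_ids]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {vm: [] for vm in vm_ids}
    ordered = sorted(task_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, length in ordered:
        load, vm = heapq.heappop(heap)
        assignment[vm].append(task)
        heapq.heappush(heap, (load + length, vm))
    return assignment
```

A real Task Scheduler would also take the task manager's constraints and task priorities into account, which this sketch ignores.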

[0018] Figure 2 illustrates the present invention and main embodiment of the current disclosure, that is, the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and a user can send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned to the VMs. The VMs are considered as nodes, and the Node Data (205) contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known.

Not all features are in a form suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node present in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF). The combined model LSTM-RF (206) selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node, and to obtain better Resources Reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks the nodes based on the features of each node and calculates the silence probability to identify the failed nodes. It selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, the Greedy Search Firefly Optimization (208) automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node, and its spatial location is restored automatically to turn the failed node into a healthy node; VMs are then allocated to it to improve Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes failed and recovered. The following Table 1 gives the failure node detection and recovery rate of the present invention.

TABLE 1

Parameters obtained in the Present Invention with Greedy Search Firefly Optimization

Parameter | Value
Number of Tasks | 100
Dead Nodes | 100
Dead Nodes Remaining | 4
Node Recovery Rate | 96%
Energy Consumption | 0.45 Joules
Execution Time | 2.109 Seconds

The performance parameters of the present invention are validated experimentally on CloudSim with 100 dead nodes: only 4 of the 100 nodes remain dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 Seconds.
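The 96% figure follows directly from the two node counts reported above. As a small worked check in Python, where the function name is an illustrative assumption and the formula is simply recovered nodes divided by initially dead nodes:

```python
def recovery_rate(initial_dead: int, remaining_dead: int) -> float:
    """Percentage of initially dead nodes that were recovered."""
    if initial_dead <= 0:
        raise ValueError("initial_dead must be positive")
    if not 0 <= remaining_dead <= initial_dead:
        raise ValueError("remaining_dead must lie between 0 and initial_dead")
    return 100.0 * (initial_dead - remaining_dead) / initial_dead


# Values reported in Table 1: 100 dead nodes, 4 still dead after recovery.
print(recovery_rate(100, 4))
```

With the reported counts, 96 of the 100 nodes are recovered, which matches the stated recovery rate.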

[0019] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO) comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure. Initially, the GSFO is started (301) to find and recover the failed nodes. It searches for node (302) data in the form of features and feature vectors. At each node it determines the probability of visiting the next node (303). The failure probability of the present node is stored in the Routing Table (304), and the algorithm then moves to the next node to calculate its failure probability. Once all nodes have been visited (305), the distances among the nodes are recorded (306) and the optimal path with the shortest node (307) distances is determined. When the maximum number of iterations has been performed (308) at each node, the optimization algorithm stops (309). During the iterations, the spatial location of each node is recorded in the table; if the location of a node changes, the node is treated as a failed node and is recovered to its original spatial location based on the extracted spatial features.
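Two parts of this flow chart can be pictured with a loose Python sketch: the greedy choice of the shortest next node, and the location check that flags and restores a failed node. It is a simplification under stated assumptions and not the GSFO of the disclosure. Nodes are assumed to have 2-D coordinates, the function names and the `tolerance` parameter are hypothetical, and the firefly attractiveness update, the next-node probability and the iteration limit are all omitted.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def greedy_route(positions: Dict[str, Point], start: str) -> List[str]:
    """Visit every node once, always moving to the nearest unvisited node."""
    route = [start]
    unvisited = set(positions) - {start}
    while unvisited:
        here = positions[route[-1]]
        # Nearest unvisited node; the name breaks distance ties.
        nxt = min(unvisited, key=lambda n: (math.dist(here, positions[n]), n))
        route.append(nxt)
        unvisited.remove(nxt)
    return route


def detect_and_recover(recorded: Dict[str, Point], observed: Dict[str, Point],
                       tolerance: float = 1e-9) -> Tuple[List[str], Dict[str, Point]]:
    """Flag nodes whose observed location differs from the recorded one.

    Returns the names of the failed nodes and a location table in which
    each failed node has been put back at its recorded location.
    """
    failed: List[str] = []
    restored = dict(observed)
    for name, home in sorted(recorded.items()):
        current: Optional[Point] = observed.get(name)
        if current is None or math.dist(home, current) > tolerance:
            failed.append(name)
            restored[name] = home
    return failed, restored
```

In this sketch the recorded location table plays the role of the table described above, and restoring a location stands in for whatever recovery action the real system performs.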

[0020] Figure 4 illustrates the Plot of Node Failure Prediction, in accordance with another exemplary embodiment of the present disclosure, to explain the detection of the node failure percentage. The plot shows that the failure node prediction increases as the number of nodes increases. This means that the present invention can predict more failed nodes as the number of nodes increases. The prediction rate remains high even when the number of dead nodes in the experiment is increased.

[0021] In order to provide a more detailed understanding of embodiments of the invention, some specific details are set out in the above exemplary description. A person of ordinary skill, on the other hand, may recognize that the present invention can be implemented without some of the specific details presented here. The major embodiments of the present disclosure are for the detection and recovery of node failures. The description gives details of how the nodes are recovered by the Greedy Search Firefly Optimization technique. The method of the present embodiment for predicting and recovering failed nodes in the cloud environment is provided in the above layout, and it shall not limit the scope of the present disclosure.

Claims

FAILURE NODES DETECTION AND RECOVERY SYSTEM IN CLOUD COMPUTING TO IMPROVE RESOURCES RELIABILITY

CLAIMS

We claim:

1. A Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209), wherein the system provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing.

2. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Node Data contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known; and wherein, because not all features are in a form suitable for training the proposed machine learning algorithm, the features are converted into a suitable form in the training phase.

3. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the LSTM-RF is a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node present in the cloud computing environment; the LSTM-RF is trained on both the temporal and the spatial data of the nodes; the temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data; the LSTM-RF comprises fully connected and dense layers; the spatial features are selected by the Random Forest (RF); the combined model LSTM-RF selects the features separately, with a feature vector size of 128x1; and the feature vectors form the feature Vector Matrix.

4. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Greedy Search Firefly Optimization ranks the nodes based on the features of each node and calculates the silence probability to identify the failed nodes; automatically learns the behavioural history and the silence probability of each node; and routes the optimal path from the nodes to the server; and wherein, if a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node and its spatial location is restored automatically to turn the failed node into a healthy node, after which VMs are allocated to improve Resources Reliability.

5. The Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the performance parameters of the present invention are validated experimentally on CloudSim with 100 dead nodes, of which only 4 nodes remain dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes; and wherein Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 Seconds.