AU2021104045A4 - Failure nodes detection and recovery system in cloud computing to improve resources reliability - Google Patents
- Publication number
- AU2021104045A4
- Authority
- AU
- Australia
- Prior art keywords
- nodes
- node
- failure
- cloud computing
- resources
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5003—Managing SLA; Interaction between SLA and QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
FAILURE NODES DETECTION AND RECOVERY SYSTEM IN
CLOUD COMPUTING TO IMPROVE RESOURCES RELIABILITY
ABSTRACT
In cloud computing, intelligently connected machines such as servers, virtual
machines and load balancers provide different computing resources to users. The
cloud responds and provides resources upon user request. When many requests are
received from users, the cloud may be unable to respond because of heavy load on
the cloud nodes or because of node failure. The major challenge of Cloud
Computing is to schedule user tasks as fast as users require by detecting and
recovering failed nodes, while retaining a high degree of Quality of Service
(QoS) and Resources Reliability. Because cloud computing relies on a large
number of nodes to execute high-performance applications, it is critical to
distribute resources throughout the network's nodes and to create a system and
method for detecting and recovering failed nodes in the event of a node failure.
The present invention disclosed herein is a Failure Nodes Detection and Recovery
System in Cloud Computing to Improve Resources Reliability comprising User Tasks
(201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System
(204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search
Firefly Optimization (208), and Failure Node Detection and Recovery (209). It
provides an efficient method and system for detecting and recovering failed
nodes to improve Resources Reliable Routing, performance and Quality of Service
(QoS) in Cloud Computing. The present invention uses a combined network of Long
Short-Term Memory and Random Forest (LSTM-RF) to analyze the Node Data, and
Greedy Search Firefly Optimization (GSFO) to provide a globally optimal ranking
of nodes and to facilitate failed-node detection and recovery. The performance
parameters of the present invention were validated experimentally on CloudSim
with 100 dead nodes: only 4 of the 100 nodes remained dead, yielding a node
recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead
nodes. Resources Reliable Routing through active nodes improves task scheduling,
with 100 tasks executed in 2.109 seconds.
DRAWINGS
Figure 1: Task Level Scheduling in Cloud Computing. (Block labels: User, Tasks, Task Manager, Task Scheduler, Tasks Assign, VM.)
Figure 2: Block Diagram of Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability. (Block labels: User Tasks, Scheduling Policies, Task Scheduler, Task Assignment System, Node Data, LSTM-RF, Vector Matrix, Greedy Search Firefly Optimization, Failure Node Detection and Recovery.)
Description
[0001] The present invention relates to the technical field of Computer Science Engineering.
[0002] Particularly, the present invention is related to Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability of the broader field of Cloud Computing in Computer Science Engineering.
[0003] More particularly, the present invention relates to a Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, which provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. Resources Reliable Routing through active nodes is provided by recovering the failed nodes in the Cloud Computing environment.
[0004] In cloud computing, intelligently connected machines such as servers, virtual machines, and load balancers provide users with a variety of computational resources. When a user requests resources, the cloud responds and offers them. When multiple requests are received from users, the cloud may be unable to respond because of heavy load on the cloud nodes or because of node failure. The main difficulty of Cloud Computing is to schedule user tasks as quickly as users require by identifying and recovering failed nodes while maintaining high Quality of Service (QoS) and resource reliability. A Node is a worker machine, either a VM or a physical machine, which contains the services needed to run pods.
[0005] When a large number of nodes is involved, conventional methods are unable to ensure stable end-to-end communication. As a result, metaheuristic algorithms have become the backbone of approaches to this problem. The major application of metaheuristic algorithms is optimization. The optimization is difficult because of the complexity and nonlinearity of the problem of interest. Existing metaheuristic algorithms take a long time to run and have large computing costs. Search algorithms aid in reaching optimality for the problem under consideration. The algorithms that are currently available are mostly either deterministic or stochastic.
[0006] In a stochastic algorithm, randomness may be introduced at any point, and such algorithms are considered an efficient means of global search. The choice of optimization algorithm is influenced by the type and nature of the problem, the desired quality of the solutions, the available computing resources, the time restrictions, and the availability of methods. Existing inventions were unable to strike a balance between the necessary quality and the available computational resources. The aim of the invention is therefore to develop algorithms that strike a balance between quality and resources and that obtain globally best solutions. Special techniques have to be added to the optimization algorithms in order to apply an approximation in the optimization process and obtain an ideal design at a lower computational cost. Metaheuristic algorithms are the most extensively used algorithms for optimization, and they have a number of advantages over standard algorithms.
[0007] In practice, cloud service systems continue to have problems and fail to meet client demands. Computational node failures are a common source of these problems. A cloud service system is made up of multiple computational nodes to which virtual machines are allocated. There is a need for a method that can learn node behaviour and the nature of failures, and then use machine learning algorithms to recover a failed node. Each node is controlled by a master that coordinates all the nodes. A node carries the following information: Address: the host name and the IP address of the node. The system should be able to allocate Virtual Machines to active nodes based on their probability of failure, reducing the frequency of node failures and the duration of VM downtime. In addition, if a node is expected to fail, the system should be able to repair it and allot VMs to improve cloud computing resource reliability. The current invention may be capable of predicting failing nodes in a cloud computing environment and subsequently recovering the nodes.
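The allocation requirement stated above can be illustrated with a minimal Python sketch. It assumes each node record holds its address (host name and IP) together with a predicted probability of failure. The `Node` class, the `allocate_vms` helper, the 0.5 threshold and the least-loaded tie-breaking rule are hypothetical choices made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    hostname: str
    ip: str
    failure_probability: float  # hypothetical: supplied by some failure predictor
    vms: List[str] = field(default_factory=list)


def allocate_vms(nodes: List[Node], vm_ids: List[str],
                 threshold: float = 0.5) -> Dict[str, str]:
    """Place each VM on an active node whose failure probability is below threshold."""
    healthy = [n for n in nodes if n.failure_probability < threshold]
    if not healthy:
        raise RuntimeError("no healthy node available")
    placement: Dict[str, str] = {}
    for vm in vm_ids:
        # Prefer the node with the fewest VMs, then the lowest failure probability.
        target = min(healthy, key=lambda n: (len(n.vms), n.failure_probability))
        target.vms.append(vm)
        placement[vm] = target.hostname
    return placement


nodes = [
    Node("node-a", "10.0.0.1", 0.05),
    Node("node-b", "10.0.0.2", 0.90),  # likely to fail, so it receives no VMs
    Node("node-c", "10.0.0.3", 0.20),
]
placement = allocate_vms(nodes, ["vm-1", "vm-2", "vm-3"])
print(placement)
```

In this sketch the node with a high predicted failure probability is simply skipped; a fuller system would also trigger a repair step for that node.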
[0008] In cloud computing, intelligently connected machines such as servers, virtual machines and load balancers provide different computing resources to users. The cloud responds and provides resources upon user request. When many requests are received from users, the cloud may be unable to respond because of heavy load on the cloud nodes or because of node failure. The major challenge of Cloud Computing is to schedule user tasks as fast as users require by detecting and recovering failed nodes, while retaining a high degree of Quality of Service (QoS) and Resources Reliability. Because cloud computing relies on a large number of nodes to execute high-performance applications, it is critical to distribute resources throughout the network's nodes and to create a system and method for detecting and recovering failed nodes in the event of a node failure.
[0009] The present invention, and the main embodiment of the current disclosure, is the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability shown in the block diagram, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs).
Different tasks are assigned to the VMs. The VMs are considered as nodes. The Node Data (205) contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known. Not all features are suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node present in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF). The combined LSTM-RF (206) model selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node; to obtain better Resources Reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks each node based on the features of that node and calculates the silence probability to identify the failed nodes.
The Greedy Search Firefly Optimization (208) selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, the Greedy Search Firefly Optimization (208) automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node, and its spatial location is restored automatically to turn the failed node back into a healthy node; VMs are then allocated to it to improve Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes that failed and were recovered. The performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes: only 4 of the 100 nodes remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 seconds.
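The Vector Matrix and top-K ranking steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the feature values are random placeholders standing in for LSTM-RF output, the disclosure does not define how the silence probability is computed, and the rule "lower silence probability means healthier" is an assumption. The helper names `build_vector_matrix` and `top_k_healthy` are hypothetical, and the ranking shown is a plain sort rather than the Greedy Search Firefly Optimization itself.

```python
import random
from typing import Dict, List, Tuple

FEATURE_SIZE = 128  # each node contributes one 128x1 feature vector


def build_vector_matrix(features: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Stack the per-node feature vectors into a matrix with one row per node."""
    names = sorted(features)
    for name in names:
        if len(features[name]) != FEATURE_SIZE:
            raise ValueError(f"{name}: expected {FEATURE_SIZE} features")
    return names, [features[name] for name in names]


def top_k_healthy(silence_probability: Dict[str, float], k: int) -> List[str]:
    """Return the k nodes with the lowest silence probability (assumed healthiest)."""
    return sorted(silence_probability, key=silence_probability.get)[:k]


rng = random.Random(0)  # placeholder features; real ones would come from LSTM-RF
features = {f"node-{i}": [rng.random() for _ in range(FEATURE_SIZE)] for i in range(4)}
names, matrix = build_vector_matrix(features)

# Hypothetical silence probabilities for the four nodes.
silence = {"node-0": 0.10, "node-1": 0.85, "node-2": 0.30, "node-3": 0.60}
healthy = top_k_healthy(silence, k=2)
print(names, len(matrix), len(matrix[0]), healthy)
```

Nodes that fall outside the top-K list would be the candidates for the detection and recovery step.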
[0010] The present invention is described in various levels of detail in the Summary of the Invention, as well as the attached sketches and the Detailed Description of the Invention, and the inclusion or omission of components, sections, or other things in this Summary of the Invention is not intended to limit the scope of the present disclosure. For a better understanding of the current disclosure, read the summary of the invention with the detailed description.
[0011] The accompanying drawings are incorporated into this specification to aid understanding of the invention. The drawings show the exemplary extent of the current disclosure and, viewed in conjunction with the explanation, help to explain its principles. The drawings are only for illustrative purposes and do not in any way limit the extent of the disclosure. Elements that use the same reference numbers are comparable but not identical. Different reference numerals may, on the other hand, also be used to identify related components. Some embodiments may lack such parts and/or components, while others may make use of elements or components not shown in the drawings.
[0012] Figure 1 illustrates Task Level Scheduling in Cloud Computing, comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to aid understanding of the steps for scheduling tasks to the virtual machines (VMs) in the cloud. This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.
[0013] Figure 2 illustrates the present invention and main embodiment of the current disclosure, namely the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209), which provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing, in accordance with an exemplary embodiment of the present disclosure. It shall be understood that the invention is not limited to this drawing in all its components, and that the illustration is provided for understanding the disclosure and should not be understood to limit the scope or the application of the disclosure. Some aspects and/or components may not be present in certain embodiments, and others may be used in forms different from those shown in the drawings. Depending on the context, language describing a single component or element may cover a plurality of such components or elements, and vice versa.
[0014] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO), comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure, to aid understanding of the Greedy Search Firefly Optimization (GSFO). This illustration is offered to aid understanding of the disclosure and should not be regarded as limiting the breadth, scope, or applicability of the disclosure.
[0015] Figure 4 illustrates the Plot of Node Failure Prediction, in accordance with another exemplary embodiment of the present disclosure, to aid understanding of node failure percentage detection. The invention is not limited only to this drawing; this illustration is provided to assist comprehension of the disclosure and should not be construed as restricting the depth, nature, or applicability of the disclosure.
[0016] The invention will be better understood from the following detailed description, and objects other than those stated above will become apparent. The description refers to the accompanying drawings. In order to offer a complete understanding of embodiments of the current disclosure, certain specifics relating to various components and processes are provided. The information provided in the embodiments should not be construed as limiting the scope of this disclosure, as those skilled in the art will understand. The order of steps disclosed in the process and technique of this invention should not be interpreted as requiring the order defined or represented. Alternative or additional steps should also be considered. While the present invention is described herein using embodiments and illustrative drawings as examples, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described, and that the drawings are not intended to represent the scale of the various components.
[0017] Figure 1 illustrates Task Level Scheduling in Cloud Computing, comprising User (101), Tasks (102), Task Manager (103), Task Scheduling (104), and Task Assign (105), in accordance with another exemplary embodiment of the present disclosure, to aid understanding of the steps for scheduling tasks to the virtual machines (VMs) in the cloud. Users (101) submit requests (102) to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests (102). Users (101) can submit their tasks (102) from anywhere and at any time using various applications. Based on the Service Level Agreements (SLAs), the task manager (103) generates hosts. The task manager (103) provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (104) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the constraints of the task manager (103). The Task Scheduler (104) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned (105) to the VMs.
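The task-level scheduling flow of Figure 1 can be sketched in Python as below. The sketch assumes that the task manager expresses priority as an integer (lower means more urgent) and that the scheduler assigns each task to the VM with the fewest tasks so far. Both rules, and the `schedule` function name, are hypothetical illustrations; the disclosure does not specify the scheduling policy at this level of detail.

```python
import heapq
from typing import Dict, List, Tuple


def schedule(tasks: List[Tuple[int, str]], vm_ids: List[str]) -> Dict[str, List[str]]:
    """Assign tasks to VMs in priority order, always choosing the least-loaded VM."""
    assignment: Dict[str, List[str]] = {vm: [] for vm in vm_ids}
    queue = list(tasks)
    heapq.heapify(queue)  # (priority, task name); lowest priority value pops first
    load = [(0, vm) for vm in vm_ids]
    heapq.heapify(load)   # (number of assigned tasks, vm id)
    while queue:
        _, task = heapq.heappop(queue)
        count, vm = heapq.heappop(load)
        assignment[vm].append(task)
        heapq.heappush(load, (count + 1, vm))
    return assignment


# Hypothetical user tasks with priorities set by the task manager.
tasks = [(2, "backup"), (1, "query"), (3, "report"), (1, "login")]
assignment = schedule(tasks, ["vm-1", "vm-2"])
print(assignment)
```

Tasks with equal priority are ordered alphabetically here only because tuples compare element by element; a real scheduler would likely use arrival time as the tie-breaker.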
[0018] Figure 2 illustrates the present invention and main embodiment of the current disclosure, namely the Block Diagram of the Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209). It provides an efficient method and system for detecting and recovering failed nodes to improve Resources Reliable Routing, performance and Quality of Service (QoS) in Cloud Computing. The User Tasks (201) are submitted by users from anywhere and at any time using various applications. Users submit requests to the cloud server in order to gain access to data stored in the cloud, and may send multiple requests. Based on the Service Level Agreement (SLA) policies and the Scheduling Policies (202), the task manager generates hosts. The task manager provides details about task processing, running applications, and the priority of task scheduling. The Task Scheduler (203) is a programme that selects appropriate resources for task execution based on a set of constraints and parameters, generally the task manager's constraints. The Task Assignment System (204) allocates the tasks to the Virtual Machines (VMs). Different tasks are assigned to the VMs. The VMs are considered as nodes. The Node Data (205) contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be known.
Not all features are suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase. The main component of the present invention is the LSTM-RF (206), a combination of Long Short-Term Memory and Random Forest that analyzes the Node Data of each node present in the cloud computing environment. The LSTM-RF (206) is trained on both the temporal and the spatial data of the nodes. The temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data. The LSTM-RF (206) comprises fully connected and dense layers. The spatial features are selected by the Random Forest (RF).
The combined LSTM-RF (206) model selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix (207), which contains the features of each node produced by the LSTM-RF (206) from the node data. The feature vectors in the matrix are then ranked by the Greedy Search Firefly Optimization (208). Nodes in the cloud environment fail at different times and at different locations. VMs are generally allocated to each node; to obtain better Resources Reliability and allocation of resources, VMs should be allocated to healthy nodes and not to failed nodes. The Greedy Search Firefly Optimization (208) facilitates switching VMs between nodes if a node is dead. It ranks each node based on the features of that node and calculates the silence probability to identify the failed nodes. The Greedy Search Firefly Optimization (208) selects optimized features and ranks the top-K nodes as healthy nodes based on the optimized features and the silence probability. To rank the nodes, the Greedy Search Firefly Optimization (208) automatically learns the behavioral history and the silence probability of each node. The optimal path is routed from the nodes to the server. If a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failed node, and its spatial location is restored automatically to turn the failed node back into a healthy node; VMs are then allocated to it to improve Resources Reliability. The Failure Node Detection and Recovery (209) reports the number of nodes that failed and were recovered. Table 1 gives the Failure Node Detection and Recovery rate of the present invention.
TABLE 1
Parameters obtained in the Present Invention

Parameters            Present Invention with Greedy Search Firefly Optimization
Number of Tasks       100
Dead Nodes            100
Dead Nodes Remains    04
Node Recovery Rate    96%
Energy Consumption    0.45 Joules
Execution Time        2.109 Seconds
The performance parameters of the present invention were validated experimentally on CloudSim with 100 dead nodes: only 4 of the 100 nodes remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for dead nodes. Resources Reliable Routing through active nodes improves task scheduling, with 100 tasks executed in 2.109 seconds.
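The reported recovery rate follows directly from the node counts, as the short Python check below shows. The function name `recovery_rate` is hypothetical, and the per-task time is simply the reported total divided by the number of tasks, which assumes the 2.109 seconds covers all 100 tasks.

```python
def recovery_rate(dead_nodes: int, still_dead: int) -> float:
    """Percentage of the initially dead nodes that were recovered."""
    if dead_nodes <= 0:
        raise ValueError("dead_nodes must be positive")
    return 100.0 * (dead_nodes - still_dead) / dead_nodes


rate = recovery_rate(dead_nodes=100, still_dead=4)  # figures from Table 1
mean_task_time = 2.109 / 100                        # seconds per task, derived
print(rate, mean_task_time)
```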
[0019] Figure 3 illustrates the Flow Chart for Greedy Search Firefly Optimization (GSFO), comprising Start GSFO (301), Search for Nodes (302), Probability Next-Node (303), Routing Table (304), If all nodes Completed (305), Distance among the Nodes (306), Shortest Node (307), Maximum Iterations Performed (308), and Stop (309), in accordance with another exemplary embodiment of the present disclosure. Initially, the Greedy Search Firefly Optimization (GSFO) is started (301) to find and recover the failed nodes. It searches for node data (302) in the form of features and feature vectors. At each node it determines the probability of visiting the next node (303). The failure probability of the present node is stored in the Routing Table (304), and the algorithm then moves to the next node to calculate its failure probability. Once all nodes have been visited (305), the distances among the nodes are recorded (306) and the optimal path with the shortest node distances (307) is determined. When the maximum number of iterations has been performed (308) at each node, the optimization algorithm stops (309). During the iterations, the spatial location of each node is recorded in the table; if the location of a node changes, the node is treated as a failed node and is restored to its original spatial location based on the extracted spatial features.
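Two steps of the flow chart, the location-change check and the shortest-distance path, can be sketched in Python as follows. This is a deliberately simplified stand-in: it uses a plain greedy nearest-neighbour walk and omits the firefly attractiveness update, the next-node probability and the maximum-iteration loop, none of which are specified in enough detail in the disclosure to reproduce. The coordinates, node names and both function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def detect_and_recover(recorded: Dict[str, Point], observed: Dict[str, Point]) -> List[str]:
    """Flag nodes whose observed location differs from the recorded one, then restore it."""
    failed = sorted(n for n in recorded if observed.get(n) != recorded[n])
    for n in failed:
        observed[n] = recorded[n]  # recovery modelled as restoring the recorded location
    return failed


def greedy_route(start: str, locations: Dict[str, Point]) -> Tuple[List[str], float]:
    """Visit every node once, always moving to the nearest unvisited node."""
    route, total, current = [start], 0.0, start
    unvisited = set(locations) - {start}
    while unvisited:
        nxt = min(unvisited,
                  key=lambda n: (math.dist(locations[current], locations[n]), n))
        total += math.dist(locations[current], locations[nxt])
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return route, total


recorded = {"server": (0.0, 0.0), "n1": (1.0, 0.0), "n2": (1.0, 1.0), "n3": (5.0, 5.0)}
observed = dict(recorded)
observed["n2"] = (9.0, 9.0)  # n2 has drifted away from its recorded location

failed = detect_and_recover(recorded, observed)
route, length = greedy_route("server", observed)
print(failed, route, round(length, 3))
```

A greedy walk of this kind does not guarantee a globally shortest path; the disclosure attributes the global search to the firefly component, which this sketch leaves out.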
[0020] Figure 4 illustrates the plot of node failure prediction, in accordance with another exemplary embodiment of the present disclosure, showing the percentage of node failures detected. The plot shows that the number of failure nodes predicted increases as the number of nodes increases; that is, the present invention can predict more failure nodes as the number of nodes grows. The prediction rate remains high even when the number of dead nodes in the experiment is increased.
[0021] Some specific details are set out in the above exemplary description in order to provide a more detailed understanding of embodiments of the invention. A person of ordinary skill in the art, however, will recognize that the invention can be practised without some or all of the specific details presented here. The major embodiments of the present disclosure concern the detection and recovery of node failures. The description details how nodes are recovered by the Greedy Search Firefly Optimization technique. The method of the present embodiment for predicting and recovering failed nodes in the cloud environment is provided in the above layout and shall not limit the scope of the present disclosure.
Claims (5)
1. Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability, comprising User Tasks (201), Scheduling Policies (202), Task Scheduler (203), Task Assignment System (204), Node Data (205), LSTM-RF (206), Vector Matrix (207), Greedy Search Firefly Optimization (208), and Failure Node Detection and Recovery (209); the system provides an efficient method and system for detecting and recovering failure nodes to improve Resources Reliable Routing, performance, and Quality of Service (QoS) in cloud computing.
2. Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Node Data contains temporal and spatial features from which the status of a node over time, input-output throughput, resource usage, response delays, local and global relationships between the nodes, and load balance can be determined. Not all features are suitable for training the proposed machine learning algorithm, so the features are converted into a suitable form in the training phase.
3. Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein LSTM-RF is the combination of Long Short-Term Memory (LSTM) and Random Forest (RF) used to analyze the Node Data of each node present in the cloud computing environment; the LSTM-RF is trained on both the temporal and spatial data of the nodes; the temporal features are selected by the LSTM, which can be operated bidirectionally to capture the pattern behind the time-series data; the LSTM-RF comprises fully connected and dense layers; the spatial features are selected by the Random Forest; the combined LSTM-RF model selects the features separately, with a feature vector size of 128x1. The feature vectors form the feature Vector Matrix.
4. Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the Greedy Search Firefly Optimization ranks the nodes based on the features of each node and calculates the silence probability to identify the failure nodes; it automatically learns the behavioural history and the silence probability of each node; an optimal path is routed from the nodes to the server; if a node moves out of the coverage area of the server, meaning that the spatial location of the node has changed, the node is treated as a failure node and its spatial location is restored automatically to turn the failure node into a healthy node; and VMs are then allocated to improve the Resources Reliability.
5. Failure Nodes Detection and Recovery System in Cloud Computing to Improve Resources Reliability as claimed in claim 1, wherein the performance parameters of the present invention were validated experimentally on CloudSim using 100 dead nodes, of which only 4 remained dead, yielding a node recovery rate of 96% with a reduced energy consumption of 0.45 Joules for the dead nodes; the Resources Reliable Routing through active nodes improves task scheduling, executing 100 tasks in 2.109 seconds.
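The coverage check recited in claim 4 can be pictured with the following Python sketch, written for a simulation setting only. It assumes a circular coverage area around the server and 2-D node coordinates, neither of which is specified in the claims; the function name `detect_and_restore` and its parameters are hypothetical, and treating a node with no observed position as failed is likewise an assumption made here.

```python
import math


def detect_and_restore(recorded, observed, server, coverage_radius):
    """Flag nodes outside the server's coverage area and restore their location.

    recorded: dict node -> (x, y) original location kept in the routing table.
    observed: dict node -> (x, y) location currently seen in the simulation.
    Returns (failed_nodes, restored_locations).
    """
    failed = []
    restored = dict(observed)
    for node, home in recorded.items():
        position = observed.get(node)
        # Assumption: a node with no observation is handled like one out of coverage.
        if position is None or math.dist(position, server) > coverage_radius:
            failed.append(node)
            # Restore the recorded spatial location so the node is healthy again.
            restored[node] = home
    return failed, restored


recorded_locations = {"n1": (1, 1), "n2": (2, 0)}
observed_locations = {"n1": (1, 1), "n2": (9, 9)}
failed_nodes, restored_locations = detect_and_restore(
    recorded_locations, observed_locations, server=(0, 0), coverage_radius=5.0
)
print(failed_nodes, restored_locations)
```

In this example only `n2` lies outside the assumed radius, so it is the only node flagged and reset; VM allocation to the restored node is outside the scope of the sketch.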
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021104045A AU2021104045A4 (en) | 2021-07-11 | 2021-07-11 | Failure nodes detection and recovery system in cloud computing to improve resources reliability |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021104045A AU2021104045A4 (en) | 2021-07-11 | 2021-07-11 | Failure nodes detection and recovery system in cloud computing to improve resources reliability |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021104045A4 (en) | 2021-09-09 |
Family
ID=77563661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2021104045A Ceased AU2021104045A4 (en) | 2021-07-11 | 2021-07-11 | Failure nodes detection and recovery system in cloud computing to improve resources reliability |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021104045A4 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117014214A (en) * | 2023-08-21 | 2023-11-07 | 中山市智牛电子有限公司 | Intelligent control system and control method for LED display screen |
US11886283B2 (en) | 2022-03-30 | 2024-01-30 | International Business Machines Corporation | Automatic node crash detection and remediation in distributed computing systems |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11886283B2 (en) | 2022-03-30 | 2024-01-30 | International Business Machines Corporation | Automatic node crash detection and remediation in distributed computing systems |
CN117014214A (en) * | 2023-08-21 | 2023-11-07 | 中山市智牛电子有限公司 | Intelligent control system and control method for LED display screen |
CN117014214B (en) * | 2023-08-21 | 2024-04-02 | 中山市智牛电子有限公司 | Intelligent control system and control method for LED display screen |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Islam et al. | Mobile cloud-based big healthcare data processing in smart cities | |
US10613883B2 (en) | Managing virtual machine migration | |
AU2021104045A4 (en) | Failure nodes detection and recovery system in cloud computing to improve resources reliability | |
US20200364608A1 (en) | Communicating in a federated learning environment | |
JP6380110B2 (en) | Resource control system, control pattern generation device, control device, resource control method, and program | |
Beck et al. | Distributed and scalable embedding of virtual networks | |
Lakhan et al. | Mobility and fault aware adaptive task offloading in heterogeneous mobile cloud environments | |
CN104679594A (en) | Middleware distributed calculating method | |
Nguyen et al. | Studying and developing a resource allocation algorithm in Fog computing | |
Ray et al. | Prioritized fault recovery strategies for multi-access edge computing using probabilistic model checking | |
Han et al. | EdgeTuner: Fast scheduling algorithm tuning for dynamic edge-cloud workloads and resources | |
Fahimullah et al. | A review of resource management in fog computing: Machine learning perspective | |
Cao et al. | Distributed workflow mapping algorithm for maximized reliability under end-to-end delay constraint | |
Nguyen et al. | Elasticity control for latency-intolerant mobile edge applications | |
Talpur et al. | On attack-resilient service placement and availability in edge-enabled iov networks | |
CN111698580B (en) | Method, apparatus and computer readable medium for implementing network planning | |
Saleh et al. | A new grid scheduler with failure recovery and rescheduling mechanisms: discussion and analysis | |
Yang et al. | Satss: A self-adaptive task scheduling scheme for mobile edge computing | |
Sheetal et al. | Priority based resource allocation and scheduling using artificial bee colony (ABC) optimization for cloud computing systems | |
Chandrika et al. | Edge resource slicing approaches for latency optimization in AI-edge orchestration | |
Choi et al. | Group-based resource selection algorithm supporting fault-tolerance in mobile grid | |
Tam et al. | Adaptive Partial Task Offloading and Virtual Resource Placement in SDN/NFV-Based Network Softwarization. | |
Filho et al. | Experiments with a machine-centric approach to realise distributed emergent software systems | |
Xie et al. | A novel independent job rescheduling strategy for cloud resilience in the cloud environment | |
Rahumath et al. | Resource Scalability and Security Using Entropy Based Adaptive Krill Herd Optimization for Auto Scaling in Cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |