CN115102962A - Cluster management method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN115102962A (application number CN202210711547.7A)
- Authority: CN (China)
- Prior art keywords: node, attribute, abnormal, target, cluster system
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
- H04L41/0661—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities by reconfiguring faulty entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
- H04L67/1048—Departure or maintenance mechanisms
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present application relates to a cluster management method, apparatus, computer device, storage medium and computer program product. The method comprises: monitoring the running state of each node in a cluster system, and determining an abnormal node in the cluster system according to the running state; if the node attribute of the abnormal node is a target attribute, placing the abnormal node in an isolation area, where the target attribute indicates that the node is currently providing a service, and the isolation area is used for blocking correction processing of the abnormal node by the cluster system; and determining a first target node among the nodes other than the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing the service through the new target attribute node. With this method, the cluster system can provide the service without interruption.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a cluster management method and apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of Internet technology, in a current cluster system, a cluster management tool configures a virtual IP (Internet Protocol) in advance, a user accesses the virtual IP and initiates a service request, and then the cluster system determines a target node and provides the service through the target node. Therefore, in order to effectively provide the service, the abnormality detection management needs to be performed on the target node.
In current methods for handling an abnormal target node, in order to guarantee multi-node keep-alive (also called resource keep-alive) in the cluster system, the cluster system immediately detects and corrects the abnormal parameters of the target node until the target node recovers, and only then does the target node resume providing the service.
However, when the abnormal target node has a complicated function and needs to be connected with more external services, it takes a long time to detect and correct the abnormality of the abnormal target node, which further causes discontinuity of service provision.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a cluster management method, apparatus, computer device, computer readable storage medium and computer program product for solving the above technical problems.
In a first aspect, the present application provides a cluster management method. The method comprises the following steps:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
By adopting the method, each node is monitored in real time in the cluster system, if the node attribute of the abnormal node is detected to be the target attribute, the abnormal node is directly placed into the isolation area, and then a new target attribute node is selected to execute the service, so that the cluster system can uninterruptedly provide the service.
In one embodiment, the monitoring the operating state of each node in the cluster system, and determining an abnormal node in the cluster system according to the operating state includes:
monitoring heartbeat signals sent by each node in a cluster system;
and if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, determining that the second target node is an abnormal node.
In this embodiment, the heartbeat signal sent by each node is monitored in real time by the heartbeat detection tool, and the running state of each node in the cluster system is judged according to the periodic sending rule of the heartbeat signal of each node, so that an abnormal node in the cluster system is found in time.
In one embodiment, the determining a first target node in the nodes other than the abnormal node, and updating the node attribute of the first target node to the target attribute to obtain a new target attribute node includes:
determining a first target node in other nodes except the abnormal node according to a preset node election strategy;
adding an attribute tag of a target attribute for the first target node, and updating the node attribute of the first target node into the target attribute;
and pointing the cluster virtual access address to the first target node according to the attribute label of the first target node and the cluster virtual access address to obtain a new target attribute node.
In this embodiment, the node attribute of the elected first target node is updated to the target attribute and the virtual access address of the cluster system is re-pointed to that node, so that a new target attribute node is obtained to continue providing the service, thereby ensuring continuity of the service.
In one embodiment, the method further comprises:
and if the node attribute of the abnormal node is the non-target attribute, correcting the abnormal node through a preset exception handling strategy to obtain a corrected node with the non-target attribute.
In this embodiment, the node attribute of the abnormal node is checked; when the abnormal node has the non-target attribute, correcting it does not affect the service provided by the cluster system. The abnormal node is therefore corrected directly to obtain a normally operating node, which improves node keep-alive in the cluster system.
In one embodiment, if the node attribute of the abnormal node is a target attribute, after the abnormal node is placed in an isolation area, the method further includes:
and generating alarm information of the abnormity of the target attribute node, and outputting and displaying the alarm information.
In this embodiment, when the node attribute of the abnormal node is the target attribute, the abnormal node is placed in the isolation area, which blocks the cluster system from correcting the abnormal node and blocks the abnormal node from providing the service. The cluster management tool then generates alarm information for the abnormal node with the target attribute to notify the manager of the current operating condition of the cluster system, thereby improving the timeliness of cluster management.
In one embodiment, the method further comprises:
in response to a request for releasing the abnormal node from the isolation area, updating the node attribute of the abnormal node to the non-target attribute, and adding the abnormal node back to the cluster system;
and when the node attribute of the abnormal node in the cluster system is the non-target attribute, correcting the abnormal node according to a preset exception handling strategy to obtain a corrected node with the non-target attribute in the cluster system.
In this embodiment, the abnormal node in the isolation area is returned to the cluster system as an abnormal node with the non-target attribute, so processing it does not affect the service provided by the cluster system. The abnormal node is therefore corrected directly to obtain a normally operating node, which maintains the number of surviving nodes and improves node keep-alive in the cluster system.
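The release-from-isolation embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary layout, the attribute strings, and the helper names `release_from_isolation` and `reset_parameters` are all assumptions made for illustration, and `reset_parameters` merely stands in for whatever the preset handling strategy actually does.

```python
def release_from_isolation(cluster, isolation_area, name, correct):
    """Return an isolated node to the cluster as a non-target node, then correct it.

    cluster and isolation_area map node name -> {"attribute": ..., "healthy": ...}
    (a hypothetical layout); correct is a callable standing in for the preset
    handling strategy.
    """
    node = isolation_area.pop(name)
    node["attribute"] = "non-target"   # the released node must not reclaim the service
    cluster[name] = node               # rejoin the cluster first
    correct(node)                      # correction is no longer blocked
    return node


def reset_parameters(node):
    # Placeholder correction (assumption): simply mark the node healthy again.
    node["healthy"] = True


cluster = {"B": {"attribute": "target", "healthy": True}}
isolation_area = {"A": {"attribute": "target", "healthy": False}}
release_from_isolation(cluster, isolation_area, "A", reset_parameters)
```

In this sketch node B keeps the target attribute throughout, so the service is untouched while node A is repaired.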
In a second aspect, the present application further provides a cluster management apparatus. The device comprises:
the monitoring module is used for monitoring the running state of each node in the cluster system and determining abnormal nodes in the cluster system according to the running state;
the processing module is used for placing the abnormal node into an isolation area if the node attribute of the abnormal node is a target attribute, wherein the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and the updating module is used for determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node into the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, carries out the following steps:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing service through the new target attribute node.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
The cluster management method, the cluster management device, the computer equipment, the storage medium and the computer program product monitor the running state of each node in the cluster system, and determine abnormal nodes in the cluster system according to the running state; if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node; and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing service through the new target attribute node. By adopting the method, each node is monitored in real time in the cluster system, if the node attribute of the abnormal node is detected to be the target attribute, the abnormal node is directly placed into the isolation area, and then a new target attribute node is selected to execute the service, so that the cluster system can uninterruptedly provide the service.
Drawings
FIG. 1 is a flow diagram illustrating a cluster management method according to an embodiment;
FIG. 2 is a flowchart illustrating the abnormal node detection step according to an embodiment;
FIG. 3 is a flowchart illustrating the steps of determining a new target attribute node in one embodiment;
FIG. 4 is a schematic diagram illustrating node election in a cluster system according to an embodiment;
FIG. 5 is a flowchart illustrating the step of releasing an abnormal node from the isolation area in one embodiment;
FIG. 6 is a flow diagram illustrating an example of a method for cluster management in one embodiment;
FIG. 7 is a block diagram illustrating the structure of a cluster management apparatus in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
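In an embodiment, as shown in FIG. 1, a cluster management method is provided. This embodiment is described using the example of the method applied to a distributed cluster system, and the method includes the following steps:
Step 102, monitoring the running state of each node in the cluster system, and determining abnormal nodes in the cluster system according to the running state.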
In implementation, a cluster system includes a plurality of distributed nodes, each of which may execute a software program running on the cluster system to provide a software service (also referred to as a service) for a user. A heartbeat detection tool and a cluster management tool are also preset in the cluster system. The heartbeat detection tool monitors the running state of each node in the cluster system in order to determine, from those running states, whether an abnormal node exists. The cluster management tool processes the abnormal node determined by the heartbeat detection tool.
If the abnormal node exists, the heartbeat detection tool can send the detected information of the abnormal node to a cluster management tool in the cluster system, and the cluster management tool processes the abnormal node.
Step 104, if the node attribute of the abnormal node is the target attribute, placing the abnormal node into the isolation area.
The node attributes comprise a target attribute and a non-target attribute. The target attribute indicates that the node is currently providing a service, and the non-target attribute indicates that the node is not participating in the service. Specifically, the node attribute of each node in the cluster system is determined by whether the node is providing a service. The cluster system configures a uniform virtual IP (Internet Protocol) address (also called a virtual access address) for users. When a user accesses the virtual access address and initiates a request for a target service, the cluster system responds by selecting at least one node as the node that provides the target service, and then points the virtual access address to the selected node to execute the target service. At this point, the node attribute of the selected node is marked as the target attribute, and the node attributes of the nodes that were not selected are marked as the non-target attribute.
In implementation, when the heartbeat detection tool detects that an abnormal node exists in the cluster system, the heartbeat detection tool notifies a cluster management tool in the cluster system of information of the abnormal node, then the cluster management tool detects a node attribute of the abnormal node, and if the node attribute of the abnormal node is a target attribute, the cluster management tool immediately puts the abnormal node into an isolation area, so that correction processing of the abnormal node by the cluster system is blocked, and the abnormal node is also blocked from participating in target service.
Optionally, all the nodes in the cluster system may be non-target attribute nodes. That is, if the cluster system receives no service request for a period of time, no node participates in any service, so the node attribute of every node is the non-target attribute. Even when the cluster system carries no service, the heartbeat detection tool keeps monitoring the running state of each node and finds abnormal nodes in time.
Optionally, the isolation area for an abnormal node with the target attribute may be set on an electronic device outside the cluster system, or on an offline device in the cluster system, which is not limited in this embodiment. The isolation area isolates the communication of the nodes placed in it, blocking those nodes from providing services externally and blocking the cluster system from correcting them.
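The two blocking effects of the isolation area can be sketched in Python as follows. This is only an in-memory model under stated assumptions: the `Cluster` class, its two sets, and the method names are hypothetical, and a real deployment would enforce the isolation at the network or device level rather than with set membership.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    members: set = field(default_factory=set)         # nodes visible to the cluster
    isolation_area: set = field(default_factory=set)  # quarantined nodes

    def isolate(self, name):
        """Move a node from the cluster into the isolation area."""
        self.members.remove(name)
        self.isolation_area.add(name)

    def may_correct(self, name):
        """Correction is only attempted on nodes still inside the cluster."""
        return name in self.members

    def may_serve(self, name):
        """Isolated nodes are cut off from serving external requests."""
        return name in self.members


cluster = Cluster(members={"A", "B", "C"})
cluster.isolate("A")
```

After `isolate("A")`, both checks return `False` for node A while nodes B and C remain eligible for either operation.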
Step 106, determining a first target node among the nodes other than the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing the service through the new target attribute node.
In implementation, node election policies corresponding to the respective services are pre-stored in the cluster system, and different services may have different node election policies. For example, when a service requires high data processing performance and a high request response rate, the node election policy may screen the nodes in the cluster system on conditions such as node performance parameters and network reachability. The elected nodes serve as target attribute nodes for providing the service. Therefore, when the abnormal node with the target attribute is placed in the isolation area, the cluster management tool can re-determine the target attribute node according to the corresponding node election policy. Specifically, the cluster management tool elects a first target node from the nodes other than the abnormal node according to the node election policy, and updates the node attribute of the first target node to the target attribute, so that the first target node takes over the service as the new target attribute node.
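One possible election policy of the kind described above can be sketched in Python. The source does not fix a concrete policy, so the `Candidate` fields, the single `perf_score` metric, and the rule "highest score among reachable, non-abnormal nodes" are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    perf_score: float  # hypothetical performance metric, higher is better
    reachable: bool    # hypothetical network reachability flag


def elect(candidates, abnormal_names):
    """Pick the best reachable node that is not in the abnormal set.

    Returns None when no node qualifies.
    """
    eligible = [
        c for c in candidates
        if c.name not in abnormal_names and c.reachable
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.perf_score)


nodes = [
    Candidate("A", perf_score=9.0, reachable=True),   # abnormal, now isolated
    Candidate("B", perf_score=7.5, reachable=True),
    Candidate("C", perf_score=8.0, reachable=False),  # fails the reachability screen
]
winner = elect(nodes, abnormal_names={"A"})
```

Here node B is elected: node A is excluded as abnormal and node C is screened out as unreachable despite its higher score.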
In the cluster management method, a heartbeat detection tool monitors the running state of each node in a cluster system, and determines abnormal nodes in the cluster system according to the running state; if the node attribute of the abnormal node is the target attribute, placing the abnormal node into an isolation area; the cluster management tool determines a first target node in other nodes except the abnormal node, updates the node attribute of the first target node into the target attribute to obtain a new target attribute node, and executes the service through the new target attribute node. By adopting the method, each node is monitored in real time in the cluster system, if an abnormal node is detected and the node attribute of the abnormal node is the target attribute, the abnormal node is directly placed in the isolation area, the cluster system is blocked from correcting the abnormal node, and a new target attribute node is selected to execute the business service, so that the cluster system uninterruptedly provides the business service, and the continuity and the completion efficiency of the business service are improved.
In one embodiment, as shown in FIG. 2, the specific process of step 102 includes the following steps:
Step 202, monitoring heartbeat signals sent by each node in the cluster system.
In implementation, the heartbeat detection tool receives a heartbeat signal sent from each distributed node (referred to as a node for short) included in the cluster system, so as to monitor an operation state of each node in the cluster system. Specifically, heartbeat signal detection conditions such as a heartbeat signal sending cycle and a heartbeat signal strength of each node are stored in the heartbeat detection tool in advance, and the heartbeat detection tool distinguishes received heartbeat signals according to the heartbeat signal detection conditions to determine whether the running state of each node is abnormal or not.
Step 204, if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, determining that the second target node is an abnormal node.
In implementation, if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, the heartbeat detection tool determines that the second target node is an abnormal node, and then the information of the abnormal node can be sent to the cluster management tool in the cluster system. For example, if the heartbeat detection tool does not receive the heartbeat signal sent by the second target node within the preset sending period, the heartbeat detection tool determines that the second target node is an abnormal node. The heartbeat signal detection condition can be set based on the actual characteristics of each node device, and the embodiments of the present application are not limited.
In this embodiment, the heartbeat signal sent by each node is monitored in real time by the heartbeat detection tool, and the running state of each node in the cluster system is judged according to the preset detection condition of the heartbeat signal, so that the timeliness of discovering the abnormal node in the cluster system is improved.
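The missed-heartbeat example in this embodiment can be sketched in Python. This is a minimal illustration, assuming that the detection condition is a single timeout on the time since the last heartbeat; the function name, the timestamp dictionary, and the 3-second window are hypothetical and not taken from the source.

```python
def find_abnormal_nodes(last_heartbeat, now, timeout):
    """Return names of nodes whose latest heartbeat is older than `timeout` seconds.

    last_heartbeat maps node name -> timestamp (in seconds) of the last heartbeat
    received; a node that has never reported carries None.
    """
    abnormal = []
    for name, seen_at in sorted(last_heartbeat.items()):
        if seen_at is None or now - seen_at > timeout:
            abnormal.append(name)
    return abnormal


# Node B has been silent for 5 s, which exceeds the assumed 3 s window.
heartbeats = {"A": 99.0, "B": 95.0, "C": 98.5}
flagged = find_abnormal_nodes(heartbeats, now=100.0, timeout=3.0)
```

Other detection conditions mentioned in the text, such as signal strength, would add further checks inside the loop.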
In an embodiment, when the node attribute of the abnormal node is the target attribute, the cluster management tool places the abnormal node directly into the isolation area. This blocks the cluster system from correcting the current target attribute node and also interrupts the service that node was providing, so a node must be elected in the cluster system to take on the target attribute and continue providing the service. As shown in FIG. 3, the specific processing procedure of step 106 includes the following steps:
Step 302, determining a first target node among the nodes other than the abnormal node according to a preset node election policy.
The node election criteria included in the node election policy correspond to node requirements in the service to be provided, for example, the node election criteria may include network response speed, node data processing performance, storage capacity, and the like, which is not limited in the embodiment of the present application.
In implementation, the cluster management tool determines a first target node in other nodes except the abnormal node according to a preset node election strategy. The first target node is the node which best meets the node election standard in the current cluster system.
Step 304, adding an attribute tag of the target attribute to the first target node, and updating the node attribute of the first target node to the target attribute.
In implementation, the node attribute of each node in the cluster system is represented by an attribute tag. When a new node (i.e., the first target node) is elected in the cluster system to take over from the original abnormal target attribute node, the cluster management tool adds the attribute tag of the target attribute to the first target node, which updates the node attribute of the first target node to the target attribute.
Step 306, according to the attribute tag of the first target node and the cluster virtual access address, pointing the cluster virtual access address to the first target node to obtain a new target attribute node.
In implementation, in all nodes of the cluster system, the cluster management tool directs the virtual access address (virtual IP address) to the first target node carrying the attribute tag of the target attribute, and the first target node is used as a new target attribute node to continue executing the current unfinished service. After that, if the user initiates a service request for the same service many times, the cluster system does not need to perform node election every time, and can determine that the new target attribute node executes the service based on the pointing relationship between the attribute tag and the virtual access address, thereby improving the service efficiency of the cluster system.
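The relationship between the attribute tag and the virtual access address can be sketched in Python. This is an in-memory model only: the `VirtualAddressTable` class, the tag strings, and the address `10.0.0.100` are assumptions, and the sketch does not show how a real virtual IP would be moved between hosts.

```python
class VirtualAddressTable:
    """Tracks attribute tags and which node the virtual access address points to."""

    def __init__(self, vip):
        self.vip = vip          # the cluster's virtual access address
        self.tags = {}          # node name -> attribute tag
        self.vip_owner = None   # node currently behind the virtual address

    def promote(self, name):
        """Tag the elected node with the target attribute and point the VIP at it."""
        self.tags[name] = "target"
        self.vip_owner = name

    def route(self):
        """Later requests follow the existing pointer; no re-election happens here."""
        return self.vip_owner


table = VirtualAddressTable("10.0.0.100")  # hypothetical address
table.promote("B")
first = table.route()
second = table.route()
```

Both calls to `route()` resolve to the same node, which mirrors the point that repeated requests for the same service reuse the stored pointing relationship.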
Optionally, after a new node is added to the cluster system, the original abnormal node is released from the isolation area, or some nodes are deleted, the nodes in the cluster system may be re-elected when a user initiates a service request, to ensure that the elected target attribute node is the optimal node for providing the service among all nodes of the current cluster system.
In this embodiment, the node attribute of the elected first target node is updated to the target attribute and the virtual access address of the cluster system is re-pointed to that node, so that a new target attribute node is obtained to continue providing the service, thereby ensuring continuity of the service.
In one embodiment, the cluster management method further includes: if the node attribute of the abnormal node is the non-target attribute, correcting the abnormal node through a preset abnormal processing strategy to obtain a corrected node with the non-target attribute.
In implementation, if the cluster management tool determines that the node attribute of the abnormal node is the non-target attribute, the abnormal node is not currently executing any service, so processing it does not affect the service provided by the cluster system. The abnormal node with the non-target attribute therefore does not need to be placed in the isolation area. Instead, the cluster management tool corrects it directly through the preset abnormal processing strategy, restoring its performance through operations such as node parameter modification to obtain a normally operating node. The node attribute of the corrected node remains the non-target attribute. As shown in fig. 4, the cluster system includes nodes A, B, and C, where node A has the target attribute (i.e., it is providing a service) and nodes B and C have the non-target attribute. If node B or node C is an abnormal node, it is corrected directly within the cluster system, and the service provided by the cluster system is not affected.
In this embodiment, the node attribute of the abnormal node is detected, and the abnormal node is determined to have the non-target attribute, so correcting it does not affect the service provided by the cluster system. The abnormality is therefore corrected directly to obtain a normally operating node, which improves node survivability in the cluster system.
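The in-place correction of a non-target attribute node can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Node` class, the `healthy` flag, and the `reset_parameters` strategy are hypothetical names standing in for the node model and the preset abnormal processing strategy.

```python
from dataclasses import dataclass
from typing import Callable

TARGET = "target"          # node is currently providing the service
NON_TARGET = "non-target"  # node is not providing the service


@dataclass
class Node:
    name: str
    attribute: str = NON_TARGET
    healthy: bool = True


def correct_in_place(node: Node, strategy: Callable[[Node], None]) -> bool:
    """Correct an abnormal node inside the cluster, but only if it is non-target.

    Returns False for a target attribute node, which is handled through the
    isolation area instead.
    """
    if node.attribute != NON_TARGET:
        return False
    strategy(node)  # the node keeps its non-target attribute after correction
    return True


def reset_parameters(node: Node) -> None:
    # Hypothetical strategy standing in for "node parameter modification".
    node.healthy = True


# Cluster from fig. 4: A provides the service, B and C do not.
cluster = {n.name: n for n in (Node("A", TARGET), Node("B"), Node("C"))}
cluster["B"].healthy = False  # B becomes abnormal

corrected = correct_in_place(cluster["B"], reset_parameters)
```

Node A is never touched in this path, which mirrors the point that correcting B or C does not affect the service.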
In an embodiment, after the abnormal node with the target attribute is placed in the isolation area, the number of normally operating nodes in the cluster system decreases. To maintain the number of surviving nodes in the cluster system, the cluster administrator should be notified promptly that a node is abnormal and that the node count has dropped. Therefore, after step 104, the cluster management method further includes: generating alarm information indicating that the target attribute node is abnormal, and outputting and displaying the alarm information.
In implementation, during cluster management, if the node attribute of the abnormal node is detected to be the target attribute, the cluster management tool generates alarm information for the abnormal node with the target attribute, and outputs and displays the alarm information to inform the administrator of the cluster system. The alarm information may be text alarm information or voice alarm information, which is not limited in the embodiments of the present application.
In this embodiment, when the node attribute of the abnormal node is the target attribute, the abnormal node is placed in the isolation area, which blocks the cluster system from correcting the abnormal node and blocks the abnormal node from providing the service. The cluster management tool therefore generates alarm information for the abnormal node with the target attribute and notifies the administrator of the current operating condition of the cluster system, improving the timeliness of cluster management.
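As a rough sketch of the isolate-then-alarm sequence, assuming the cluster and the isolation area are modelled as plain dictionaries mapping a node name to its attribute (a simplification introduced here, as is the function name `isolate_and_alarm` and the message wording):

```python
from typing import Dict


def isolate_and_alarm(cluster: Dict[str, str], isolation: Dict[str, str], name: str) -> str:
    """Move an abnormal target attribute node to the isolation area and
    return the text of the alarm to be displayed to the administrator."""
    if cluster.get(name) != "target":
        raise ValueError(f"{name} is not a target attribute node in the cluster")
    isolation[name] = cluster.pop(name)  # the node leaves the cluster system
    return (
        f"ALARM: target attribute node {name} is abnormal and has been isolated; "
        f"{len(cluster)} node(s) remain in the cluster"
    )


cluster = {"A": "target", "B": "non-target", "C": "non-target"}
isolation: Dict[str, str] = {}

message = isolate_and_alarm(cluster, isolation, "A")
print(message)
```

Only text output is shown; a voice alarm would need an additional output channel that is outside the scope of this sketch.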
In one embodiment, as shown in fig. 5, the specific process of step 102 includes the following steps:
In implementation, when an abnormal node exists in the isolation area, a user can restore it to the cluster system in order to keep more available nodes in the cluster system. Specifically, the user sends an isolation release request to the electronic device where the isolation area is located. In response, the electronic device updates the node attribute of the abnormal node in the isolation area from the target attribute to the non-target attribute and adds the node to the cluster system again.
In implementation, the node restored from the isolation area has the non-target attribute in the cluster system. Because the node attribute of the abnormal node is now the non-target attribute, the cluster management tool can correct it according to the preset abnormal processing strategy, obtaining a corrected, normally operating node with the non-target attribute.
As shown in fig. 4, the abnormal node (node A) in the isolation area is restored to the cluster system and corrected, yielding a normally operating non-target attribute node. Optionally, this node may be elected as the target attribute node again the next time the service is provided.
In this embodiment, the abnormal node in the isolation area is restored to the cluster system as an abnormal node with the non-target attribute, so processing it no longer affects the service provided by the cluster system. The abnormality is therefore corrected directly to obtain a normally operating node, which maintains the number of surviving nodes in the cluster system and improves node survivability.
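The isolation release flow described above can be sketched as follows. The dictionary-based node records, the `healthy` field, and the `restart_service` strategy are assumptions made for illustration only; the application does not specify these details.

```python
from typing import Callable, Dict

TARGET = "target"
NON_TARGET = "non-target"


def release_from_isolation(
    cluster: Dict[str, dict],
    isolation: Dict[str, dict],
    name: str,
    strategy: Callable[[dict], None],
) -> dict:
    """Handle an isolation release request for the node called `name`."""
    node = isolation.pop(name)      # raises KeyError if the node is not isolated
    node["attribute"] = NON_TARGET  # target attribute -> non-target attribute
    cluster[name] = node            # add the node to the cluster system again
    strategy(node)                  # correction is safe now: the node serves no requests
    return node


def restart_service(node: dict) -> None:
    # Hypothetical correction strategy.
    node["healthy"] = True


# State after failover: B has taken over the service, A sits in the isolation area.
cluster = {
    "B": {"attribute": TARGET, "healthy": True},
    "C": {"attribute": NON_TARGET, "healthy": True},
}
isolation = {"A": {"attribute": TARGET, "healthy": False}}

restored = release_from_isolation(cluster, isolation, "A", restart_service)
```

The attribute is downgraded before the node rejoins the cluster, so at no point do two nodes in the cluster carry the target attribute.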
In one embodiment, as shown in fig. 6, an example of a cluster management method is provided, the example including the steps of:
Step 601, monitoring the running state of each node in the cluster system, and determining abnormal nodes in the cluster system according to the running state.
Step 607, in the case that the node attribute of the abnormal node in the cluster system is the non-target attribute, correcting the abnormal node according to the preset abnormal processing strategy to obtain the corrected node with the non-target attribute in the cluster system.
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed sequentially as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not limited to the exact order illustrated and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times. These sub-steps or stages are also not necessarily performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a cluster management apparatus for implementing the above cluster management method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the cluster management device provided below may refer to the limitations in the above cluster management method, and details are not described here.
In one embodiment, as shown in fig. 7, there is provided a cluster management apparatus 700, including: a monitoring module 710, a processing module 720, and an updating module 730, wherein:
the monitoring module 710 is configured to monitor an operating state of each node in the cluster system, and determine an abnormal node in the cluster system according to the operating state;
the processing module 720 is configured to, if the node attribute of the abnormal node is a target attribute, place the abnormal node in an isolation area, where the target attribute represents that the current node is providing a service, and the isolation area is used to block the cluster system from performing correction processing on the abnormal node;
the updating module 730 is configured to determine a first target node in the nodes except the abnormal node, update the node attribute of the first target node to a target attribute, obtain a new target attribute node, and execute the service through the new target attribute node.
With this apparatus, each node in the cluster system is monitored in real time. If the node attribute of an abnormal node is detected to be the target attribute, the abnormal node is placed directly in the isolation area, and a new target attribute node is then elected to execute the service, so that the cluster system provides the service without interruption.
In one embodiment, the monitoring module 710 is further configured to monitor a heartbeat signal sent by each node in the cluster system;
and if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, determining that the second target node is an abnormal node.
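One possible heartbeat detection condition is a timeout on the most recent heartbeat. The sketch below assumes that condition; the function name `find_abnormal_nodes`, the 3-second default, and the timestamp map are illustrative choices, and the preset condition in a real deployment may differ.

```python
from typing import Dict, List


def find_abnormal_nodes(
    last_heartbeat: Dict[str, float], now: float, timeout: float = 3.0
) -> List[str]:
    """Return the names of nodes whose latest heartbeat is older than `timeout` seconds."""
    return sorted(
        name for name, seen in last_heartbeat.items() if now - seen > timeout
    )


# Timestamps (in seconds) of the most recent heartbeat received from each node.
last_heartbeat = {"A": 100.0, "B": 104.5, "C": 105.0}

abnormal = find_abnormal_nodes(last_heartbeat, now=105.0)
```

Here node A has been silent for 5 seconds and is reported as abnormal, while B and C are within the timeout.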
In one embodiment, the updating module 730 is configured to determine a first target node among other nodes except the abnormal node according to a preset node election policy;
adding an attribute tag of a target attribute for the first target node, and updating the node attribute of the first target node into the target attribute;
and pointing the cluster virtual access address to the first target node according to the attribute tag of the first target node and the cluster virtual access address to obtain a new target attribute node.
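The election, tagging, and virtual address steps can be sketched together. The election policy used here (pick the least-loaded remaining node) is an assumption, since the application only refers to a preset node election strategy; the `Cluster` class, the `loads` metric, and `vip_target` are likewise hypothetical names.

```python
from typing import Dict, Optional


class Cluster:
    def __init__(self, loads: Dict[str, float]) -> None:
        self.loads = dict(loads)                # node name -> current load (assumed metric)
        self.tags: Dict[str, str] = {}          # node name -> attribute tag
        self.vip_target: Optional[str] = None   # node the virtual access address points to

    def elect(self, abnormal: str) -> str:
        """Hypothetical election policy: least-loaded node other than `abnormal`."""
        candidates = {n: load for n, load in self.loads.items() if n != abnormal}
        if not candidates:
            raise RuntimeError("no node available for election")
        return min(candidates, key=candidates.get)

    def fail_over(self, abnormal: str) -> str:
        new_target = self.elect(abnormal)
        self.tags.pop(abnormal, None)     # the isolated node no longer counts in the cluster
        self.tags[new_target] = "target"  # add the attribute tag of the target attribute
        # Point the virtual access address at whichever node carries the tag.
        self.vip_target = next(n for n, t in self.tags.items() if t == "target")
        return new_target


cluster = Cluster({"A": 0.9, "B": 0.2, "C": 0.5})
cluster.tags["A"] = "target"
cluster.vip_target = "A"

new_target = cluster.fail_over("A")
```

Resolving the virtual access address through the tag, rather than storing the elected name directly, keeps the address and the target attribute from drifting apart.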
In one embodiment, cluster management apparatus 700 further comprises:
a correction processing module, configured to correct the abnormal node through a preset abnormal processing strategy if the node attribute of the abnormal node is the non-target attribute, so as to obtain the corrected node with the non-target attribute.
In one embodiment, cluster management apparatus 700 further comprises:
an alarm module, configured to generate alarm information indicating that the target attribute node is abnormal, and to output and display the alarm information.
In one embodiment, cluster management apparatus 700 further comprises:
an isolation release module, configured to, in response to an isolation release request for an abnormal node in the isolation area, update the node attribute of the abnormal node to the non-target attribute and add the abnormal node to the cluster system;
and a correction processing module, configured to correct the abnormal node according to a preset abnormal processing strategy when the node attribute of the abnormal node in the cluster system is the non-target attribute, so as to obtain the corrected node with the non-target attribute in the cluster system.
All or part of the modules in the cluster management apparatus can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a cluster management method.
It will be appreciated by those skilled in the art that the configuration shown in fig. 8 is a block diagram of only the part of the configuration related to the present application and does not limit the computer device to which the present application may be applied. A particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing business service, and the isolation area is used for blocking the abnormal node from executing the business service and blocking the cluster system from correcting the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing service through the new target attribute node.
In one embodiment, the processor when executing the computer program further performs the steps of:
monitoring heartbeat signals sent by each node in a cluster system;
and if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, determining that the second target node is an abnormal node.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining a first target node in other nodes except the abnormal node according to a preset node election strategy;
adding an attribute tag of a target attribute to the first target node, and updating the node attribute of the first target node into the target attribute;
and pointing the cluster virtual access address to the first target node according to the attribute tag of the first target node and the cluster virtual access address to obtain a new target attribute node.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
if the node attribute of the abnormal node is the non-target attribute, correcting the abnormal node through a preset abnormal processing strategy to obtain a corrected node with the non-target attribute.
In one embodiment, the processor when executing the computer program further performs the steps of:
generating alarm information indicating that the target attribute node is abnormal, and outputting and displaying the alarm information.
In one embodiment, the processor when executing the computer program further performs the steps of:
in response to an isolation release request for the abnormal node in the isolation area, updating the node attribute of the abnormal node to the non-target attribute, and adding the abnormal node to the cluster system;
and under the condition that the node attribute of the abnormal node in the cluster system is the non-target attribute, correcting the abnormal node according to a preset abnormal processing strategy to obtain the node of the non-target attribute after correction in the cluster system.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.
Claims (10)
1. A method for cluster management, the method comprising:
monitoring the running state of each node in a cluster system, and determining abnormal nodes in the cluster system according to the running state;
if the node attribute of the abnormal node is a target attribute, the abnormal node is placed in an isolation area, the target attribute represents that the current node is providing business service, and the isolation area is used for blocking the abnormal node from executing the business service and blocking the cluster system from correcting the abnormal node;
and determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node to the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
2. The method of claim 1, wherein the monitoring the operating status of each node in the cluster system, and determining an abnormal node in the cluster system according to the operating status comprises:
monitoring heartbeat signals sent by each node in a cluster system;
and if the heartbeat signal sent by the second target node does not meet the preset heartbeat signal detection condition, determining that the second target node is an abnormal node.
3. The method according to claim 1, wherein the determining a first target node among the nodes other than the abnormal node, updating the node attribute of the first target node to the target attribute, and obtaining a new target attribute node comprises:
determining a first target node in other nodes except the abnormal node according to a preset node election strategy;
adding an attribute tag of a target attribute to the first target node, and updating the node attribute of the first target node into the target attribute;
and pointing the cluster virtual access address to the first target node according to the attribute tag of the first target node and the cluster virtual access address to obtain a new target attribute node.
4. The method of claim 1, further comprising:
if the node attribute of the abnormal node is the non-target attribute, correcting the abnormal node through a preset abnormal processing strategy to obtain a corrected node with the non-target attribute.
5. The method of claim 1, wherein after placing the abnormal node in the isolation area if the node attribute of the abnormal node is the target attribute, the method further comprises:
generating alarm information indicating that the target attribute node is abnormal, and outputting and displaying the alarm information.
6. The method of claim 1, further comprising:
in response to an isolation release request for the abnormal node in the isolation area, updating the node attribute of the abnormal node to the non-target attribute, and adding the abnormal node to the cluster system;
and under the condition that the node attribute of the abnormal node in the cluster system is the non-target attribute, correcting the abnormal node according to a preset abnormal processing strategy to obtain the node of the non-target attribute after correction in the cluster system.
7. An apparatus for cluster management, the apparatus comprising:
the monitoring module is used for monitoring the running state of each node in the cluster system and determining abnormal nodes in the cluster system according to the running state;
the processing module is used for placing the abnormal node into an isolation area if the node attribute of the abnormal node is a target attribute, wherein the target attribute represents that the current node is providing service, and the isolation area is used for blocking the correction processing of the cluster system on the abnormal node;
and the updating module is used for determining a first target node in other nodes except the abnormal node, updating the node attribute of the first target node into the target attribute to obtain a new target attribute node, and executing business service through the new target attribute node.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210711547.7A CN115102962B (en) | 2022-06-22 | 2022-06-22 | Cluster management method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115102962A (en) | 2022-09-23 |
CN115102962B (en) | 2024-08-23 |
Family
ID=83292945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210711547.7A Active CN115102962B (en) | 2022-06-22 | 2022-06-22 | Cluster management method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115102962B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010103695A (en) * | 2008-10-22 | 2010-05-06 | Ntt Data Corp | Cluster system, cluster server and cluster control method |
US20170004057A1 (en) * | 2015-06-30 | 2017-01-05 | International Business Machines Corporation | Cluster file system support for extended network service addresses |
CN106982259A (en) * | 2017-04-19 | 2017-07-25 | 聚好看科技股份有限公司 | The failure solution of server cluster |
CN108092850A (en) * | 2017-12-12 | 2018-05-29 | 郑州云海信息技术有限公司 | A kind of cluster server method for diagnosing faults and system based on heartbeat mechanism |
CN110677480A (en) * | 2019-09-29 | 2020-01-10 | 北京浪潮数据技术有限公司 | Node health management method and device and computer readable storage medium |
CN111212127A (en) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | Storage cluster, service data maintenance method, device and storage medium |
CN112035326A (en) * | 2020-09-03 | 2020-12-04 | 中国银行股份有限公司 | Abnormal node task processing method and device based on cluster node mutual detection |
US20210136146A1 (en) * | 2019-10-31 | 2021-05-06 | Elasticsearch B.V. | Node Clustering Configuration |
CN113626238A (en) * | 2021-07-23 | 2021-11-09 | 济南浪潮数据技术有限公司 | ctdb service health state monitoring method, system, device and storage medium |
CN114363162A (en) * | 2021-12-31 | 2022-04-15 | 支付宝(杭州)信息技术有限公司 | Block chain log generation method and device, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
谢丽霞;汪子荧;: "一种在线集群异常作业预测方法", 北京邮电大学学报, no. 05 * |
Also Published As
Publication number | Publication date |
---|---|
CN115102962B (en) | 2024-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |