CN114443438A - Node state detection method, node abnormity processing method and device
- Publication number
- CN114443438A (application CN202210111973.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- node
- detection index
- detection
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/073—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/80—Database-specific techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/805—Real-time
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application discloses a node state detection method, a node exception handling method, and corresponding devices. The exception handling method comprises: receiving a target detection index and the operation data corresponding to it, both sent by an agent terminal deployed on a target node in a target cluster, wherein the target node is a node whose target detection index is in an abnormal state; determining a target exception type corresponding to the target detection index according to the operation data; querying a target exception handling script corresponding to the target exception type; and, when the target exception handling script exists, executing it so that the target detection index of the target node returns to a normal state. With this method, when a node in the target cluster becomes abnormal, an exception handling script can be automatically obtained and executed according to the node's exception type, so that the node self-heals promptly and the target cluster remains in a highly available state.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for detecting a node state, a method for processing a node exception, and an apparatus for processing a node exception.
Background
At present, the official Kubernetes (k8s) project provides a node fault detection function, but this function still has many problems in actual production and does not give nodes a self-healing capability. Node self-healing has been added on top of the official node fault detection function, but it only restarts components based on surface symptoms. In actual production a simple restart often cannot solve the problem, because a component that has failed to start usually fails again when it is restarted.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, the application provides a node state detection method, a node abnormality processing method and a node abnormality processing device.
According to an aspect of the embodiments of the present application, there is further provided a method for processing a node exception, where the method is applied to a controller deployed on a master node in a target cluster, and the method includes:
receiving a target detection index sent by an agent terminal deployed on a target node in the target cluster, together with operation data corresponding to the target detection index, wherein the target node is a node whose target detection index is in an abnormal state;
determining a target exception type corresponding to the target detection index according to the operation data;
querying a target exception handling script corresponding to the target exception type;
and, when the target exception handling script exists, executing the target exception handling script so that the target detection index of the target node returns to a normal state.
Further, the querying an exception handling script corresponding to the target exception type includes:
reading a mapping relation between an exception type and an exception handling script from a cache of the controller;
and acquiring a target exception handling script corresponding to the target exception type based on the mapping relation.
Further, in the absence of the exception handling script, the method further comprises:
sending the target detection index and the operation data corresponding to the target detection index to a target client;
receiving a target exception handling script fed back by the target client based on the target detection index and the operation data;
and establishing a mapping relation between the target exception handling script and the target exception type, and storing the mapping relation in the cache of the controller.
According to another aspect of the embodiments of the present application, a method for detecting a node state is provided, where the method is applied to an agent terminal, and an agent terminal is deployed on each node in a target cluster, and the method includes:
periodically detecting the node according to the detection strategy corresponding to each detection index to obtain operation data corresponding to each detection index in the node;
determining state information corresponding to the detection index based on the operation data;
determining a detection index whose state information is an abnormal state as a target detection index;
and sending the operation data corresponding to the target detection index to a controller, so that the controller executes an exception handling operation on the target node according to the operation data.
Further, when the detection index is a network index, the periodically detecting the node according to the detection strategy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes:
determining a network detection strategy corresponding to the network index;
detecting network parameters respectively corresponding to a management network, a service network and a storage network where the node is located by utilizing the network detection strategy;
and determining the network parameters respectively corresponding to the management network, the service network and the storage network as the operation data corresponding to the network index.
Further, when the detection index is a component index, the periodically detecting the node according to the detection strategy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes:
determining a component detection strategy corresponding to the component index;
and querying a log file of the node by using the component detection strategy, and counting the operation data of the components in the node from the log file.
According to another aspect of the embodiments of the present application, there is also provided a node exception handling apparatus, including:
a receiving module, configured to receive a target detection index sent by an agent terminal deployed on a target node in a target cluster, together with operation data corresponding to the target detection index, wherein the target node is a node whose target detection index is in an abnormal state;
a determining module, configured to determine a target exception type corresponding to the target detection index according to the operation data;
a query module, configured to query a target exception handling script corresponding to the target exception type;
and an execution module, configured to execute the target exception handling script when the target exception handling script exists, so that the target detection index of the target node returns to a normal state.
According to another aspect of the embodiments of the present application, there is also provided a device for detecting a node state, including:
the detection module is used for periodically detecting the nodes according to the detection strategies corresponding to the detection indexes to obtain the operation data corresponding to each detection index in the nodes;
the determining module is used for determining the state information corresponding to the detection index based on the operation data;
the processing module is used for determining the detection index of which the state information is in the abnormal state as a target detection index;
and the sending module is used for sending the running data corresponding to the target detection index to the controller so that the controller executes exception handling operation on the target node according to the running data.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program that executes the above steps when the program is executed.
According to another aspect of the embodiments of the present application, there is also provided an electronic apparatus, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; wherein: a memory for storing a computer program; a processor for executing the steps of the method by running the program stored in the memory.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps of the above method.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages: according to the method, when the node in the target cluster is abnormal, the abnormal processing script can be automatically acquired and executed according to the abnormal type of the node, so that the node can be self-healed in time after the node is abnormal, and the target cluster is ensured to be continuously in a high-availability state.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a method for detecting a node state according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a target cluster according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for processing a node exception according to an embodiment of the present application;
Fig. 4 is a block diagram of a node state detection apparatus according to an embodiment of the present application;
Fig. 5 is a block diagram of a device for processing a node exception according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. The exemplary embodiments and their descriptions are used to explain the present application and do not constitute an undue limitation on it. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another similar entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments of the present application provide a node state detection method, a node exception handling method, and corresponding devices. The methods can be applied to any suitable electronic device, for example a server or a terminal. The device type is not specifically limited here, and for convenience it is simply referred to below as the electronic device.
According to an aspect of the embodiments of the present application, an embodiment of a method for detecting a node state is provided. Fig. 1 is a flowchart of a method for detecting a node state according to an embodiment of the present application. As shown in Fig. 1, the method includes:
Step S11, periodically detecting the node according to the detection strategy corresponding to each detection index to obtain the operation data corresponding to each detection index in the node.
The method provided by the embodiment of the application is applied to an agent terminal, and an agent terminal is deployed on each node in the target cluster. As shown in Fig. 2, each node has an agent terminal (agent) deployed on it, and the agent terminal monitors the operation data of the node where it is located. For example, the operation data may be network parameters, component parameters, memory parameters, and the like.
In this embodiment of the present application, when the detection index is a network index, periodically detecting the node according to the detection strategy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes the following steps A1-A3:
Step A1, determining the network detection strategy corresponding to the network index.
In the embodiment of the present application, the network detection strategy corresponding to the network index is to detect the management network, the service network, and the storage network corresponding to the node by using the Gossip protocol. It should be noted that, by using the Gossip protocol, nodes can verify the reachability of each other's networks.
Step A2, using the network detection strategy to detect the network parameters respectively corresponding to the management network, the service network and the storage network where the node is located.
Step A3, determining the network parameters respectively corresponding to the management network, the service network and the storage network as the operation data corresponding to the network index.
In the embodiment of the application, the agent terminal first queries the first node corresponding to the management network, then sends a message to the first node based on the Gossip protocol, and determines the first network parameter of the management network where the agent terminal's node is located. The agent terminal then queries a second node corresponding to the service network, sends a message to the second node based on the Gossip protocol, and determines a second network parameter of the service network where the agent terminal's node is located. Likewise, the agent terminal queries a third node corresponding to the storage network, sends a message to the third node based on the Gossip protocol, and determines a third network parameter of the storage network where the agent terminal's node is located. The network parameters corresponding to the management network, the service network, and the storage network are then used as the operation data corresponding to the network index. The network parameters may include transmission rate, transmission delay, packet loss rate, and the like.
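The collection of per-network parameters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `NetworkParams` fields mirror the parameters named in the text, while the `probe` callable, the network names used as dictionary keys, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

NETWORKS = ("management", "service", "storage")  # the three networks named in the text


@dataclass(frozen=True)
class NetworkParams:
    transmission_rate_mbps: float
    transmission_delay_ms: float
    packet_loss_rate: float  # fraction in [0, 1]


def collect_network_operation_data(
    probe: Callable[[str], NetworkParams],
) -> Dict[str, NetworkParams]:
    """Probe each network once and return the results keyed by network name.

    `probe` is a hypothetical callable that messages a peer on the given
    network (for example over a gossip protocol) and measures the link.
    """
    return {network: probe(network) for network in NETWORKS}


def fake_probe(network: str) -> NetworkParams:
    # Stand-in for a real measurement; the values are made up for the example.
    loss = 0.5 if network == "storage" else 0.0
    return NetworkParams(1000.0, 0.4, loss)


operation_data = collect_network_operation_data(fake_probe)
```

Passing the probe in as a parameter keeps the collection logic independent of how the peer node is actually contacted.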
As an example, the agent terminals communicate using the Gossip protocol. At intervals, the agent terminal in each node randomly selects several nodes and sends them Gossip messages, and the receiving nodes in turn randomly select other nodes and forward the Gossip messages. After a period of time, every node in the cluster has received the Gossip message. Each node may know all other nodes or only a few neighbor nodes; as long as the nodes can communicate through the network, the Consul cluster state learned by all nodes eventually becomes consistent.
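How a message spreads through the cluster under this scheme can be illustrated with a small simulation. It is a simplified push-style sketch and not the protocol implementation used by the application: every informed node forwards the message in every round, and the function name, `fanout`, `seed`, and `max_rounds` are assumptions made for the example.

```python
import random


def gossip_rounds(node_count: int, fanout: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Return how many rounds pass before every node has received the message."""
    rng = random.Random(seed)
    informed = {0}  # node 0 originates the message
    rounds = 0
    while len(informed) < node_count:
        if rounds >= max_rounds:
            raise RuntimeError("message did not reach every node")
        rounds += 1
        newly_informed = set()
        for sender in sorted(informed):
            # Each informed node picks up to `fanout` random peers other than itself.
            peers = [n for n in range(node_count) if n != sender]
            newly_informed.update(rng.sample(peers, min(fanout, len(peers))))
        informed |= newly_informed
    return rounds
```

Because the number of informed nodes can at most multiply by `fanout + 1` per round, the message needs at least a logarithmic number of rounds to cover the cluster, which is why gossip dissemination scales well with cluster size.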
In this embodiment of the present application, when the detection index is a component index, periodically detecting the node according to the detection strategy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes the following steps B1-B2:
Step B1, determining a component detection strategy corresponding to the component index.
Step B2, querying the log file of the node by using the component detection strategy, and counting the operation data of the components in the node from the log file.
In the embodiment of the application, the agent terminal can monitor the operation data of key components in real time by inspecting the system log. In addition, it can receive heartbeat data from the kubelet component at preset intervals; when no heartbeat data from the kubelet component is received within the preset time, it performs health detection on the kubelet component and obtains the operation data of the kubelet component. The operation data of the key components includes the number of restarts of the kubelet component, health information of the kubelet component, and so on.
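The two checks described above, counting restarts from log lines and detecting a missed heartbeat, can be sketched as follows. The log line format and the `Started kubelet` marker are assumptions made for illustration; real system logs differ between distributions, and the source does not specify a format.

```python
import re
from typing import Iterable

# Hypothetical marker for a kubelet start; real log formats differ by system.
KUBELET_START = re.compile(r"Started kubelet")


def count_kubelet_restarts(log_lines: Iterable[str]) -> int:
    """Count starts after the first one, i.e. the number of restarts."""
    starts = sum(1 for line in log_lines if KUBELET_START.search(line))
    return max(starts - 1, 0)


def heartbeat_overdue(last_heartbeat: float, now: float, timeout_seconds: float) -> bool:
    """True when no heartbeat arrived within the preset time."""
    return now - last_heartbeat > timeout_seconds


sample_log = [
    "10:00:01 systemd[1]: Started kubelet.",
    "10:05:12 kubelet[812]: node sync ok",
    "10:07:30 systemd[1]: Started kubelet.",
    "10:09:44 systemd[1]: Started kubelet.",
]
restarts = count_kubelet_restarts(sample_log)
```

In this sketch a `True` result from `heartbeat_overdue` would be the trigger for the follow-up health detection on the component.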
In the embodiment of the application, when the detection index is a disk index, the usage of the system disk is monitored by executing the `df -h` command and the `iostat` command at intervals, and the disk capacity of the system disk is determined based on that usage. In addition, when the detection index is a memory index, the memory usage can be monitored to determine the memory occupancy rate.
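A disk check of this kind can be sketched in a few lines. Note that this example swaps in Python's standard `shutil.disk_usage` call instead of parsing `df -h` output, and the 90% limit is an assumed example value rather than one given in the source.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the used share of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def disk_is_abnormal(percent_used: float, limit_percent: float = 90.0) -> bool:
    # The 90% limit is an assumed example value, not taken from the source.
    return percent_used > limit_percent
```

An agent would call `disk_usage_percent` on a timer and pass the result to `disk_is_abnormal`.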
Step S12, determining the state information corresponding to the detection index based on the operation data.
In this embodiment of the application, the operation data may be compared with preset operation data, where the preset operation data is a numerical range of each detection index of the node in a normal state, or an upper limit value or a lower limit value preset by a worker, and the like. And if the operation data is matched with the preset operation data, determining that the state information is in a normal state. And if the operation data are not matched with the preset operation data, determining that the state information is in an abnormal state.
As an example, if the disk capacity of the system disk is greater than a preset capacity, the state information of the system disk is determined to be an abnormal state. If the transmission rate corresponding to the management network is less than the preset transmission rate, the management network where the node is located is determined to be disconnected, and the state information corresponding to the management network is determined to be an abnormal state. If the number of restarts of a component is greater than the preset number, or its health information does not match the preset health information, the state information of the component is determined to be an abnormal state.
Step S13, determining the detection index whose state information is an abnormal state as the target detection index.
In the embodiment of the application, after determining the state information corresponding to each detection index according to the operation data, the agent terminal counts the state information corresponding to each detection index, and determines the detection index of which the state information is an abnormal state as the target detection index.
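Steps S12 and S13 amount to comparing each measured value with its preset range and keeping the indexes that fall outside it. The following is a minimal sketch under stated assumptions: the index names, the limits, and the `NormalRange` type are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NormalRange:
    lower: Optional[float] = None  # None means no lower limit
    upper: Optional[float] = None  # None means no upper limit

    def contains(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def find_target_indexes(
    operation_data: Dict[str, float], ranges: Dict[str, NormalRange]
) -> List[str]:
    """Return the detection indexes whose value falls outside its normal range."""
    return [
        name
        for name, value in operation_data.items()
        if name in ranges and not ranges[name].contains(value)
    ]


# Hypothetical index names and limits, chosen only for the example.
ranges = {
    "disk_used_percent": NormalRange(upper=90.0),
    "mgmt_rate_mbps": NormalRange(lower=100.0),
    "kubelet_restarts": NormalRange(upper=3),
}
data = {"disk_used_percent": 95.0, "mgmt_rate_mbps": 850.0, "kubelet_restarts": 5}
targets = find_target_indexes(data, ranges)
```

Here the disk and restart indexes exceed their upper limits and become target detection indexes, while the management network rate stays above its lower limit and is treated as normal.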
Step S14, sending the operation data corresponding to the target detection index to the controller, so that the controller executes an exception handling operation on the target node according to the operation data.
In the embodiment of the application, after determining the target detection index, the agent terminal sends the target detection index and the operation data of the node where it is located to the controller. The controller determines the cause of the abnormality in the target detection index according to the operation data and performs exception handling on the node according to that cause, thereby restoring the target detection index of the node to a normal state.
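The source does not specify the format of the message sent to the controller. As one possible sketch, the agent could serialize the target detection indexes and their operation data as JSON; the field names and the node name below are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def build_report(node_name: str, targets: List[str], operation_data: Dict[str, Any]) -> str:
    """Serialize the abnormal indexes and their operation data as JSON."""
    payload = {
        "node": node_name,
        "target_detection_indexes": [
            {"index": name, "operation_data": operation_data[name]} for name in targets
        ],
    }
    return json.dumps(payload, sort_keys=True)


report = build_report(
    "node-1",
    ["kubelet_restarts"],
    {"kubelet_restarts": 5, "disk_used_percent": 40.0},
)
```

Only the abnormal indexes are included, so a healthy index such as the disk usage in this example is left out of the report.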
According to another aspect of the embodiments of the present application, a method for processing a node exception is further provided. Fig. 3 is a flowchart of the method for processing a node exception provided in the embodiment of the present application. As shown in Fig. 3, the method may include the following steps:
Step S21, receiving a target detection index sent by an agent terminal deployed on a target node in the target cluster, together with operation data corresponding to the target detection index, where the target node is a node whose target detection index is in an abnormal state.
The method provided by the embodiment of the application is applied to a controller deployed on a master node in a target cluster, and the controller receives a target detection index sent by each agent terminal deployed in the target cluster and operation data corresponding to the target detection index, wherein each agent terminal corresponds to one target node.
It should be noted that the target cluster contains a plurality of nodes, and each node is provided with an agent terminal that monitors the node where it is located. If a detection index of the node is abnormal, the agent terminal determines the abnormal detection index as the target detection index. Therefore, after receiving the target detection index sent by an agent terminal, the controller determines the node where that agent terminal is located as the target node.
Step S22, determining the target exception type corresponding to the target detection index according to the operation data.
In the embodiment of the present application, the controller stores an exception classification condition corresponding to each detection index. For example, when the operation data is the number of component restarts, the target exception type is frequent component restart. When the operation data is the health information of a component, the target exception type is a Kubelet exception or a Docker exception. When the operation data is the network parameters of the management network, the service network, and/or the storage network, the target exception type of the affected network or networks may be determined to be a network failure according to the network parameters.
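The classification conditions listed above can be sketched as a small rule function. The index names, the `component` field, and the returned type strings are hypothetical labels chosen for the example; the source only names the categories.

```python
from typing import Any, Dict

NETWORK_INDEXES = {"management_network", "service_network", "storage_network"}


def classify_exception(index: str, operation_data: Dict[str, Any]) -> str:
    """Map a target detection index and its operation data to an exception type."""
    if index == "component_restarts":
        return "component_frequent_restart"
    if index == "component_health":
        # The health record names the failing component.
        component = operation_data.get("component")
        if component == "kubelet":
            return "kubelet_exception"
        if component == "docker":
            return "docker_exception"
    if index in NETWORK_INDEXES:
        return "network_failure"
    return "unknown"
```

Returning an explicit `"unknown"` type keeps unclassified cases visible instead of silently mapping them to an unrelated script.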
Step S23, querying a target exception handling script corresponding to the target exception type.
In this embodiment of the present application, in step S23, querying the exception handling script corresponding to the target exception type includes the following steps C1-C2:
Step C1, reading the mapping relation between exception types and exception handling scripts from the cache of the controller.
In the embodiment of the present application, a mapping relationship between an exception type and an exception handling script is stored in a cache of a controller, where one exception type may correspond to at least one exception handling script. Specifically, the controller may send the exception type to the target client, the target client may feed back the exception handling script to the controller according to the exception type, and then the controller establishes a mapping relationship between the exception handling script and the exception type.
It should be noted that the exception handling script may be a script produced by the user after capturing and resolving the exception through the target client: after resolving the exception, the user writes the solution process into a script through the target client and imports the script into the port file according to the format, generating the final exception handling script.
Step C2, acquiring a target exception handling script corresponding to the target exception type based on the mapping relation.
In this embodiment of the application, in the query process, the controller may obtain a target exception handling script corresponding to the target exception type based on the established mapping relationship. Therefore, by establishing the mapping relation between the exception type and the exception handling script in advance, when a certain node in the target cluster is abnormal, the corresponding exception handling script can be automatically acquired according to the abnormal type of the node, the exception is not required to be manually solved, and the target cluster is ensured to be in a high availability state continuously.
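The cached mapping relation can be sketched as an in-memory dictionary from exception type to a list of scripts, reflecting that one exception type may correspond to at least one script. The class name, the choice to return the first registered script, and the script file name are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ScriptCache:
    """In-memory mapping from exception type to exception handling scripts."""

    def __init__(self) -> None:
        self._scripts: Dict[str, List[str]] = {}

    def add(self, exception_type: str, script: str) -> None:
        self._scripts.setdefault(exception_type, []).append(script)

    def lookup(self, exception_type: str) -> Optional[str]:
        """Return the first script registered for the type, or None."""
        scripts = self._scripts.get(exception_type)
        return scripts[0] if scripts else None


cache = ScriptCache()
cache.add("network_failure", "restart_network.sh")  # hypothetical script name
found = cache.lookup("network_failure")
missing = cache.lookup("kubelet_exception")
```

A `None` result corresponds to the case where no exception handling script exists for the type.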
Step S24, when the target exception handling script exists, executing the target exception handling script so that the target detection index of the target node returns to a normal state.
In the embodiment of the application, under the condition that the target exception handling script exists, the controller executes the target exception handling script, so that the target detection index of the target node is recovered to be normal, and the node self-healing is realized.
In the embodiment of the present application, in step S24, when no exception handling script exists, the method further includes the following steps D1-D3:
Step D1, sending the target detection index and the operation data corresponding to the target detection index to the target client.
Step D2, receiving a target exception handling script fed back by the target client based on the target detection index and the operation data.
Step D3, establishing a mapping relation between the target exception handling script and the target exception type, and storing the mapping relation in the cache of the controller.
In the embodiment of the application, under the condition that the target exception handling script does not exist, the controller sends the target detection index and the operation data corresponding to the target detection index to the target client. The target client displays the target detection index and the operation data, so that a user can write the corresponding target exception handling script according to what is displayed. After the user finishes writing, the target exception handling script is sent through the target client to the controller in the target cluster.
In the embodiment of the application, after receiving the target exception handling script, the controller executes the target exception handling script, so that the target detection index of the node is restored to a normal state. At the same time, the controller stores the target exception handling script, establishes a mapping relation between the target exception handling script and the target exception type, and stores the mapping relation in the cache of the controller.
The method provided by the embodiment of the application has the advantage that, when no exception handling script exists in the controller, the controller automatically sends the target detection index and the operation data to the target client. The controller also executes and stores the target exception handling script it receives, so that the exception handling operation can be executed automatically when the same exception later occurs on a node.
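The flow of step S24 together with steps D1 to D3 can be sketched as follows. This is an illustration under stated assumptions, not the implementation described in the application: the function `handle_exception`, the callables `ask_client` and `run_script`, and all sample values are hypothetical, and the cache is simplified to a plain dictionary holding one script per exception type.

```python
def handle_exception(exception_type, index, operation_data, cache, ask_client, run_script):
    """Run the cached script for exception_type, fetching one from the client if absent.

    cache      -- dict mapping exception type to a script (hypothetical layout)
    ask_client -- callable(index, operation_data) returning a script (steps D1 and D2)
    run_script -- callable(script) that executes the script and returns its result
    """
    script = cache.get(exception_type)
    if script is None:
        # Steps D1/D2: send the index and data to the client, receive a script back.
        script = ask_client(index, operation_data)
        # Step D3: store the mapping so the same exception is handled automatically later.
        cache[exception_type] = script
    # Step S24: execute the script to return the detection index to a normal state.
    return run_script(script)


# Stand-ins for the target client and the script executor (illustrative only).
client_requests = []


def fake_client(index, operation_data):
    client_requests.append(index)
    return "fix_disk.sh"


def fake_run(script):
    return f"ran {script}"


cache = {}
first = handle_exception("disk_full", "disk_usage", {"used_pct": 97}, cache, fake_client, fake_run)
second = handle_exception("disk_full", "disk_usage", {"used_pct": 98}, cache, fake_client, fake_run)
```

In this sketch the client is contacted only on the first occurrence; the second call is served from the cache.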
Fig. 4 is a block diagram of an apparatus for detecting a node state according to an embodiment of the present disclosure, where the apparatus may be implemented as part of or all of an electronic device through software, hardware, or a combination of the two. As shown in fig. 4, the apparatus includes:
the detection module 31 is configured to periodically detect the node according to a detection strategy corresponding to the detection index, so as to obtain operation data corresponding to each detection index in the node;
a determining module 32, configured to determine, based on the operation data, status information corresponding to the detection index;
the processing module 33 is configured to determine a detection index of which the state information is an abnormal state as a target detection index;
and the sending module 34 is configured to send the running data corresponding to the target detection index to the controller, so that the controller executes an exception handling operation on the target node according to the running data.
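One detection round across the four modules above might look like the sketch below. The function name `detect_once`, the shape of `policies`, and the threshold used in the example are assumptions made for illustration; the application does not define them.

```python
def detect_once(policies, is_abnormal, send):
    """Run one detection round on a node and report abnormal detection indexes.

    policies    -- dict mapping detection index name to a zero-argument probe callable
    is_abnormal -- callable(index, data) returning True if the data indicates an abnormal state
    send        -- callable(index, data) that forwards the operation data to the controller
    """
    # Detection module: apply each policy to obtain operation data per detection index.
    operation_data = {index: probe() for index, probe in policies.items()}
    # Determining module: derive state information from the operation data.
    status = {
        index: "abnormal" if is_abnormal(index, data) else "normal"
        for index, data in operation_data.items()
    }
    # Processing module: indexes in an abnormal state become target detection indexes.
    targets = [index for index, state in status.items() if state == "abnormal"]
    # Sending module: send the operation data of each target index to the controller.
    for index in targets:
        send(index, operation_data[index])
    return targets


# Illustrative probes and threshold; real policies would query the node.
sent = []
policies = {"cpu": lambda: 95, "memory": lambda: 40}
targets = detect_once(policies, lambda index, value: value > 90, lambda i, d: sent.append((i, d)))
```

Periodic detection would call `detect_once` repeatedly at a fixed interval, for example from a loop that sleeps between rounds.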
In the embodiment of the present application, in the case that the detection index is a network index, the detection module 31 is configured to: determine a network detection policy corresponding to the network index; detect, by using the network detection policy, the network parameters respectively corresponding to the management network, the service network and the storage network where the node is located; and determine those network parameters as the operation data corresponding to the network index.
In the embodiment of the present application, in the case that the detection index is a component index, the detection module 31 is configured to: determine a component detection policy corresponding to the component index; query the log file of the node by using the component detection policy; and count the operation data of the components in the node from the log file.
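As a rough illustration of the network case, the sketch below probes one endpoint on each of the three networks and returns the results as operation data. The application does not say which network parameters are collected; reachability and connection latency over TCP are assumptions here, as are the names `tcp_probe` and `collect_network_data` and the addresses used.

```python
import socket
import time

NETWORKS = ("management", "service", "storage")


def tcp_probe(host, port, timeout=1.0):
    """Try a TCP connection and report reachability and elapsed time (assumed parameters)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    latency_ms = (time.monotonic() - start) * 1000.0
    return {"reachable": reachable, "latency_ms": latency_ms}


def collect_network_data(endpoints, probe=tcp_probe):
    """Return {network name: parameters} for the management, service and storage networks."""
    return {name: probe(*endpoints[name]) for name in NETWORKS}


def stub_probe(host, port):
    """Stand-in probe so the example runs without network access."""
    return {"reachable": host != "10.0.2.1", "latency_ms": 0.5}


# Hypothetical endpoints, one per network.
endpoints = {
    "management": ("10.0.0.1", 22),
    "service": ("10.0.1.1", 443),
    "storage": ("10.0.2.1", 3260),
}
network_data = collect_network_data(endpoints, probe=stub_probe)
```

Passing the probe as a parameter keeps the collection step independent of how each network is actually measured.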
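For the component case, counting operation data from a log file could be sketched as below. The log line layout (date, time, component name, level, message) is a hypothetical format chosen for the example, and `count_component_events` is not a name from the application.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <component> <LEVEL> <message>".
LOG_LINE = re.compile(r"^\S+ \S+ (?P<component>[\w-]+) (?P<level>[A-Z]+) ")


def count_component_events(lines):
    """Count log entries per (component, level) pair, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["component"], match["level"])] += 1
    return counts


sample_log = [
    "2022-01-29 10:00:00 scheduler ERROR failed to bind pod",
    "2022-01-29 10:00:01 scheduler INFO bound pod",
    "2022-01-29 10:00:02 scheduler ERROR failed to bind pod",
    "malformed line",
]
counts = count_component_events(sample_log)
```

A component detection policy could then compare, for instance, the number of error entries per component against a threshold to decide whether the component index is abnormal.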
Fig. 5 is a block diagram of a node exception handling apparatus according to an embodiment of the present application, where the apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in fig. 5, the apparatus includes:
the receiving module 41 is configured to receive a target detection index sent by an agent terminal deployed on a target node in a target cluster, and operation data corresponding to the target detection index, where the target node is a node where the target detection index is in an abnormal state;
the determining module 42 is configured to determine a target exception type corresponding to the target detection index according to the operation data;
the query module 43 is configured to query a target exception handling script corresponding to the target exception type;
and the execution module 44 is configured to execute the target exception handling script to restore the target detection index of the target node to a normal state if the target exception handling script exists.
In this embodiment of the present application, the query module 43 is configured to read a mapping relationship between an exception type and an exception handling script from a cache of a controller; and acquiring a target exception handling script corresponding to the target exception type based on the mapping relation.
In this embodiment of the present application, in the case that no exception handling script exists, the node exception handling apparatus further includes an establishing module, configured to: send the target detection index and the operation data corresponding to the target detection index to the target client; receive a target exception handling script fed back by the target client based on the target detection index and the operation data; and establish a mapping relation between the target exception handling script and the target exception type, and store the mapping relation to a cache of the controller.
An embodiment of the present application further provides an electronic device, as shown in fig. 5, the electronic device may include: a processor 1501, a communication interface 1502, a memory 1503 and a communication bus 1504, wherein the processor 1501, the communication interface 1502 and the memory 1503 communicate with each other through the communication bus 1504.
A memory 1503 for storing a computer program;
the processor 1501 is configured to implement the steps of the above embodiments when executing the computer program stored in the memory 1503.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a Random Access Memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment provided by the present application, a computer-readable storage medium is further provided, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the computer is caused to execute the method for detecting a node state in any one of the above embodiments.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for detecting a node state according to any one of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for processing node exception, which is applied to a controller deployed on a master node in a target cluster, and comprises the following steps:
receiving a target detection index sent by an agent terminal deployed on a target node in a target cluster and operation data corresponding to the target detection index, wherein the target node is a node of which the target detection index is in an abnormal state;
determining a target abnormal type corresponding to the target detection index according to the operation data;
inquiring a target exception handling script corresponding to the target exception type;
and under the condition that the target exception handling script exists, executing the target exception handling script to enable the target detection index of the target node to be recovered to a normal state.
2. The method of claim 1, wherein the querying for an exception handling script corresponding to the target exception type comprises:
reading a mapping relation between an exception type and an exception handling script from a cache of the controller;
and acquiring a target exception handling script corresponding to the target exception type based on the mapping relation.
3. The method of claim 2, wherein in the absence of the exception handling script, the method further comprises:
sending the target detection index and operation data corresponding to the target detection index to a target client;
receiving a target exception handling script fed back by the target client based on the target detection index and the running data;
and establishing a mapping relation between the target exception handling script and the target exception type, and storing the mapping relation to a cache of the controller.
4. A method for detecting a node state, applied to an agent terminal deployed on each node in a target cluster, the method comprising the following steps:
carrying out periodic detection on the nodes according to detection strategies corresponding to the detection indexes to obtain operation data corresponding to each detection index in the nodes;
determining state information corresponding to the detection index based on the operation data;
determining the detection index of which the state information is in an abnormal state as a target detection index;
and sending the running data corresponding to the target detection index to a controller so that the controller executes exception handling operation on the target node according to the running data.
5. The method according to claim 4, wherein, in a case that the detection index is a network index, the periodically detecting the node according to the detection policy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes:
determining a network detection strategy corresponding to the network index;
detecting network parameters respectively corresponding to a management network, a service network and a storage network where the node is located by utilizing the network detection strategy;
and determining the network parameters respectively corresponding to the management network, the service network and the storage network as the operation data corresponding to the network index.
6. The method according to claim 4, wherein, in a case that the detection index is a component index, the periodically detecting the node according to the detection strategy corresponding to the detection index to obtain the operation data corresponding to each detection index in the node includes:
determining a component detection strategy corresponding to the component index;
and querying a log file of the node by using the component detection strategy, and counting the operation data of the components in the node from the log file.
7. An apparatus for processing node exceptions, comprising:
the receiving module is used for receiving a target detection index sent by an agent terminal deployed on a target node in a target cluster and operation data corresponding to the target detection index, wherein the target node is a node of which the target detection index is in an abnormal state;
the determining module is used for determining a target abnormal type corresponding to the target detection index according to the running data;
the query module is used for querying a target exception handling script corresponding to the target exception type;
and the execution module is used for executing the target exception handling script under the condition that the target exception handling script exists so as to enable the target detection index of the target node to be recovered to a normal state.
8. An apparatus for detecting a node status, comprising:
the detection module is used for periodically detecting the nodes according to the detection strategies corresponding to the detection indexes to obtain the operation data corresponding to each detection index in the nodes;
the determining module is used for determining the state information corresponding to the detection index based on the operation data;
the processing module is used for determining the detection index of which the state information is in the abnormal state as a target detection index;
and the sending module is used for sending the running data corresponding to the target detection index to the controller so that the controller executes exception handling operation on the target node according to the running data.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 6.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; wherein:
a memory for storing a computer program;
a processor for performing the method steps of any of claims 1 to 6 by executing a program stored on a memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210111973.7A CN114443438A (en) | 2022-01-29 | 2022-01-29 | Node state detection method, node abnormity processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114443438A (en) | 2022-05-06 |
Family
ID=81371202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210111973.7A Pending CN114443438A (en) | 2022-01-29 | 2022-01-29 | Node state detection method, node abnormity processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114443438A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117768165A (en) * | 2023-12-12 | 2024-03-26 | 暗物质(北京)智能科技有限公司 | Network anomaly detection method, device, computer equipment and storage medium |
CN117768165B (en) * | 2023-12-12 | 2024-09-06 | 暗物质(北京)智能科技有限公司 | Network anomaly detection method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||