CN117041013A - Fault node processing method, device, system, equipment and storage medium - Google Patents

Fault node processing method, device, system, equipment and storage medium

Info

Publication number
CN117041013A
Authority
CN
China
Prior art keywords
fault
node
module
instruction
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311053906.5A
Other languages
Chinese (zh)
Inventor
赵福建
夏彬
李林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinzhuan Xinke Co Ltd
Original Assignee
Jinzhuan Xinke Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinzhuan Xinke Co Ltd
Priority to CN202311053906.5A
Publication of CN117041013A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a fault node processing method, device, system, equipment and storage medium. After a node fault is detected, the fault information is analyzed so that log information and performance data of the node are further collected; the fault type is then identified automatically based on a preset fault diagnosis identification model and a corresponding repair execution instruction is generated, so that an execution module can automatically repair the node according to the repair execution instruction and the fault does not persist for a long time. This addresses the technical problems that manual fault diagnosis of a distributed database may consume a large amount of time and human resources, that the risk of human error during repair increases the complexity of fault repair, and that conventional automatic database repair methods, which are generally based on predefined rules and logic, attempt to identify and resolve only common known faults and cannot cope with complex and unknown faults.

Description

Fault node processing method, device, system, equipment and storage medium
Technical Field
The present application relates to the field of database technologies, and in particular, to a method, an apparatus, a system, a device, and a storage medium for processing a failure node.
Background
A distributed database is typically made up of a plurality of nodes, with complex topological relationships between the nodes. When one node in the distributed database fails, the fault can propagate rapidly to other nodes and affect the stability and availability of the whole system, so the failed node needs to be identified and repaired promptly and accurately.
Fault diagnosis of a distributed database is a complex task requiring analysis of a large amount of log and performance index data. The manual diagnosis may require a lot of time and human resources, and there is a risk of human error in the repair process, increasing the complexity of the fault repair. Meanwhile, conventional database automatic repair methods are generally based on predefined rules and logic, attempt to identify and resolve common known faults, and cannot cope with complex and unknown faults.
Disclosure of Invention
The application provides a fault node processing method, device, system, equipment and storage medium. They address the technical problems that manual fault diagnosis of a distributed database consumes a large amount of time and human resources, that the risk of human error during repair increases the complexity of fault repair, and that conventional automatic database repair methods, which are generally based on predefined rules and logic, attempt to identify and resolve only common known faults and cannot cope with complex and unknown faults.
In view of this, a first aspect of the present application provides a method for processing a failed node, the method comprising:
S1, when a node fails, acquiring fault information uploaded by a monitoring module arranged on the node;
S2, analyzing the fault information, then generating a target acquisition instruction and sending it to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
S3, acquiring target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
S4, generating a repair execution instruction according to the fault diagnosis result, and sending the repair execution instruction to an execution module, so that the execution module repairs the fault node according to the repair execution instruction.
Optionally, after step S4, the method further comprises:
S5, after receiving the execution completion information returned by the execution module, sending a fault checking instruction for the fault node to the monitoring module;
S6, receiving a fault verification result returned by the monitoring module; if the fault verification result indicates that verification succeeded, completing the fault node processing flow, and otherwise sending an alarm instruction to an alarm module.
Optionally, the method further comprises:
acquiring collected real-time log information and real-time performance data;
analyzing through a preset fault prediction model based on the log information and the real-time performance data to obtain a fault prediction result;
and sending an alarm instruction to an alarm module or ending the fault prediction processing flow according to the fault prediction result.
A second aspect of the present application provides a fault node processing apparatus, the apparatus comprising:
the first acquisition unit is used for acquiring, when a node fails, fault information uploaded by a monitoring module arranged on the node;
the first analysis unit is used for analyzing according to the fault information, generating and sending a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
the second acquisition unit is used for acquiring the target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
the first processing unit is used for generating a repairing execution instruction according to the fault diagnosis result and sending the repairing execution instruction to the execution module so that the execution module repairs the fault node according to the repairing execution instruction.
Optionally, the apparatus further comprises:
the sending unit is used for sending a fault checking instruction for the fault node to the monitoring module after receiving the execution completion information returned by the execution module;
and the alarm unit is used for receiving the fault verification result returned by the monitoring module, completing the fault node processing flow if the fault verification result indicates that verification succeeded, and otherwise sending an alarm instruction to the alarm module.
Optionally, the apparatus further comprises:
the third acquisition unit is used for acquiring the acquired real-time log information and real-time performance data;
the second analysis unit is used for analyzing through a preset fault prediction model based on the log information and the real-time performance data to obtain a fault prediction result;
and the second processing unit is used for sending an alarm instruction to the alarm module or ending the fault prediction processing flow according to the fault prediction result.
A third aspect of the present application provides a fault node processing system, where the system includes any one of the fault node processing devices of the second aspect of the present application, and further includes:
the monitoring module and the execution module are arranged on the node;
wherein:
when a node fails, the fault node processing device acquires fault information uploaded by the monitoring module arranged on the node;
the fault node processing device analyzes according to the fault information, generates and sends a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
the fault node processing device acquires the target information uploaded by the monitoring module, and performs fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
the fault node processing device generates a repair execution instruction according to the fault diagnosis result, and sends the repair execution instruction to the execution module, so that the execution module repairs the fault node according to the repair execution instruction.
Optionally, the system further comprises:
an alarm module;
after receiving the execution completion information returned by the execution module, the fault node processing device sends a fault checking instruction for the fault node to the monitoring module;
and the fault node processing device receives a fault verification result returned by the monitoring module, if the fault verification result is successful in verification, the fault node processing flow is completed, and otherwise, an alarm instruction is sent to the alarm module.
A fourth aspect of the present application provides fault node processing equipment, the equipment comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the steps of the method for processing a faulty node according to the first aspect described above according to instructions in the program code.
A fifth aspect of the present application provides a computer readable storage medium storing program code for performing the steps of the method of fault node handling as described in the first aspect above.
From the above technical solutions, the embodiment of the present application has the following advantages:
The application provides a fault node processing method, device, system, equipment and storage medium. After a node fault is detected, the fault information is analyzed so that log information and performance data of the node are further collected; the fault type is then identified automatically based on a preset fault diagnosis identification model and a corresponding repair execution instruction is generated, so that an execution module can automatically repair the node according to the repair execution instruction and the fault does not persist for a long time. This addresses the technical problems that manual fault diagnosis of a distributed database consumes a large amount of time and human resources, that the risk of human error during repair increases the complexity of fault repair, and that conventional automatic database repair methods, which are generally based on predefined rules and logic, attempt to identify and resolve only common known faults and cannot cope with complex and unknown faults.
Drawings
FIG. 1 is a flow chart of a method for processing a failed node according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for processing a failed node according to an embodiment of the present application;
FIG. 3 is a flow chart of a first mode of operation in the method for handling a failed node of the present application;
FIG. 4 is a flow chart of a second mode of operation in the method for handling a failed node of the present application;
FIG. 5 is a system architecture diagram of a failed node processing system in accordance with an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a fault node processing device according to an embodiment of the present application.
Detailed Description
In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application designs a fault node processing method, device, system, equipment and storage medium. They address the technical problems that manual fault diagnosis of a distributed database consumes a large amount of time and human resources, that the risk of human error during repair increases the complexity of fault repair, and that conventional automatic database repair methods, which are generally based on predefined rules and logic, attempt to identify and resolve only common known faults and cannot cope with complex and unknown faults.
For ease of understanding, refer to FIG. 1, which is a flowchart of a method for processing a fault node according to an embodiment of the present application. As shown in FIG. 1, the method specifically includes:
S1, when a node fails, acquiring fault information uploaded by a monitoring module arranged on the node;
It should be noted that the monitoring module arranged on the node monitors the node's processes at a certain frequency or in real time, and uploads fault information in real time once it determines that the running state of the current node is abnormal.
S2, analyzing the fault information, then generating a target acquisition instruction and sending it to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
After the fault information is obtained, a preliminary judgment is made according to the fault information, and a target acquisition instruction is then sent to the monitoring module, so that the monitoring module acquires the target log information and target performance data corresponding to the target node according to the target acquisition instruction, which facilitates subsequent fault analysis.
S3, acquiring target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
After the monitoring module completes the collection of the target log information and the target performance data according to the target acquisition instruction and uploads them, the fault type of the target information is identified by the preset fault diagnosis model. It can be understood that the preset fault diagnosis model adopted by the application is based on machine learning and is obtained by training on, and continuously optimizing against, a large amount of log and performance data.
S4, generating a repair execution instruction according to the fault diagnosis result, and sending the repair execution instruction to the execution module, so that the execution module repairs the fault node according to the repair execution instruction.
It should be noted that the repair execution scheme corresponding to the fault diagnosis result is called to generate a repair execution instruction, so that the execution module repairs the fault node according to the repair execution instruction. This realizes a fully automatic fault processing scheme that runs from fault discovery through fault identification to fault repair.
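Steps S1 to S4 can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in the application: every name here (`FaultInfo`, `COLLECTION_PLAN`, `REPAIR_PLAN`, `diagnose`, `handle_fault`), every symptom label and every threshold is hypothetical, and the `diagnose` function is a trivial rule-based stand-in for the preset fault diagnosis model, which the application describes as a trained machine learning model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FaultInfo:
    node_id: str
    symptom: str  # hypothetical label uploaded by the monitoring module


@dataclass
class AcquisitionInstruction:
    node_id: str
    log_targets: List[str]
    metric_targets: List[str]


@dataclass
class RepairInstruction:
    node_id: str
    fault_type: str
    steps: List[str]


# Hypothetical lookup tables; the application does not specify their content.
COLLECTION_PLAN = {
    "process_down": (["db.log", "system.log"], ["memory", "cpu"]),
    "slow_response": (["slow_query.log"], ["disk_io", "connections"]),
}
REPAIR_PLAN = {
    "out_of_memory": ["free_cache", "restart_process"],
    "disk_full": ["purge_old_logs"],
}


def build_acquisition_instruction(fault: FaultInfo) -> AcquisitionInstruction:
    """S2: preliminary judgment that decides which logs and metrics to collect."""
    logs, metrics = COLLECTION_PLAN.get(fault.symptom, (["db.log"], ["cpu"]))
    return AcquisitionInstruction(fault.node_id, logs, metrics)


def diagnose(target_info: dict) -> str:
    """S3: stand-in for the preset fault diagnosis model (assumed thresholds)."""
    metrics = target_info["metrics"]
    if metrics.get("memory", 0.0) > 0.95:
        return "out_of_memory"
    if metrics.get("disk_usage", 0.0) > 0.95:
        return "disk_full"
    return "unknown"


def handle_fault(
    fault: FaultInfo,
    collect: Callable[[AcquisitionInstruction], dict],
    execute: Callable[[RepairInstruction], None],
) -> RepairInstruction:
    """Run S2 to S4 for fault information received in S1."""
    instruction = build_acquisition_instruction(fault)  # S2
    target_info = collect(instruction)                  # monitoring module uploads
    fault_type = diagnose(target_info)                  # S3
    repair = RepairInstruction(                         # S4
        fault.node_id, fault_type, REPAIR_PLAN.get(fault_type, [])
    )
    execute(repair)                                     # hand over to execution module
    return repair
```

The `collect` and `execute` callables stand for the monitoring module and the execution module on the node, so the sketch can be exercised without any real database. An unrecognized fault type yields an instruction with no repair steps in this sketch; how such a case should be handled is not stated in the text.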
Further, after step S4, the method also includes:
S5, after receiving the execution completion information returned by the execution module, sending a fault checking instruction for the fault node to the monitoring module;
After the execution module repairs the fault node according to the repair execution instruction, it feeds back the execution completion information. Whether the repair has actually succeeded then needs to be checked by sending a fault checking instruction for the fault node to the monitoring module.
S6, receiving a fault verification result returned by the monitoring module; if the fault verification result indicates that verification succeeded, completing the fault node processing flow, and otherwise sending an alarm instruction to the alarm module.
If the fault verification result indicates success, the node fault has been repaired. If it indicates failure, an alarm needs to be sent to the relevant personnel through the alarm module, and subsequent processing is handled through manual intervention.
FIG. 3 is a flowchart of a first operation mode in the fault node processing method of the present application. As shown in FIG. 3, specifically:
In this operation mode, an embodiment of the application basically proceeds as follows:
(1) The monitoring module monitors the running state of the distributed database nodes in real time and reports the running state to the management module, which is based on machine learning.
(2) When a node fails, the monitoring module reports fault information to the management module. The management module makes a preliminary judgment according to the fault information and then notifies the acquisition module to acquire specific log information and performance data.
(3) Based on the information reported by the acquisition module of the fault node, the management module uses the fault diagnosis model construction module to carry out model construction, analysis and calculation, and to identify the fault type.
(4) The fault repair flow definition module in the management module generates the corresponding repair operation steps according to the diagnosis result of the fault diagnosis model construction module and sends them to the execution module of the fault node for execution.
(5) After the execution module finishes processing, if the execution succeeded, the execution module feeds back to the management module, and the management module notifies the monitoring module to perform a check. The check result is fed back to the management module; the flow ends if the check succeeds, and the alarm module is notified to raise an external alarm if the check fails.
If the processing of the execution module fails, the management module notifies the alarm module to raise an alarm.
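The two alarm branches at the end of this flow (execution failure, and check failure after a successful execution) can be sketched as a small decision function. This is a minimal sketch under assumptions: the names `Outcome` and `close_out_repair` are hypothetical, and the `run_check` and `notify_alarm` callables stand in for the monitoring module and the alarm module respectively.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    REPAIRED = "repaired"
    EXECUTION_FAILED = "execution_failed"
    CHECK_FAILED = "check_failed"


def close_out_repair(
    execution_ok: bool,
    run_check: Callable[[], bool],
    notify_alarm: Callable[[Outcome], None],
) -> Outcome:
    """Decide how the management module ends the flow after the execution module reports back."""
    if not execution_ok:
        # Execution itself failed: alarm directly, no check is requested.
        notify_alarm(Outcome.EXECUTION_FAILED)
        return Outcome.EXECUTION_FAILED
    if run_check():
        # The monitoring module confirms the node is healthy again.
        return Outcome.REPAIRED
    # Execution reported success but the check did not pass.
    notify_alarm(Outcome.CHECK_FAILED)
    return Outcome.CHECK_FAILED
```

Returning a distinct outcome for each branch lets the alarm carry the reason, which is a design choice of this sketch rather than something the text specifies.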
Further, the method also includes:
acquiring collected real-time log information and real-time performance data;
It should be noted that, besides fault type identification and fault repair after a fault occurs, the scheme of the application can also collect log information and performance data in real time.
Based on the real-time log information and the real-time performance data, analyzing through a preset fault prediction model to obtain a fault prediction result;
It should be noted that, by analyzing the log information and performance data collected in real time through a preset fault prediction model, a fault prediction result can be obtained that includes, but is not limited to, the fault probability of the target node.
And sending an alarm instruction to the alarm module or ending the fault prediction processing flow according to the fault prediction result.
It should be noted that, if the fault prediction result indicates high risk or high probability, an alarm needs to be raised through the alarm module; otherwise, this round of routine monitoring ends.
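The prediction branch can be sketched as one monitoring cycle that scores the real-time data and either alarms or ends. This is a minimal sketch under assumptions: the function names, the `risk_threshold` value of 0.8 and the toy scoring rule are all hypothetical, and `toy_model` merely stands in for the preset fault prediction model, whose form the application does not specify.

```python
from typing import Callable, Dict, List


def toy_model(logs: List[str], metrics: Dict[str, float]) -> float:
    """Hypothetical stand-in for the preset fault prediction model.

    Returns a pseudo-probability: the larger of the share of ERROR log
    lines and the CPU utilisation (both assumed to lie in [0, 1]).
    """
    error_ratio = sum("ERROR" in line for line in logs) / max(len(logs), 1)
    return max(error_ratio, metrics.get("cpu", 0.0))


def run_prediction_cycle(
    logs: List[str],
    metrics: Dict[str, float],
    model: Callable[[List[str], Dict[str, float]], float],
    send_alarm: Callable[[str], None],
    risk_threshold: float = 0.8,  # assumed cut-off for "high probability"
) -> bool:
    """Score real-time data; alarm on high risk, otherwise end the cycle."""
    probability = model(logs, metrics)
    if probability >= risk_threshold:
        send_alarm(f"predicted failure probability {probability:.2f}")
        return True
    return False
```

Passing the model in as a callable keeps the alarm decision separate from the prediction itself, so a trained model could replace `toy_model` without changing the cycle.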
FIG. 4 is a flowchart of a second operation mode in the fault node processing method according to the present application. As shown in FIG. 4, specifically:
(1) The acquisition module acquires the log and the performance data in real time and provides them to the fault real-time detection prediction module.
(2) The fault real-time detection prediction module performs prediction model analysis to reach a conclusion and feeds the information back to the management module.
(3) The management module makes a judgment according to the prediction result, and if there is no fault risk, the whole flow ends.
(4) If a fault risk exists, the alarm module is notified to raise an alarm.
Refer to FIG. 2, which is a schematic structural diagram of a fault node processing apparatus according to an embodiment of the present application. As shown in FIG. 2, the apparatus specifically includes:
a first obtaining unit 201, configured to obtain, when a node fails, fault information uploaded by a monitoring module disposed at the node;
the first analysis unit 202 is configured to perform analysis according to the fault information, generate and send a target acquisition instruction to the monitoring module, where the target acquisition instruction includes target log information and target performance data;
the second obtaining unit 203 is configured to obtain the target information uploaded by the monitoring module, and identify a fault type of the target information by using a preset fault diagnosis model, so as to obtain a fault diagnosis result;
the first processing unit 204 is configured to generate a repair execution instruction according to the fault diagnosis result, and send the repair execution instruction to the execution module, so that the execution module repairs the fault node according to the repair execution instruction.
Further, the apparatus also includes:
the sending unit is used for sending a fault checking instruction for the fault node to the monitoring module after receiving the execution completion information returned by the execution module;
and the alarm unit is used for receiving the fault verification result returned by the monitoring module, completing the fault node processing flow if the fault verification result is successful, and otherwise, sending an alarm instruction to the alarm module.
Further, the apparatus also includes:
the third acquisition unit is used for acquiring the acquired real-time log information and real-time performance data;
the second analysis unit is used for analyzing through a preset fault prediction model based on the log information and the real-time performance data to obtain a fault prediction result;
and the second processing unit is used for sending an alarm instruction to the alarm module or ending the fault prediction processing flow according to the fault prediction result.
Refer to FIG. 5, which is a system architecture diagram of a fault node processing system according to an embodiment of the present application. As shown in FIG. 5, specifically:
the system includes the fault node processing device 301 according to any one of the implementations of the second aspect of the present application, and further includes:
a monitoring module 302 and an executing module 303, which are arranged on the node;
wherein:
when a node fails, the fault node processing device 301 acquires fault information uploaded by a monitoring module 302 arranged on the node;
the fault node processing device 301 analyzes according to the fault information, generates and sends a target acquisition instruction to the monitoring module 302, wherein the target acquisition instruction comprises target log information and target performance data;
the fault node processing device 301 obtains the target information uploaded by the monitoring module 302, and performs fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
the fault node processing device 301 generates a repair execution instruction according to the fault diagnosis result, and sends the repair execution instruction to the execution module 303, so that the execution module 303 repairs the fault node according to the repair execution instruction.
Further, the system also includes:
an alarm module;
after receiving the execution completion information returned by the execution module, the fault node processing device sends a fault checking instruction for the fault node to the monitoring module;
the fault node processing device receives a fault check result returned by the monitoring module, if the fault check result is successful in checking, the fault node processing flow is completed, and otherwise, an alarm instruction is sent to the alarm module.
An embodiment of the present application further provides another fault node processing device, as shown in FIG. 6. For convenience of explanation, only the parts relevant to the embodiment of the present application are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, a vehicle-mounted computer and the like. The following takes a mobile phone as an example of the terminal:
FIG. 6 is a block diagram showing part of the structure of a mobile phone related to a terminal provided by an embodiment of the present application. Referring to FIG. 6, the mobile phone includes: a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, and a power source 1090. Those skilled in the art will appreciate that the handset structure shown in FIG. 6 does not limit the handset, which may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
The following describes the components of the mobile phone in detail with reference to FIG. 6:
The RF circuit 1010 may be used for receiving and transmitting signals during messaging or a call. In particular, after downlink information from a base station is received, it is passed to the processor 1080 for processing; in addition, uplink data is sent to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 1020 may be used to store software programs and modules. The processor 1080 performs the various functional applications and data processing of the handset by executing the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state memory device.
The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 1031 or thereabout using any suitable object or accessory such as a finger, stylus, etc.), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 1080 and can receive commands from the processor 1080 and execute them. Further, the touch panel 1031 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, etc.
The display unit 1040 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1031 may overlay the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, the operation is transferred to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of touch event. Although in FIG. 6 the touch panel 1031 and the display panel 1041 are two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for applications of recognizing the gesture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the handset are not described in detail herein.
Audio circuitry 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between a user and a cell phone. Audio circuit 1060 may transmit the received electrical signal after audio data conversion to speaker 1061 for conversion by speaker 1061 into an audio signal output; on the other hand, microphone 1062 converts the collected sound signals into electrical signals, which are received by audio circuit 1060 and converted into audio data, which are processed by audio data output processor 1080 for transmission to, for example, another cell phone via RF circuit 1010 or for output to memory 1020 for further processing.
WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 1070, so that wireless broadband Internet access is provided for the user. Although fig. 6 shows a WiFi module 1070, it is understood that it does not belong to the necessary constitution of the handset, and can be omitted entirely as required within the scope of not changing the essence of the application.
Processor 1080 is the control center of the handset, connects the various parts of the entire handset using various interfaces and lines, and performs various functions and processes of the handset by running or executing software programs and/or modules stored in memory 1020, and invoking data stored in memory 1020, thereby performing overall monitoring of the handset. Optionally, processor 1080 may include one or more processing units; preferably, processor 1080 may integrate an application processor primarily handling operating systems, user interfaces, applications, etc., with a modem processor primarily handling wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1080.
The handset further includes a power source 1090 (e.g., a battery) for powering the various components. Preferably, the power source may be logically connected to the processor 1080 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In an embodiment of the present application, the processor 1080 included in the terminal further has the following functions:
S1, when a node fails, acquiring fault information uploaded by a monitoring module arranged on the node;
S2, analyzing the fault information, and generating and sending a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
S3, acquiring target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
S4, generating a repair execution instruction according to the fault diagnosis result, and sending the repair execution instruction to the execution module, so that the execution module repairs the fault node according to the repair execution instruction.
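As an illustration only, the flow of steps S1 to S4 could be sketched in Python as follows. Every name here (`MonitoringModule`, `ExecutionModule`, `build_acquisition_instruction`, `diagnose`, `REPAIR_ACTIONS`, and the log and metric names) is a hypothetical placeholder and does not come from the application. In particular, `diagnose` uses a simple threshold purely as a stand-in for the preset fault diagnosis model, whose actual form is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringModule:
    """Hypothetical stand-in for the monitoring module arranged on a node."""
    node_id: str
    fault_summary: str
    logs: dict      # log name -> list of log lines
    metrics: dict   # metric name -> numeric value

    def upload_fault_info(self):
        # S1: the fault information the node reports when it fails.
        return {"node_id": self.node_id, "summary": self.fault_summary}

    def collect(self, instruction):
        # Return only the logs and metrics named in the acquisition instruction.
        return {
            "logs": {n: self.logs.get(n, []) for n in instruction["target_logs"]},
            "metrics": {n: self.metrics.get(n) for n in instruction["target_metrics"]},
        }


@dataclass
class ExecutionModule:
    """Hypothetical stand-in for the execution module; it only records instructions."""
    executed: list = field(default_factory=list)

    def repair(self, instruction):
        self.executed.append(instruction)


def build_acquisition_instruction(fault_info):
    # S2: toy analysis that decides which logs and performance data to request.
    if "disk" in fault_info["summary"]:
        return {"target_logs": ["storage.log"], "target_metrics": ["disk_used_pct"]}
    return {"target_logs": ["system.log"], "target_metrics": ["cpu_pct"]}


def diagnose(target_info):
    # S3: placeholder for the preset fault diagnosis model (assumed threshold rule).
    used = target_info["metrics"].get("disk_used_pct")
    if used is not None and used >= 90:
        return "disk_full"
    return "unknown"


# Assumed mapping from diagnosis result to repair action (illustrative only).
REPAIR_ACTIONS = {"disk_full": "purge_expired_files", "unknown": "restart_node_service"}


def handle_fault(monitor, executor):
    fault_info = monitor.upload_fault_info()                  # S1
    instruction = build_acquisition_instruction(fault_info)   # S2
    target_info = monitor.collect(instruction)                # S3: acquire target information
    diagnosis = diagnose(target_info)                         # S3: identify fault type
    repair = {"node_id": fault_info["node_id"],
              "action": REPAIR_ACTIONS[diagnosis]}            # S4: build repair instruction
    executor.repair(repair)                                   # S4: send to execution module
    return diagnosis, repair


if __name__ == "__main__":
    monitor = MonitoringModule(
        node_id="node-1",
        fault_summary="disk usage alarm",
        logs={"storage.log": ["no space left on device"]},
        metrics={"disk_used_pct": 97.0},
    )
    print(handle_fault(monitor, ExecutionModule()))
```

In this sketch the acquisition instruction narrows collection to the logs and metrics relevant to the reported fault, so the diagnosis step receives only the target information rather than everything the node records.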
The embodiments of the present application also provide a computer readable storage medium storing program code for executing any one of the foregoing fault node processing methods according to the foregoing embodiments.
In the embodiments of the application, after a node fault is detected, the fault information is analyzed and log information and performance data of the node are further collected; the fault type is then identified automatically based on a preset fault diagnosis model and a corresponding repair execution instruction is generated, so that the execution module can automatically repair the node according to the repair execution instruction and the fault does not persist for a long time. This addresses the following technical problems: manual fault diagnosis of a distributed database consumes a large amount of time and human resources, and the repair process carries a risk of human error, which increases the complexity of fault repair; meanwhile, conventional automatic database repair methods are generally based on predefined rules and logic, attempt to identify and resolve common known faults, and cannot cope with complex and unknown faults.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A is present, only B is present, or both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, and c may each be single or plural.
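The reading of "at least one of a, b, or c" given above corresponds to every non-empty combination of the listed items. A minimal Python sketch that enumerates them is shown below; the function name `at_least_one_of` is a hypothetical helper introduced only for this illustration.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


if __name__ == "__main__":
    for combo in at_least_one_of(["a", "b", "c"]):
        print(" and ".join(combo))
```

For three items this yields seven combinations, matching the seven cases listed in the paragraph above.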
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or in whole or in part, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for processing a failed node, comprising:
S1, when a fault node occurs, acquiring fault information uploaded by a monitoring module arranged on the node;
S2, analyzing according to the fault information, generating and sending a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
S3, acquiring target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
S4, generating a repair execution instruction according to the fault diagnosis result, and sending the repair execution instruction to an execution module, so that the execution module repairs the fault node according to the repair execution instruction.
2. The method according to claim 1, wherein after the step S4, further comprising:
S5, after receiving the execution completion information returned by the execution module, sending a fault checking instruction for the fault node to the monitoring module;
S6, receiving a fault verification result returned by the monitoring module, if the fault verification result is successful in verification, completing a fault node processing flow, and otherwise, sending an alarm instruction to an alarm module.
3. The method of processing a failed node as set forth in claim 1, further comprising:
acquiring collected real-time log information and real-time performance data;
analyzing through a preset fault prediction model based on the log information and the real-time performance data to obtain a fault prediction result;
and sending an alarm instruction to an alarm module or ending the fault prediction processing flow according to the fault prediction result.
4. A failure node processing apparatus, comprising:
the first acquisition unit is used for acquiring fault information uploaded by a monitoring module arranged on the node when the fault node occurs;
the first analysis unit is used for analyzing according to the fault information, generating and sending a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
the second acquisition unit is used for acquiring the target information uploaded by the monitoring module, and carrying out fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
the first processing unit is used for generating a repairing execution instruction according to the fault diagnosis result and sending the repairing execution instruction to the execution module so that the execution module repairs the fault node according to the repairing execution instruction.
5. The apparatus according to claim 4, further comprising:
the sending unit is used for sending a fault checking instruction to the fault node to the monitoring module after receiving the execution completion information returned by the execution module;
and the alarm unit is used for receiving the fault verification result returned by the monitoring module, completing the fault node processing flow if the fault verification result is successful in verification, and otherwise, sending an alarm instruction to the alarm module.
6. The apparatus according to claim 4, further comprising:
the third acquisition unit is used for acquiring the acquired real-time log information and real-time performance data;
the second analysis unit is used for analyzing through a preset fault prediction model based on the log information and the real-time performance data to obtain a fault prediction result;
and the second processing unit is used for sending an alarm instruction to the alarm module or ending the fault prediction processing flow according to the fault prediction result.
7. A failed node processing system comprising the failed node processing apparatus of any one of claims 4 to 6, further comprising:
the monitoring module and the execution module are arranged on the node;
wherein:
when a fault node occurs, the fault node processing device acquires fault information uploaded by the monitoring module arranged on the node;
the fault node processing device analyzes according to the fault information, generates and sends a target acquisition instruction to the monitoring module, wherein the target acquisition instruction comprises target log information and target performance data;
the fault node processing device acquires the target information uploaded by the monitoring module, and performs fault type identification on the target information through a preset fault diagnosis model to obtain a fault diagnosis result;
the fault node processing device generates a repair execution instruction according to the fault diagnosis result, and sends the repair execution instruction to the execution module, so that the execution module repairs the fault node according to the repair execution instruction.
8. The failed node processing system of claim 7, further comprising:
an alarm module;
after receiving the execution completion information returned by the execution module, the fault node processing device sends a fault checking instruction for the fault node to the monitoring module;
and the fault node processing device receives a fault verification result returned by the monitoring module, if the fault verification result is successful in verification, the fault node processing flow is completed, and otherwise, an alarm instruction is sent to the alarm module.
9. A device for handling a failed node, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-3 according to instructions in the program code.
10. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a program code for executing the method of handling a faulty node according to any one of claims 1-3.
CN202311053906.5A 2023-08-21 2023-08-21 Fault node processing method, device, system, equipment and storage medium Pending CN117041013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311053906.5A CN117041013A (en) 2023-08-21 2023-08-21 Fault node processing method, device, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311053906.5A CN117041013A (en) 2023-08-21 2023-08-21 Fault node processing method, device, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117041013A true CN117041013A (en) 2023-11-10

Family

ID=88602138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311053906.5A Pending CN117041013A (en) 2023-08-21 2023-08-21 Fault node processing method, device, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117041013A (en)

Similar Documents

Publication Publication Date Title
CN107480533A (en) A kind of method, apparatus and device of leak reparation
CN104516812A (en) Method and device for testing software
CN111078556B (en) Application testing method and device
CN111666222A (en) Test method and related device
CN107451046B (en) Method and terminal for detecting threads
CN112131093A (en) Chaos test-based test case automatic generation method and device
CN116303085A (en) Test reason analysis method, device, equipment and storage medium
CN116468382A (en) RPA robot flow management method, device, equipment and storage medium
CN115904950A (en) Test case generation method, device, equipment and storage medium
CN113760592B (en) Node kernel detection method and related device
CN117041013A (en) Fault node processing method, device, system, equipment and storage medium
CN112667868B (en) Data detection method and device
CN116450384A (en) Information processing method and related device
CN113961380A (en) Cross-application repair method, device, equipment and storage medium
CN116627797A (en) Demand standard processing method, device, equipment and storage medium
CN116862473A (en) Bank production application calling relation analysis method, device, equipment and storage medium
CN116303060A (en) ESB call request processing method, device, system, equipment and storage medium
CN116881143A (en) Data object copying abnormality investigation method, device, equipment and storage medium
CN116467192A (en) Automatic test case generation method, device, equipment and storage medium
CN115905008A (en) Method, device, equipment and storage medium for automatically generating test case
CN116501413A (en) Automatic generation interface calling method, device, equipment and storage medium
CN117743015A (en) SQL fault positioning method, device, system and equipment
CN116680185A (en) Test automation execution optimization method, device, equipment and storage medium
CN116680156A (en) Log monitoring processing method, device, equipment and storage medium
CN116303086A (en) End-to-end testing method, configuration method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination