CN114884796B - Fault processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114884796B (application number CN202210694260.8A)
- Authority
- CN
- China
- Prior art keywords
- target service
- target
- self
- processing
- script
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The disclosure provides a fault handling method, which can be applied to the field of computer technology or the field of finance. The fault processing method includes: in response to alarm information from a monitoring platform, acquiring target service information contained in the alarm information; determining, according to the target service information, a target service corresponding to it; analyzing the running state of the target service by using a preset self-healing script corresponding to the target service to obtain an analysis result; and, when the analysis result shows that the running state of the target service is faulty, processing the fault of the target service by using the preset self-healing script. The disclosure also provides a fault processing device, equipment, and a storage medium.
Description
Technical Field
The present disclosure relates to the field of computer technology or finance, and more particularly, to a fault handling method, apparatus, device, medium, and program product.
Background
The monitoring system is used to monitor the states of all servers, including host memory and CPU, cluster state, log files, and the like. When the monitoring system detects that a server has failed, it sends alarm information to notify a system administrator, who then performs fault detection and related processing.
In the process of implementing the inventive concept of the present disclosure, the inventor found at least the following problems in the related art: faults are removed manually, so the response time is long and the processing efficiency is low, and misoperations easily cause secondary faults.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a fault handling method, apparatus, device, medium, and program product.
According to one aspect of the present disclosure, there is provided a fault handling method including:
in response to alarm information from a monitoring platform, acquiring target service information contained in the alarm information;
determining, according to the target service information, a target service corresponding to the target service information;
analyzing the running state of the target service by using a preset self-healing script corresponding to the target service to obtain an analysis result; and
when the analysis result shows that the running state of the target service is faulty, processing the fault of the target service by using the preset self-healing script.
According to an embodiment of the present disclosure, analyzing the running state of the target service by using the preset self-healing script corresponding to the target service to obtain an analysis result includes:
acquiring a running log corresponding to the target service; and
analyzing the running log by using the preset self-healing script, and, when the running log contains a preset keyword, obtaining an analysis result indicating that the running state of the target service is faulty.
According to an embodiment of the present disclosure, analyzing the running state of the target service by using the preset self-healing script corresponding to the target service to obtain an analysis result includes:
searching a database for an abnormal file corresponding to the target service, wherein the abnormal file includes a file generated when the running state of the target service is abnormal; and
when the abnormal file exists in the database, obtaining an analysis result indicating that the running state of the target service is faulty.
According to an embodiment of the present disclosure, the processing the failure of the target service by using the preset self-healing script includes:
automatically isolating a target server corresponding to the target service by utilizing the preset self-healing script;
processing the fault of the target service;
and releasing the isolation of the target server when the fault processing of the target service is determined to be completed.
According to an embodiment of the present disclosure, the fault handling method further includes:
sending a processing result to the monitoring platform when it is determined that the fault processing of the target service is completed.
According to an embodiment of the present disclosure, a target server corresponding to the target service is preconfigured with a self-starting script, and the fault handling method further includes:
monitoring the running state of the target server by using the self-starting script;
analyzing the use state of the target service by using a configuration file corresponding to the self-starting script when the running state of the target server indicates that the target server is down; and
when the use state of the target service indicates that the target service is in use, starting the target server and the target service by using the self-starting script.
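The self-starting flow above could be sketched roughly as follows. This is a minimal illustration, not the disclosure's script: the INI-style configuration file with a per-service `in_use` flag, and the `server_is_up`, `start_server`, and `start_service` callables, are hypothetical stand-ins for whatever the real environment provides.

```python
import configparser
from typing import Callable, List


def services_in_use(config_text: str) -> List[str]:
    """Return services marked in use in a hypothetical INI config.

    Assumed format (not from the disclosure): one section per service
    with a boolean `in_use` option.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [
        name
        for name in parser.sections()
        if parser.getboolean(name, "in_use", fallback=False)
    ]


def self_start(
    server_is_up: Callable[[], bool],
    start_server: Callable[[], None],
    start_service: Callable[[str], None],
    config_text: str,
) -> List[str]:
    """Start the server and its in-use services if the server is down.

    Returns the names of the services that were started.
    """
    if server_is_up():
        return []  # server running normally: nothing to do
    in_use = services_in_use(config_text)
    if not in_use:
        return []  # no service is in use, so the server is left down
    start_server()
    for name in in_use:
        start_service(name)
    return in_use
```

Passing the actions in as callables keeps the decision logic testable without touching a real server; in practice they might wrap a remote power-on command and each service's own start command.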
Another aspect of the present disclosure provides a fault handling apparatus, comprising:
the acquisition module is used for responding to the alarm information from the monitoring platform and acquiring target service information contained in the alarm information;
the determining module is used for determining, according to the target service information, the target service corresponding to the target service information;
the first analysis module is used for analyzing the running state of the target service by utilizing a preset self-healing script corresponding to the target service to obtain an analysis result; and
and the processing module is used for processing the faults of the target service by utilizing the preset self-healing script under the condition that the analysis result shows that the running state of the target service is faulty.
Another aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the fault handling method.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described fault handling method.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described fault handling method.
According to the embodiments of the present disclosure, a preset self-healing script corresponding to each service is preconfigured. After the alarm information sent by the monitoring platform is received, the target service is determined according to the alarm information, and the preset self-healing script corresponding to the target service is executed automatically to analyze the running state of the target service. When the running state of the target service is faulty, the fault is processed by the same preset self-healing script. This at least partially overcomes the technical problems of manual fault removal, long response time, low processing efficiency, and secondary faults caused by misoperation: a fault of the target service can be processed as soon as it occurs, which increases processing speed and efficiency while avoiding secondary faults caused by human error.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a fault handling method, apparatus, device, medium and program product according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a fault handling method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a target service operational state analysis method according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow chart of a fault handling method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a server failure handling method according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a block diagram of a fault handling apparatus according to an embodiment of the present disclosure; and
FIG. 7 schematically illustrates a block diagram of an electronic device adapted to implement a fault handling method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B, and C" is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
The monitoring system is used to monitor the states of all servers, including host memory and CPU, cluster state, log files, and the like. Its main purpose is fault prevention: when a fault is about to occur or has occurred, it generates alarm information to notify a system administrator, who then performs fault detection and related processing.
However, in the process of implementing the inventive concept of the present disclosure, the inventor found at least the following problems in the related art: troubleshooting is performed manually, so the response time is long and the processing efficiency is low, and misoperations easily cause secondary faults. Moreover, when there is a large volume of alarm information, some alarms are easily overlooked and not processed in time.
In view of this, the present disclosure preconfigures a preset self-healing script corresponding to each service. After the alarm information sent by the monitoring platform is received, the target service is determined according to the alarm information, and the preset self-healing script corresponding to the target service is executed automatically to analyze the running state of the target service. When the running state of the target service is faulty, the fault is processed by using that preset self-healing script, thereby overcoming the technical problems of manual fault removal, long response time, low processing efficiency, and secondary faults caused by misoperation.
Specifically, an embodiment of the present disclosure provides a fault handling method, including: responding to the alarm information from the monitoring platform, and acquiring target service information contained in the alarm information; determining a target service corresponding to the target service information according to the target service information; analyzing the running state of the target service by using a preset self-healing script corresponding to the target service to obtain an analysis result; and processing the fault of the target service by using the preset self-healing script under the condition that the analysis result shows that the running state of the target service is faulty.
It should be noted that the fault processing method and device provided by the embodiments of the present disclosure may be used in the field of computer technology or the field of finance, and may also be used in any field other than these two. Their application field is not limited.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
In the technical solutions of the present disclosure, the acquisition, collection, storage, use, processing, transmission, provision, disclosure, and application of data all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 schematically illustrates an application scenario diagram of a fault handling method, apparatus, device, medium and program product according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include servers 101, 102, 103, a network 104, and a monitoring platform 105. The network 104 is the medium that provides communication links between the servers 101, 102, 103 and the monitoring platform 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The monitoring platform 105 may interact with the servers 101, 102, 103 via the network 104 to receive or send messages, etc. Various services, such as WAS services (for example only), may be installed on the servers 101, 102, 103.
The servers 101, 102, 103 may be servers that provide various services, such as a background management monitoring platform (by way of example only) that provides support for websites that users browse using the servers 101, 102, 103. The background management monitoring platform can analyze and process the received data such as the user request and the like, and feed back the processing result (such as a webpage, information, data or the like acquired or generated according to the user request) to the server.
The monitoring platform 105 may be a platform for monitoring the states of all servers, for example host memory and CPU, cluster state, and log files. When the monitoring platform 105 detects that a service on a server has failed, it sends alarm information to notify that server to perform fault detection and related processing.
It should be noted that the fault handling method provided by the embodiments of the present disclosure may be generally performed by the servers 101, 102, 103. Accordingly, the fault handling apparatus provided by the embodiments of the present disclosure may be generally provided in the servers 101, 102, 103. The fault handling method provided by the embodiments of the present disclosure may also be performed by a server or cluster of servers other than the servers 101, 102, 103 and capable of communicating with the monitoring platform 105 and/or the servers 101, 102, 103. Accordingly, the fault handling apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster different from the servers 101, 102, 103 and capable of communicating with the monitoring platform 105 and/or the servers 101, 102, 103.
It should be understood that the number of servers, networks, and monitoring platforms in fig. 1 is merely illustrative. There may be any number of servers, networks, and monitoring platforms, as desired for implementation.
The fault handling method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 5 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a fault handling method according to an embodiment of the present disclosure.
As shown in fig. 2, the fault handling method of this embodiment includes operations S210 to S240, and the fault handling method may be performed by a server.
In operation S210, in response to the alert information from the monitoring platform, the target service information included in the alert information is acquired.
According to the embodiment of the disclosure, the monitoring platform monitors the running state of the server in real time, and when the monitoring platform detects that the server fails, alarm information is sent to the server.
According to embodiments of the present disclosure, the alert information may include target service information for a failure on the server. The target service information may include, for example, information of a target service name, a target service ID, and the like. The target service information may also include fault information for the target service, such as target service downtime.
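As an illustration of operation S210, the target service information could be pulled out of an alarm payload as sketched below. The JSON layout (a `target_service` object with `name`, `id`, and optional `fault` fields) is purely an assumption; the disclosure does not specify the alarm format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetServiceInfo:
    """Target service information carried in an alarm (hypothetical fields)."""
    name: str
    service_id: str
    fault: Optional[str] = None  # e.g. "down"; may be absent


def extract_target_service_info(alarm_json: str) -> TargetServiceInfo:
    """Parse a hypothetical JSON alarm and return its target service info.

    Raises KeyError if the assumed `target_service` fields are missing.
    """
    payload = json.loads(alarm_json)
    target = payload["target_service"]
    return TargetServiceInfo(
        name=target["name"],
        service_id=str(target["id"]),  # normalize numeric IDs to strings
        fault=target.get("fault"),
    )
```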
In operation S220, a target service corresponding to the target service information is determined according to the target service information.
According to the embodiment of the disclosure, the failed target service is determined according to the target service information, so that the fault of the target service can be processed.
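Operation S220 can be pictured as a lookup in a service registry that also records which preset self-healing script belongs to each service. The registry contents, the `Service` fields, and the script path below are hypothetical; the disclosure only states that the target service is determined from the target service information.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Service:
    service_id: str
    name: str
    host: str                 # server the service runs on (hypothetical)
    self_healing_script: str  # path of its preset script (hypothetical)


# Hypothetical registry keyed by service ID.
SERVICE_REGISTRY: Dict[str, Service] = {
    "17": Service("17", "was", "app-server-01", "/opt/selfheal/was.sh"),
}


def determine_target_service(
    service_id: str, name: str, registry: Dict[str, Service] = SERVICE_REGISTRY
) -> Service:
    """Resolve alarm info to a registered service, checking the name matches."""
    service = registry.get(service_id)
    if service is None or service.name != name:
        # Unknown or inconsistent alarm: do not run any script automatically.
        raise LookupError(f"no registered service {name!r} with id {service_id!r}")
    return service
```

Refusing to act on an alarm that does not match the registry is a design choice made here so that a malformed alarm never triggers the wrong script.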
In operation S230, the operation state of the target service is analyzed by using a preset self-healing script corresponding to the target service, so as to obtain an analysis result.
According to the embodiment of the disclosure, after the monitoring platform sends alarm information containing fault information, the preset self-healing script analyzes the running state of the target service to determine whether the target service has actually failed. For example, when the fault information indicates that the target service is down, it must be determined whether the target service was shut down normally or went down abnormally: if it went down abnormally, the target service is determined to be down; if it was shut down normally, the target service is determined not to be down.
According to the embodiment of the disclosure, the preset self-healing script analyzes the running state of the target service and determines whether the target service has failed. If it has, an analysis result indicating that the running state of the target service is faulty is obtained; if it has not, an analysis result indicating that the running state is not faulty is obtained, and the target service does not need to be processed.
In operation S240, if the analysis result indicates that the operation state of the target service is faulty, the fault of the target service is processed using the preset self-healing script.
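Operations S210 to S240 can be strung together as in the sketch below. It assumes each preset self-healing script exposes two hypothetical entry points, `analyze` (returns whether the service is faulty) and `handle` (processes the fault and returns a result); the alarm dictionary shape is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SelfHealingScript:
    """Hypothetical interface of a preset self-healing script."""
    analyze: Callable[[str], bool]  # True if the service's running state is faulty
    handle: Callable[[str], str]    # processes the fault, returns a result message


def process_alarm(alarm: dict, scripts: Dict[str, SelfHealingScript]) -> str:
    # S210: acquire the target service information from the alarm.
    info = alarm["target_service"]
    # S220: determine the target service (here simply by name).
    service = info["name"]
    script = scripts[service]
    # S230: analyze the running state with the service's own script.
    if not script.analyze(service):
        return f"{service}: no fault, nothing to process"
    # S240: the running state is faulty, so the same script processes it.
    return script.handle(service)
```

Keeping both analysis and handling inside the per-service script mirrors the division of labour the disclosure describes, in which the monitoring platform only raises alarms.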
According to the embodiment of the disclosure, a preset self-healing script corresponding to each service is preconfigured. After the alarm information sent by the monitoring platform is received, the target service is determined according to the alarm information, and the corresponding preset self-healing script is executed automatically to analyze the running state of the target service; when that running state is faulty, the fault is processed by the same script. This at least partially overcomes the technical problems of manual fault removal, long response time, low processing efficiency, and secondary faults caused by misoperation, so that a fault of the target service can be processed as soon as it occurs, processing speed and efficiency are increased, and secondary faults caused by human error are avoided. In addition, in the fault processing method provided by the disclosure, the monitoring platform is only responsible for monitoring the servers and sending alarm information when a server is abnormal; it performs neither fault analysis nor fault processing, both of which are delegated to the preset self-healing script. This avoids the slow fault judgment and slow issuing of processing commands that can occur when a monitoring platform that monitors a large number of servers is limited by its own performance, the network, or other factors.
According to an embodiment of the present disclosure, the fault handling method further includes: deploying an agent on the server, which monitors the running process of each service on the server in real time and sends alarm information to the server when it detects that the running process of a service no longer exists. It should be noted that the agent only checks whether the running process of the service exists, so it raises an alarm whenever the process is absent, regardless of whether the service was stopped normally or abnormally. The preset self-healing script is therefore required to determine whether the target service stopped abnormally.
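A single check of such an agent might look like the following POSIX-only sketch. It assumes the agent knows the service's process ID (for example from a PID file) and uses the standard technique of sending signal 0 with `os.kill` to test whether a process exists; `send_alarm` and the alarm dictionary are hypothetical. A missing process alone cannot distinguish a normal stop from a crash, which is why the analysis step follows.

```python
import os
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only).

    Signal 0 performs error checking without delivering a signal.
    Do not use this on Windows, where os.kill terminates the process.
    """
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def agent_check_once(
    service: str,
    pid: int,
    send_alarm: Callable[[dict], None],
    is_alive: Callable[[int], bool] = pid_alive,
) -> bool:
    """Check one service's process; send an alarm if it is gone."""
    if is_alive(pid):
        return True
    # The agent only reports absence; it cannot tell a normal stop from a crash.
    send_alarm({"target_service": {"name": service, "fault": "process not found"}})
    return False
```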
According to an embodiment of the present disclosure, the analyzing the operation state of the target service by using the preset self-healing script corresponding to the target service, to obtain an analysis result includes: acquiring an operation log corresponding to the target service; and analyzing the operation log by using the preset self-healing script, and obtaining an analysis result of the failure of the operation state of the target service under the condition that the operation log contains a preset keyword.
According to embodiments of the present disclosure, the running log corresponding to the target service may include, for example, a log generated by the target service while it is running.
According to an embodiment of the present disclosure, the preset keywords may include, for example, keywords set in advance that appear in the running log generated when the target service fails.
In one embodiment, the target service may include a WAS (Windows Azure Storage, a cloud storage system) service, the fault information of the target service included in the target service information may indicate that the WAS service is down, the running log corresponding to the WAS service may include a newly generated java core log, and the preset keywords may include gpf and abort.
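The keyword check described above can be as simple as the sketch below, applied to the text of one running log. Case-insensitive substring matching is an assumption made here for illustration; a real script might anchor the keywords to specific log lines to avoid false positives.

```python
from typing import Iterable, List, Tuple


def analyze_running_log(log_text: str, keywords: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (faulty, matched_keywords) for one running log.

    The running state is treated as faulty if any preset keyword occurs
    anywhere in the log, ignoring case.
    """
    lowered = log_text.lower()
    matched = [k for k in keywords if k.lower() in lowered]
    return bool(matched), matched
```

For example, `analyze_running_log(text, ["gpf", "abort"])` would flag a log as faulty if either keyword appears in it.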
Fig. 3 schematically illustrates a flowchart of a target service operational state analysis method according to an embodiment of the present disclosure.
As shown in fig. 3, in this embodiment the target service is a WAS service, the fault information contained in the target service information indicates that the WAS service is down, and the target service running state analysis method of this embodiment includes operations S301 to S304.
In operation S301, java core log data newly generated by the WAS service in the running process is acquired.
In operation S302, it is analyzed whether the java core log data contains gpf or abort. If it does, operation S303 is performed; if it does not, operation S304 is performed.
In operation S303, an analysis result indicating that the WAS service is down is obtained.
In operation S304, an analysis result indicating that the WAS service is not down is obtained.
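Operations S301 to S304 might be implemented along the lines below. The log directory, the `javacore*.txt` file-name pattern, and the use of file modification time to decide which java core logs are "newly generated" are all assumptions for illustration; the only details taken from the text are the keywords gpf and abort.

```python
from pathlib import Path
from typing import List

KEYWORDS = ("gpf", "abort")


def new_java_core_logs(log_dir: Path, since: float) -> List[Path]:
    """S301: java core logs modified at or after `since` (epoch seconds)."""
    return sorted(
        p for p in log_dir.glob("javacore*.txt") if p.stat().st_mtime >= since
    )


def analyze_was_state(log_dir: Path, since: float) -> str:
    for path in new_java_core_logs(log_dir, since):
        text = path.read_text(errors="replace").lower()
        # S302: does the newly generated log contain gpf or abort?
        if any(keyword in text for keyword in KEYWORDS):
            return "WAS service is down"  # S303
    return "WAS service is not down"  # S304
```

Passing the alarm time as `since` keeps old java core logs from an earlier, already-handled incident from triggering a new fault result.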
According to an embodiment of the present disclosure, the analyzing the operation state of the target service by using the preset self-healing script corresponding to the target service, to obtain an analysis result includes: searching an abnormal file corresponding to the target service in a database, wherein the abnormal file comprises a file generated when the running state of the target service is abnormal; and under the condition that the abnormal file exists in the database, obtaining an analysis result of the running state failure of the target service.
According to the embodiment of the disclosure, when the running state of the target service is abnormal, an abnormal file is generated, and when the abnormal file is contained in the database, the target service corresponding to the abnormal file is indicated to be faulty.
In one embodiment, the target service may include, for example, a WAS service, the fault information contained in the target service information may indicate, for example, that the WAS service is down, and the abnormal file may be, for example, a file whose name begins with "core". When the preset self-healing script finds a file whose name begins with "core" in the database, this indicates that the WAS service has a downtime fault.
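A search for abnormal files can be sketched as below. Here the "database" is modelled, purely as an assumption, as a directory of files, and an abnormal file is any regular file whose name begins with `core`, matching the example above.

```python
from pathlib import Path
from typing import List


def find_abnormal_files(store: Path, prefix: str = "core") -> List[Path]:
    """Return regular files in `store` whose names start with `prefix`."""
    return sorted(
        p for p in store.iterdir() if p.is_file() and p.name.startswith(prefix)
    )


def was_has_downtime_fault(store: Path) -> bool:
    """The running state is faulty if at least one abnormal file exists."""
    return bool(find_abnormal_files(store))
```

A bare prefix test would also match unrelated names such as `core_settings.conf`, so a real script would probably narrow the pattern.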
According to an embodiment of the present disclosure, the processing the failure of the target service by using the preset self-healing script includes: automatically isolating a target server corresponding to the target service by utilizing the preset self-healing script; processing the fault of the target service; and releasing the isolation of the target server when the fault processing of the target service is determined to be completed.
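The isolate, process, and release sequence could be expressed as follows. The callables are hypothetical hooks (for example, removing the server from a load balancer); keeping the server isolated when processing is not confirmed complete is an assumption consistent with releasing isolation only after completion.

```python
from typing import Callable


def process_with_isolation(
    server: str,
    isolate: Callable[[str], None],
    process_fault: Callable[[], None],
    is_completed: Callable[[], bool],
    release: Callable[[str], None],
) -> bool:
    """Isolate the server, process the fault, and release only on completion.

    Returns True if isolation was released. If processing raises or is not
    confirmed complete, the server deliberately stays isolated.
    """
    isolate(server)
    process_fault()
    if is_completed():
        release(server)
        return True
    return False
```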
According to an embodiment of the present disclosure, the fault handling method further includes sending a handling result to the monitoring platform when it is determined that the fault handling of the target service is completed.
Fig. 4 schematically illustrates a flow chart of a fault handling method according to another embodiment of the present disclosure.
As shown in fig. 4, the fault handling method of this embodiment includes operations S401 to S410.
In operation S401, in response to the alert information from the monitoring platform, target service information included in the alert information is acquired.
In operation S402, a target service corresponding to the target service information is determined according to the target service information.
In operation S403, a running log corresponding to the target service is acquired.
In operation S404, it is determined whether the running log contains a preset keyword. If it does, operations S405 to S409 are performed. If it does not, operation S410 is performed.
In operation S405, it is determined that the target service has failed.
In operation S406, the target server corresponding to the target service is automatically isolated using the preset self-healing script.
In operation S407, the fault of the target service is handled.
In operation S408, once the fault handling of the target service is determined to be complete, the isolation of the target server is released.
In operation S409, the fault handling result of the target service is sent to the monitoring platform.
In operation S410, an analysis result indicating that the target service has not failed is fed back to the monitoring platform.
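The decision part of this flow (S401 to S405 and S410) might look like the following Python sketch. The alert field name `service`, the registry contents, and the keyword list are all hypothetical. The disclosure only states that a preset keyword is searched for in the running log. `read_log` is an assumed callable that returns the log lines for a service.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence

# Hypothetical preset keywords; the disclosure does not list them.
PRESET_KEYWORDS: Sequence[str] = ("FATAL", "OutOfMemoryError")

# Hypothetical mapping from the service identifier in an alert to a service name.
SERVICE_REGISTRY: Dict[str, str] = {"was-prod": "WAS"}


def determine_target_service(alert: Dict[str, str]) -> Optional[str]:
    """S401-S402: read the target service information and resolve the target service."""
    service_info = alert.get("service")
    return SERVICE_REGISTRY.get(service_info) if service_info else None


def log_indicates_failure(log_lines: Iterable[str], keywords: Sequence[str] = PRESET_KEYWORDS) -> bool:
    """S404: the service counts as failed if any log line contains any preset keyword."""
    return any(kw in line for line in log_lines for kw in keywords)


def analyze_alert(alert: Dict[str, str], read_log: Callable[[str], Iterable[str]]) -> str:
    service = determine_target_service(alert)
    if service is None:
        return "UNKNOWN_SERVICE"
    if log_indicates_failure(read_log(service)):  # S403 reads the running log
        return "FAILED"       # S405: proceed to isolation and handling (S406-S409)
    return "NOT_FAILED"       # S410: report "not failed" to the monitoring platform


if __name__ == "__main__":
    logs = {"WAS": ["INFO server started", "java.lang.OutOfMemoryError: Java heap space"]}
    print(analyze_alert({"service": "was-prod"}, lambda name: logs[name]))  # prints FAILED
```

A FAILED result would hand over to the isolation and handling steps (S406 to S409). NOT_FAILED corresponds to the feedback in S410.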
According to an embodiment of the present disclosure, a preset self-healing script corresponding to a service is preconfigured. After the alert information sent by the monitoring platform is received, the target service is determined from that information. The preset self-healing script corresponding to the target service is then executed automatically to analyze the running state of the target service. When the running state of the target service indicates a fault, the preset self-healing script automatically isolates the target server corresponding to the target service and handles the fault.

This approach addresses three technical problems of manual fault elimination: long response times, low processing efficiency, and a tendency toward secondary faults caused by misoperation. Faults of the target service are handled as soon as they occur. This increases processing speed and efficiency, prevents secondary faults caused by human error, and avoids production problems caused by the target service being out of operation for a long time.
According to an embodiment of the present disclosure, the fault processing method provided herein allows a fault to be handled as soon as it occurs. This increases processing speed and improves processing efficiency.
According to an embodiment of the present disclosure, a self-starting script is preconfigured on the target server corresponding to the target service, and the script monitors the running state of the target server. If the running state indicates that the target server is down, the use state of the target service is analyzed using the configuration file corresponding to the self-starting script. If the use state indicates that the target service is in use, the self-starting script starts the target server and the target service.
According to an embodiment of the present disclosure, if the use state of the target service indicates that the target service is not in use, no processing is performed.
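The configuration-file check can be sketched as below. The disclosure describes the file only as "a simple configuration file". The `name=in_use` / `name=not_in_use` line format, the function names, and the choice to treat an unlisted service as not in use are all assumptions made for this illustration.

```python
from typing import Dict


def parse_use_states(config_text: str) -> Dict[str, bool]:
    """Parse hypothetical 'name=in_use' / 'name=not_in_use' lines; '#' lines are comments."""
    states: Dict[str, bool] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        states[name] = value.lower() == "in_use"
    return states


def should_self_start(config_text: str, service: str) -> bool:
    """Start only if the service is marked in use; unlisted services are left alone."""
    return parse_use_states(config_text).get(service, False)


if __name__ == "__main__":
    config = "# self-start configuration (hypothetical)\nWAS=in_use\nMQ=not_in_use\n"
    print(should_self_start(config, "WAS"), should_self_start(config, "MQ"))  # prints True False
```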
Fig. 5 schematically illustrates a flow chart of a server failure handling method according to an embodiment of the disclosure.
As shown in fig. 5, the server failure handling method of this embodiment includes operations S501 to S504.
In operation S501, the running state of the target server is monitored using the self-starting script provided on the target server.
In operation S502, if the running state of the target server indicates that the target server is down, the use state of the target service is analyzed using the configuration file corresponding to the self-starting script.
In operation S503, if the use state of the target service indicates that the target service is in use, the target server and the target service are started using the self-starting script.
In operation S504, a processing result indicating that the target server and the target service have been started is sent to the monitoring platform.
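One pass of operations S501 to S504 can be sketched in Python with injected callables. All five callables are hypothetical stand-ins. For example, `service_in_use` could read the configuration file described above. The disclosure does not specify how the script is scheduled (for instance, on boot or at a fixed interval), so that is left to the caller.

```python
from typing import Callable, List


def self_start_cycle(
    server_is_up: Callable[[], bool],
    service_in_use: Callable[[], bool],
    start_server: Callable[[], None],
    start_service: Callable[[], None],
    report: Callable[[str], None],
) -> str:
    """Run one pass of S501-S504 and return a hypothetical status label."""
    if server_is_up():          # S501: nothing to do while the target server is running
        return "RUNNING"
    if not service_in_use():    # S502: consult the self-start configuration file
        return "SKIPPED"        # not in use: no processing
    start_server()              # S503: start the target server ...
    start_service()             # ... and then the target service
    report("target server and target service started")  # S504
    return "STARTED"


if __name__ == "__main__":
    actions: List[str] = []
    status = self_start_cycle(
        server_is_up=lambda: False,          # simulate a down server
        service_in_use=lambda: True,
        start_server=lambda: actions.append("start server"),
        start_service=lambda: actions.append("start service"),
        report=actions.append,
    )
    print(status, actions)
```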
According to an embodiment of the present disclosure, the self-starting script preconfigured on the target server monitors the running state of the target server automatically. The target server can therefore monitor itself independently of the monitoring platform. When the self-starting script detects that the target server is down, it can automatically start the target server and the service. This avoids delays in determining faults and issuing handling commands. Such delays can occur when a monitoring platform monitors a large number of servers and is limited by its own performance, the network, or other factors.

In addition, the self-starting script can analyze the use state of the target service on the target server using its corresponding configuration file. The target service is started as an emergency measure only when it is in use. When it is not in use, it is not started, which avoids problems caused by an administrator's misoperation of the target service.
According to an embodiment of the present disclosure, the fault processing method requires no additional expenditure. It determines whether the WAS service is down from existing logs and determines whether WAS on the server is in use from a simple configuration file. It then runs the self-starting script automatically.
It should be noted that the operations in the embodiments of the present disclosure may be executed in a different order, and multiple operations may be executed simultaneously. The exception is where an execution order between operations is explicitly specified or required by the technical implementation.
Based on the fault processing method, the disclosure further provides a fault processing device. The device will be described in detail below in connection with fig. 6.
Fig. 6 schematically shows a block diagram of a fault handling apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the fault handling apparatus 600 of this embodiment includes an acquisition module 610, a determination module 620, a first analysis module 630, and a processing module 640.
The acquisition module 610 is configured to acquire, in response to the alert information from the monitoring platform, the target service information contained in the alert information. In an embodiment, the acquisition module 610 may be configured to perform operation S210 described above, which is not repeated here.
The determination module 620 is configured to determine, according to the target service information, the target service corresponding to the target service information. In an embodiment, the determination module 620 may be configured to perform operation S220 described above, which is not repeated here.
The first analysis module 630 is configured to analyze the running state of the target service using the preset self-healing script corresponding to the target service to obtain an analysis result. In an embodiment, the first analysis module 630 may be configured to perform operation S230 described above, which is not repeated here.
The processing module 640 is configured to process the fault of the target service using the preset self-healing script when the analysis result indicates that the running state of the target service has failed. In an embodiment, the processing module 640 may be configured to perform operation S240 described above, which is not repeated here.
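The way the four modules cooperate can be sketched as a small Python class. Each field is a hypothetical callable standing in for one module. The sketch only shows the order in which modules 610 to 640 pass results to each other and is not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FaultHandlingApparatus:
    """Hypothetical composition of apparatus 600; each field stands in for one module."""

    acquire: Callable[[Dict[str, str]], Optional[str]]  # acquisition module 610 (S210)
    determine: Callable[[str], Optional[str]]           # determination module 620 (S220)
    analyze: Callable[[str], bool]                      # first analysis module 630 (S230), True = failed
    process: Callable[[str], None]                      # processing module 640 (S240)

    def on_alert(self, alert: Dict[str, str]) -> str:
        info = self.acquire(alert)
        service = self.determine(info) if info is not None else None
        if service is None:
            return "UNKNOWN_SERVICE"
        if self.analyze(service):
            self.process(service)  # processing runs only when a failure is found
            return "PROCESSED"
        return "NOT_FAILED"


if __name__ == "__main__":
    handled = []
    apparatus = FaultHandlingApparatus(
        acquire=lambda alert: alert.get("service"),
        determine=lambda info: {"was-prod": "WAS"}.get(info),
        analyze=lambda service: True,
        process=handled.append,
    )
    print(apparatus.on_alert({"service": "was-prod"}), handled)  # prints PROCESSED ['WAS']
```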
According to an embodiment of the present disclosure, the first analysis module includes an acquisition unit and an analysis unit.
The acquisition unit is configured to acquire the running log corresponding to the target service.
The analysis unit is configured to analyze the running log using the preset self-healing script. When the running log contains a preset keyword, the analysis unit obtains an analysis result indicating that the running state of the target service has failed.
According to an embodiment of the present disclosure, the first analysis module further includes a search unit and a determination unit.
The search unit is configured to search the database for the abnormal file corresponding to the target service. The abnormal file is a file generated when the running state of the target service is abnormal.
The determination unit is configured to obtain an analysis result indicating that the running state of the target service has failed when the abnormal file exists in the database.
According to an embodiment of the present disclosure, the processing module includes an isolation unit, a processing unit, and a release unit.
The isolation unit is configured to automatically isolate the target server corresponding to the target service using the preset self-healing script.
The processing unit is configured to handle the fault of the target service.
The release unit is configured to release the isolation of the target server when the fault handling of the target service is determined to be complete.
According to an embodiment of the present disclosure, the fault handling apparatus further includes a sending module.
The sending module is configured to send a processing result to the monitoring platform when the fault handling of the target service is determined to be complete.
According to an embodiment of the present disclosure, a self-starting script is preconfigured on the target server corresponding to the target service.
According to an embodiment of the present disclosure, the fault handling apparatus further includes a monitoring module, a second analysis module, and a self-starting module.
The monitoring module is configured to monitor the running state of the target server using the self-starting script.
The second analysis module is configured to analyze the use state of the target service using the configuration file corresponding to the self-starting script when the running state of the target server indicates that the target server is down.
The self-starting module is configured to start the target server and the target service using the self-starting script when the use state of the target service indicates that the target service is in use.
According to embodiments of the present disclosure, any number of the modules and units, or at least part of the functionality of any number of them, may be implemented in a single module. Any one or more of the modules and units may also be split into multiple modules for implementation. Any one or more of the modules and units according to embodiments of the present disclosure may be implemented at least partly as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC). They may also be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit, or in any one of software, hardware, and firmware, or a suitable combination of the three. Alternatively, one or more of the modules and units according to embodiments of the present disclosure may be implemented at least partly as computer program modules which, when executed, perform the corresponding functions.
According to an embodiment of the present disclosure, any of the acquisition module 610, the determination module 620, the first analysis module 630, and the processing module 640 may be combined into one module for implementation, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the acquisition module 610, the determination module 620, the first analysis module 630, and the processing module 640 may be implemented at least partly as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC). It may also be implemented in hardware or firmware by any other reasonable means of integrating or packaging a circuit, or in any one of software, hardware, and firmware, or a suitable combination of the three. Alternatively, at least one of these modules may be implemented at least partly as a computer program module which, when executed, performs the corresponding functions.
It should be noted that, in the embodiments of the present disclosure, the fault handling apparatus corresponds to the fault handling method. For details of the apparatus, refer to the description of the method, which is not repeated here.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a fault handling method according to an embodiment of the disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705:
- an input section 706 including a keyboard, a mouse, and the like;
- an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like;
- a storage section 708 including a hard disk and the like; and
- a communication section 709 including a network interface card such as a LAN card or a modem.

The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed. A computer program read from the medium can then be installed into the storage section 708 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the fault handling methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may be transmitted and distributed in the form of signals over a network medium. It may then be downloaded and installed via the communication section 709 and/or installed from the removable medium 711. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709 and/or installed from the removable medium 711. When the computer program is executed by the processor 701, the above-described functions defined in the system of the embodiments of the present disclosure are performed. The systems, devices, apparatus, modules, units, and the like described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN). Alternatively, it may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, these features may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.
Claims (8)
1. A fault handling method, comprising:
in response to alarm information from a monitoring platform, acquiring target service information contained in the alarm information;
determining, according to the target service information, a target service corresponding to the target service information;
analyzing a running state of the target service using a preset self-healing script corresponding to the target service to obtain an analysis result; and
when the analysis result indicates that the running state of the target service has failed, processing the fault of the target service using the preset self-healing script;
wherein a self-starting script is pre-configured on a target server corresponding to the target service, and the fault handling method further comprises:
monitoring a running state of the target server using the self-starting script;
when the running state of the target server indicates that the target server is down, analyzing a use state of the target service using a configuration file corresponding to the self-starting script; and
when the use state of the target service indicates that the target service is in use, starting the target server and the target service using the self-starting script, and when the use state of the target service indicates that the target service is not in use, performing no processing.
2. The method of claim 1, wherein analyzing the running state of the target service using the preset self-healing script corresponding to the target service to obtain the analysis result comprises:
acquiring a running log corresponding to the target service; and
analyzing the running log using the preset self-healing script, and obtaining an analysis result indicating that the running state of the target service has failed when the running log contains a preset keyword.
3. The method of claim 1, wherein analyzing the running state of the target service using the preset self-healing script corresponding to the target service to obtain the analysis result comprises:
searching a database for an abnormal file corresponding to the target service, wherein the abnormal file comprises a file generated when the running state of the target service is abnormal; and
obtaining an analysis result indicating that the running state of the target service has failed when the abnormal file exists in the database.
4. The method of claim 1, wherein processing the fault of the target service using the preset self-healing script comprises:
automatically isolating a target server corresponding to the target service using the preset self-healing script;
handling the fault of the target service; and
releasing the isolation of the target server when the fault handling of the target service is determined to be complete.
5. The method of claim 1, further comprising:
sending a processing result to the monitoring platform when the fault handling of the target service is determined to be complete.
6. A fault handling apparatus, comprising:
an acquisition module configured to acquire, in response to alarm information from a monitoring platform, target service information contained in the alarm information;
a determination module configured to determine, according to the target service information, a target service corresponding to the target service information;
a first analysis module configured to analyze a running state of the target service using a preset self-healing script corresponding to the target service to obtain an analysis result; and
a processing module configured to process the fault of the target service using the preset self-healing script when the analysis result indicates that the running state of the target service has failed;
wherein a self-starting script is pre-configured on a target server corresponding to the target service, and the fault handling apparatus is further configured for:
monitoring a running state of the target server using the self-starting script;
analyzing a use state of the target service using a configuration file corresponding to the self-starting script when the running state of the target server indicates that the target server is down; and
starting the target server and the target service using the self-starting script when the use state of the target service indicates that the target service is in use, and performing no processing when the use state of the target service indicates that the target service is not in use.
7. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-5.
8. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210694260.8A (CN114884796B) | 2022-06-16 | 2022-06-16 | Fault processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114884796A | 2022-08-09 |
CN114884796B | 2024-01-30 |
Family
ID=82680886
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114884796B |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |