CN108989132A - Fault warning processing method, system and computer readable storage medium - Google Patents

Publication number
CN108989132A
Authority
CN
China
Prior art keywords
fault warning
fault
failure cause
warning information
alarm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810979619.XA
Other languages
Chinese (zh)
Inventor
程志峰
卢道和
周杰
谢波
胡盼盼
杨俊杰
饶俊明
龚洵峰
李云龙
朱敏毅
汪小苗
孟凡震
汪晓雪
周琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201810979619.XA
Publication of CN108989132A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a fault warning processing method comprising the following steps: a rule engine receives multi-dimensional fault warning information input by a monitoring platform; logic judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm; and, based on the failure cause, an O&M database is retrieved to obtain the troubleshooting scheme corresponding to the failure cause. The invention also discloses a fault warning processing system and a computer readable storage medium. The invention improves the timeliness and accuracy of fault warning processing and increases fault O&M efficiency.

Description

Fault warning processing method, system and computer readable storage medium
Technical field
The present invention relates to the technical field of fault operation and maintenance (O&M), and in particular to a fault warning processing method, a system, and a computer readable storage medium.
Background
When a network device or application fails, an alarm can usually be raised in time through online monitoring. When an alarm occurs, the prior art usually only prompts that a fault exists; O&M personnel must then investigate to determine the failure cause, work out a remediation procedure, and only then clear the fault. The whole process may take a long time, so timeliness is poor.
Summary of the invention
The main purpose of the present invention is to provide a fault warning processing method, a system, and a computer readable storage medium, aiming to solve the technical problem of how to improve the timeliness of fault warning processing.
To achieve the above object, the present invention provides a fault warning processing method comprising the following steps:
A rule engine receives multi-dimensional fault warning information input by a monitoring platform;
Logic judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm;
Based on the failure cause, an O&M database is retrieved to obtain the troubleshooting scheme corresponding to the failure cause.
Optionally, the step of performing logic judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm comprises:
Comparing the multi-dimensional warning information with the configured rules to find, in the configured rules, information having the same fields as the multi-dimensional warning information;
Determining the failure cause corresponding to the found information according to a preset mapping between information and failure causes.
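The field comparison and mapping lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the rule layout, the field names (`dimension`, `metric`) and the cause strings are all assumptions introduced for the example.

```python
from typing import Optional

# Hypothetical configured rules: each rule lists the field values an alarm
# must carry, plus the key of the information the rule represents.
CONFIGURED_RULES = [
    {"match": {"dimension": "network", "metric": "packet_loss"},
     "info": "public_network_packet_loss"},
    {"match": {"dimension": "host", "metric": "disk_usage"},
     "info": "disk_nearly_full"},
]

# Hypothetical preset mapping from matched information to failure cause.
INFO_TO_CAUSE = {
    "public_network_packet_loss": "Public network abnormal",
    "disk_nearly_full": "Host disk full",
}


def find_failure_cause(alarm: dict) -> Optional[str]:
    """Return the failure cause of the first rule whose fields all match."""
    for rule in CONFIGURED_RULES:
        if all(alarm.get(field) == value
               for field, value in rule["match"].items()):
            return INFO_TO_CAUSE.get(rule["info"])
    return None  # no configured rule shares the alarm's fields
```

An alarm that matches no configured rule yields `None`, which a caller could treat as a signal to fall back to manual analysis.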
Optionally, after the step of retrieving the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method comprises:
Automatically processing the corresponding fault point based on the corresponding automatic processing scheme, wherein automatic processing includes automatically recovering the fault point or automatically isolating the fault point.
Optionally, when the failure cause is a preset operation, after the step of retrieving the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method comprises:
Calling a robot to automatically generate an approval form for the preset operation, sending the approval form to a preset node for approval, and automatically executing the operation flow corresponding to the approval form once approval is complete.
Optionally, after the step of retrieving the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method further comprises:
Generating different notification scripts for the current alarm according to the fault warning information, the failure cause, and the troubleshooting scheme;
Calling a robot to push the notification scripts to the notification groups of the corresponding partner and of the O&M side, respectively.
Optionally, after the step of calling the robot to push the notification scripts to the notification groups of the corresponding partner and of the O&M side, respectively, the fault warning processing method further comprises:
During fault recovery, the rule engine caches identical fault warning information;
Generating a fault recovery progress notification, and pushing the fault recovery progress notification at preset intervals to the notification group of the corresponding partner.
Optionally, the fault warning processing method further comprises:
When receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, for accurately locating the failure cause of the current alarm.
Optionally, before the step in which the rule engine receives the multi-dimensional fault warning information input by the monitoring platform, the fault warning processing method further comprises:
The monitoring platform collects the fault warning information reported by multiple alarm sources and correlates the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
Based on preset threshold judgment rules, fault warning information exceeding the preset thresholds is input to the rule engine.
Further, to achieve the above object, the present invention also provides a fault warning processing system, comprising:
A monitoring platform, configured to input multi-dimensional fault warning information to a rule engine;
A rule engine, configured to perform logic judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm, and, based on the failure cause, to retrieve an O&M database to obtain the troubleshooting scheme corresponding to the failure cause.
Optionally, the rule engine is further configured to: automatically process the corresponding fault point based on the corresponding automatic processing scheme, wherein automatic processing includes automatically recovering the fault point or automatically isolating the fault point.
Optionally, when the failure cause is a preset operation, the rule engine is further configured to: call a robot to automatically generate an approval form for the preset operation, send the approval form to a preset node for approval, and automatically execute the operation flow corresponding to the approval form once approval is complete.
Optionally, the rule engine is further configured to: generate different notification scripts for the current alarm according to the fault warning information, the failure cause, and the troubleshooting scheme; and call a robot to push the notification scripts to the notification groups of the corresponding partner and of the O&M side, respectively.
Optionally, the rule engine is further configured to:
Cache identical fault warning information during fault recovery;
Generate a fault recovery progress notification, and push the fault recovery progress notification at preset intervals to the notification group of the corresponding partner.
Optionally, the rule engine is further configured to:
When receiving the multi-dimensional fault warning information input by the monitoring platform, also pull, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, for accurately locating the failure cause of the current alarm.
Optionally, the monitoring platform is further configured to:
Collect the fault warning information reported by multiple alarm sources and correlate the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
Based on preset threshold judgment rules, input fault warning information exceeding the preset thresholds to the rule engine.
Optionally, the fault warning processing system further comprises:
A robot, configured to, upon receiving a call request from the rule engine, push the notification scripts output by the rule engine to the notification groups of the corresponding partner and of the O&M side, respectively.
Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a fault warning processing program which, when executed by a processor, implements the steps of the fault warning processing method according to any one of the above.
The present invention inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine for a series of complex logic judgments, thereby computing the failure cause quickly and improving the timeliness of fault handling. While raising the alarm, the present invention can also automatically retrieve the corresponding troubleshooting scheme, thereby improving the overall timeliness of alarm handling and shortening fault location time.
Brief description of the drawings
Fig. 1 is a flow diagram of the first embodiment of the fault warning processing method of the present invention;
Fig. 2 is a flow diagram of the second embodiment of the fault warning processing method of the present invention;
Fig. 3 is a flow diagram of the third embodiment of the fault warning processing method of the present invention;
Fig. 4 is a flow diagram of the fourth embodiment of the fault warning processing method of the present invention;
Fig. 5 is a functional block diagram of the first embodiment of the fault warning processing system of the present invention;
Fig. 6 is a functional block diagram of the second embodiment of the fault warning processing system of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
The present invention provides a fault warning processing method.
Referring to Fig. 1, Fig. 1 is a flow diagram of the first embodiment of the fault warning processing method of the present invention. In this embodiment, the fault warning processing method comprises the following steps:
Step S10, a rule engine receives multi-dimensional fault warning information input by a monitoring platform;
In this embodiment, the rule engine can perform logic judgment on input parameters based on built-in rules and output the judgment result. Optionally, an Experian rule engine is used to define functions, write scripts, and create rules for the combined computation of fault warning information, so that logic judgment can be performed on multi-dimensional fault warning information.
This embodiment does not limit the specific configuration of the monitoring platform. Optionally, the monitoring platform is generated by secondary development and configuration based on the Falcon framework; it can collect many kinds of warning information and raise alarms within seconds.
To determine the cause of an alarmed fault accurately, this embodiment preferably performs an overall fault warning analysis on fault warning information of multiple dimensions. Multiple dimensions means that the monitoring platform collects the warning information generated by hosts, networks, databases, platforms, and various application programs, so as to cover all possible failure causes, improve the accuracy of fault warning analysis, save alarm analysis time, and improve the timeliness of alarm handling.
Further optionally, to enrich the alarm source information so that the rule engine has more comprehensive information for locating the root of the fault, when receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm. The application-class warning information to pull from the designated alarm platform is identified by keywords pulled from the application. It should be noted that, in this embodiment, although both the monitoring platform and the alarm platform obtain application-class warning information from the application, the monitoring platform captures warning information such as physical indicators of the running system, whereas the alarm platform pulls warning information such as the system's running logs, transaction volumes, latency, success rates, and reported exceptions.
It should be understood that when the monitoring platform collects warning information from hosts, networks, databases, platforms, and the like, the warning information generated by hosts, networks, databases, platforms, and the Java virtual machine is base-level warning information, and the dimensions it covers are relatively limited. Although the monitoring platform can also pull some warning information from applications, that information consists only of some physical indicators. In this embodiment, the warning information that the rule engine pulls from the alarm platform is application-level warning information and is richer in type. Acquiring warning information at these two different levels improves the accuracy of locating the failure cause of the current alarm.
Step S20, logic judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm;
In this embodiment, the rule engine quickly performs logic judgment on the multi-dimensional fault warning information based on built-in alarm judgment rules. Step S20 can be implemented in the following ways:
1) Mode one: the multi-dimensional warning information is compared with the configured rules to find, in the configured rules, information having the same fields as the multi-dimensional warning information;
The failure cause corresponding to the found information is determined according to a preset mapping between information and failure causes.
2) Mode two: from the multi-dimensional warning information, the source of the fault is determined according to the warning information of each dimension, the source being any one or more of host, network, database, platform, or application;
When there is one fault source, the failure cause generated by that source is taken as the failure cause of the current alarm;
When there are multiple fault sources, the warning information generated by each submodule of each source is analyzed to determine a final score for the failure cause generated by each source, and the failure cause of the source with the highest final score is taken as the failure cause of the current alarm. The final score of a source is determined as follows: the weight and the actual value of the warning information generated by each submodule of the source are identified, and the final score of the source is calculated from those weights and actual values.
It can be understood that, because the warning information has many dimensions, logic judgment may yield multiple failure causes. Therefore, in this embodiment, the rule engine preferably scores the failure causes of the submodules with combined weights and ranks them, so as to output the most accurate failure cause for the current alarm. For example, the public network packet loss rate of system partner A is greater than 30% and its location is Guangzhou; meanwhile, the public network packet loss rates of system partners B and C are greater than 50%, and their location is also Guangzhou. At the same time, the application raises alarms that the partners' transaction success rate has dropped. The rule engine performs a logical computation over the fault warning information of all dimensions and finds that the failure cause "public network packet loss in the Guangzhou area" scores highest, so it concludes that the most likely failure cause of the current alarm is an abnormal public network in the Guangzhou area, rather than a fault in the system itself causing the drop in transaction success rate. As another example, a business system returns a large number of results because of an inefficient SQL (Structured Query Language) statement, which produces slow database queries, abnormal database IO (Input/Output), and abnormal Java virtual machine indicators; this slows the entire business processing flow, the average latency of upstream and downstream systems increases, and the success rate drops. The rule engine determines the source system where the problem arises and then weights each alarm indicator of the source system (for example, the alarm weights of database IO, CPU, JVM GC time, and queue depth decrease in that order and are all higher than the weights of business-system alarms such as latency and success rate). Finally, the alarm indicators are weighted and averaged to obtain the final score of the source system, and the failure cause of the source system with the highest final score is taken as the failure cause of the current alarm.
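The weighted scoring of mode two can be illustrated with a short sketch. The weight values, the indicator names, and the assumption that each indicator's actual value is normalized to the range 0 to 1 are hypothetical; the text only states that weights decrease from database IO to CPU to JVM GC time to queue depth.

```python
from typing import Dict

# Hypothetical weights, decreasing in the order given in the text.
WEIGHTS = {"db_io": 0.4, "cpu": 0.3, "jvm_gc_time": 0.2, "queue_depth": 0.1}


def final_score(indicators: Dict[str, float]) -> float:
    """Weighted average of one source's alarm indicators (values in 0..1)."""
    total_weight = sum(WEIGHTS[name] for name in indicators)
    weighted_sum = sum(WEIGHTS[name] * value
                       for name, value in indicators.items())
    return weighted_sum / total_weight


def pick_failure_source(sources: Dict[str, Dict[str, float]]) -> str:
    """Return the source whose failure cause has the highest final score."""
    if len(sources) == 1:
        # A single fault source is taken directly as the cause.
        return next(iter(sources))
    return max(sources, key=lambda name: final_score(sources[name]))


# Illustrative input: two candidate sources with normalized indicator values.
candidate_sources = {
    "database": {"db_io": 0.9, "cpu": 0.5},
    "application": {"jvm_gc_time": 0.6, "queue_depth": 0.4},
}
chosen = pick_failure_source(candidate_sources)
```

With these illustrative numbers the database source scores about 0.73 and the application source about 0.53, so `chosen` is `"database"`.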
Step S30, based on the failure cause, the O&M database is retrieved to obtain the troubleshooting scheme corresponding to the failure cause.
In this embodiment, to clear an alarmed fault quickly, the rule engine returns the corresponding troubleshooting scheme at the same time as it determines the failure cause. Preferably, an O&M database is provided, and the O&M database stores processing schemes (O&M standard operating procedures, SOP (Standard Operating Procedure)) covering all possible failure causes.
In this embodiment, the troubleshooting scheme is optionally a manual processing scheme or an automatic processing scheme. Processing with an automatic processing scheme works as follows: the corresponding fault point is processed automatically based on the corresponding automatic processing scheme so as to restore the system, where automatic processing includes automatically recovering the fault point or automatically isolating the fault point. Specifically, the corresponding script is called according to the determined troubleshooting scheme, and the script isolates the node that produced the failure cause; the on-site state of the node is then dumped, and the program code segment is analyzed to determine the actual cause, such as a full disk or a host failure. Finally, the processing means corresponding to the actual cause is determined according to the mapping between actual causes and processing means, and that processing means is executed. For example, when the actual cause is a full disk and the processing means corresponding to a full disk is automatic cache clearing, the disk is cleaned by automatically clearing the cache. As another example, if a thread deadlock occurs while a Java program is running, the corresponding node can be isolated automatically once the problem is detected.
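The isolate, diagnose, and handle sequence described above might be organized as in the sketch below. The handler functions, the cause keys, and the injected `diagnose` callable are hypothetical stand-ins for real O&M scripts; nothing here touches an actual node.

```python
from typing import Callable, Dict, List


def clear_cache(node: str) -> str:
    # Placeholder for a real cache-cleanup script.
    return f"cleared cache on {node}"


def escalate_to_oncall(node: str) -> str:
    # Placeholder for handing the node over to manual processing.
    return f"escalated {node} to on-call staff"


# Hypothetical mapping from actual cause to processing means.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "disk_full": clear_cache,
    "host_failure": escalate_to_oncall,
}


def auto_process(node: str, diagnose: Callable[[str], str]) -> List[str]:
    """Isolate the node, diagnose the actual cause, run the mapped handler."""
    log = [f"isolated {node}"]            # step 1: isolate the faulty node
    actual_cause = diagnose(node)         # step 2: analyze the dumped state
    handler = HANDLERS.get(actual_cause)  # step 3: look up processing means
    if handler is None:
        log.append(f"no automatic handler for {actual_cause}")
    else:
        log.append(handler(node))
    return log
```

Passing the diagnosis step in as a callable keeps the sketch testable; a real system would replace it with the analysis of the dumped node state.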
By retrieving the O&M SOPs stored in the database, the rule engine obtains the troubleshooting scheme corresponding to the failure cause and outputs the combination of warning information, failure cause, and troubleshooting scheme, which makes the end-to-end handling of the fault timely. By contrast, traditional electronic SOPs are scattered across txt, Word, and Excel files in different directories and are inconvenient to obtain; even if the SOPs are stored in one place, they are still inconvenient to retrieve when a fault occurs. The end-to-end handling in this embodiment can greatly shorten fault recovery time and improve O&M efficiency. For example, N system alarms occur at the same moment; the rule engine determines by logic judgment that the underlying failure cause is an abnormal carrier public network in city XX, retrieves the O&M database, and matches the corresponding SOP by abnormal IP plus system name.
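Matching an SOP by abnormal IP plus system name could look like the following sketch, which uses an in-memory SQLite database as a stand-in for the O&M database. The table layout, column names, sample IP address, and SOP text are assumptions made for illustration only.

```python
import sqlite3
from typing import Optional


def build_sop_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the O&M database with one SOP."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sop ("
        "abnormal_ip TEXT, system_name TEXT, sop_text TEXT, "
        "PRIMARY KEY (abnormal_ip, system_name))"
    )
    conn.execute(
        "INSERT INTO sop VALUES (?, ?, ?)",
        ("203.0.113.10", "payment-gateway",
         "Switch traffic to the backup carrier line"),
    )
    return conn


def find_sop(conn: sqlite3.Connection, abnormal_ip: str,
             system_name: str) -> Optional[str]:
    """Look up the SOP keyed by abnormal IP plus system name."""
    row = conn.execute(
        "SELECT sop_text FROM sop WHERE abnormal_ip = ? AND system_name = ?",
        (abnormal_ip, system_name),
    ).fetchone()
    return row[0] if row else None
```

Keying the table on the pair makes the lookup a single indexed query, in contrast to searching SOP files spread over directories.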
In addition, when the failure cause is a preset operation, after step S30, the method further comprises:
Calling a robot to automatically generate an approval form for the preset operation, sending the approval form to a preset node for approval, and automatically executing the operation flow corresponding to the approval form once approval is complete.
The preset operations include high-risk operations such as deleting data, isolating multiple nodes, and restarting core systems. When the failure cause is a high-risk operation, the robot is called to automatically generate an approval form for the high-risk operation, the approval form is sent to a preset node (optionally a leader node) for approval, and the operation flow corresponding to the approval form is executed automatically once approval is complete.
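The approval gate for high-risk operations can be sketched as below. The operation names, the `ApprovalForm` fields, and the `approve` callable that stands in for the preset approval node are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical identifiers for the high-risk operations named in the text.
HIGH_RISK_OPERATIONS = {
    "delete_data", "isolate_multiple_nodes", "restart_core_system",
}


@dataclass
class ApprovalForm:
    operation: str
    approver: str
    approved: bool = False


def run_operation(operation: str, approver: str,
                  approve: Callable[[ApprovalForm], bool]) -> str:
    """Execute an operation, routing high-risk ones through approval first."""
    if operation not in HIGH_RISK_OPERATIONS:
        return f"executed {operation}"
    form = ApprovalForm(operation=operation, approver=approver)
    form.approved = approve(form)  # stands in for the preset approval node
    if form.approved:
        return f"executed {operation} after approval by {approver}"
    return f"rejected {operation}"
```

Operations outside the high-risk set skip the form entirely, so routine remediation is not slowed by the approval step.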
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine for a series of complex logic judgments, thereby computing the failure cause quickly and improving the timeliness of fault handling. While raising the alarm, this embodiment can also automatically retrieve the corresponding troubleshooting scheme, thereby improving the overall timeliness of alarm handling and shortening fault location time.
Referring to Fig. 2, Fig. 2 is a flow diagram of the second embodiment of the fault warning processing method of the present invention. Based on the first embodiment above, in this embodiment, after step S30, the fault warning processing method further comprises:
Step S40, different notification scripts for the current alarm are generated according to the fault warning information, the failure cause, and the troubleshooting scheme;
Step S50, a robot is called to push the notification scripts to the notification groups of the corresponding partner and of the O&M side, respectively.
In this embodiment, after obtaining the fault warning information, failure cause, and troubleshooting scheme, the rule engine needs to push this information to the partner and the O&M side. So that the partner and the O&M side can easily understand the fault details, the rule engine further organizes the fault warning information, failure cause, and troubleshooting scheme into the corresponding notification scripts according to a set format.
In this embodiment, partner notification groups and O&M notification groups, such as WeChat groups, are first set up separately (different products have different partner notification groups or different O&M notification groups). A robot (an application program), such as a WeChat robot, is then called through an interface; the robot identifies the notification group to which each notification script pushed by the rule engine belongs and pushes the scripts, by category, to the corresponding partner notification group and O&M notification group.
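Generating audience-specific notification scripts and routing them to groups can be sketched as follows. The message templates, the group naming scheme (`product-audience`), and the `GroupRobot` class are hypothetical; in particular, withholding the SOP from the partner message is an assumption of this example, and no real chat API is called.

```python
from typing import Dict, List


def build_scripts(alarm: str, cause: str, scheme: str) -> Dict[str, str]:
    """Build one message per audience from the same alarm facts."""
    return {
        # Assumption: partners see the cause but not the internal SOP.
        "partner": f"[Notice] {alarm}. Cause: {cause}. Recovery in progress.",
        "ops": f"[Alarm] {alarm}. Cause: {cause}. SOP: {scheme}",
    }


class GroupRobot:
    """Stand-in for a chat robot; records messages instead of sending."""

    def __init__(self) -> None:
        self.sent: Dict[str, List[str]] = {}

    def push(self, group: str, message: str) -> None:
        self.sent.setdefault(group, []).append(message)


def notify(robot: GroupRobot, product: str, scripts: Dict[str, str]) -> None:
    """Route each script to the notification group of its audience."""
    for audience, message in scripts.items():
        robot.push(f"{product}-{audience}", message)
```

Because each product has its own pair of groups, the product name is part of the group key in this sketch.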
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine for a series of complex logic judgments, thereby computing the failure cause and troubleshooting scheme quickly and improving the timeliness of fault handling. The fault information is then pushed to the notification groups of the corresponding partner and of the O&M side, respectively. While raising the alarm, this embodiment can further provide the troubleshooting scheme and give partners proactive early warning about affected transactions, thereby improving the overall timeliness of alarm handling, shortening fault location time, and improving the quality of service to partners.
Referring to Fig. 3, Fig. 3 is a flow diagram of the third embodiment of the fault warning processing method of the present invention. Based on the second embodiment above, in this embodiment, after step S50, the fault warning processing method further comprises:
Step S60, during fault recovery, the rule engine caches identical fault warning information;
Step S70, a fault recovery progress notification is generated and pushed at preset intervals to the notification group of the corresponding partner.
In this embodiment, fault warnings are usually collected and reported in real time. To prevent alarms from flooding the notification group for a long time, during fault recovery the rule engine caches identical fault warning information, generates a fault recovery progress notification, and notifies the partner of the recovery progress at preset intervals. This avoids prolonged alarm flooding, reduces the processing pressure on the rule engine, improves the quality of service to partners, and raises customer satisfaction.
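Caching identical alarms and pushing progress at a preset interval can be sketched as below. The `RecoveryNotifier` class, the alarm key format, and the explicit `now` timestamps (used instead of a real clock so the behaviour is easy to follow) are hypothetical.

```python
from typing import List, Optional, Set


class RecoveryNotifier:
    """Suppress repeated alarms and rate-limit progress notifications."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.seen: Set[str] = set()               # keys of cached alarms
        self.last_push: Optional[float] = None    # time of last progress push
        self.pushed: List[str] = []               # progress messages sent

    def on_alarm(self, alarm_key: str) -> bool:
        """Return True only for the first occurrence of an alarm."""
        if alarm_key in self.seen:
            return False  # identical alarm: cached, not re-notified
        self.seen.add(alarm_key)
        return True

    def tick(self, now: float, progress: str) -> bool:
        """Push a progress notification if the interval has elapsed."""
        if self.last_push is not None and now - self.last_push < self.interval:
            return False
        self.last_push = now
        self.pushed.append(progress)
        return True
```

A caller would invoke `tick` from a scheduler loop during recovery and clear `seen` once the fault is resolved.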
Referring to Fig. 4, Fig. 4 is a flow diagram of the fourth embodiment of the fault warning processing method of the present invention. Based on the first embodiment above, in this embodiment, before step S10, the fault warning processing method further comprises:
Step S80, the monitoring platform collects the fault warning information reported by multiple alarm sources and correlates the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
In this embodiment, to achieve multi-dimensional, full-range fault warning analysis, the monitoring platform collects the fault warning information of multiple alarm sources, aggregates it, and reports it to the rule engine for processing.
In this embodiment, the alarm source types include:
(1) Host: alarms generated by computers, servers, and the like;
(2) Network: alarms occurring on the Internet, such as disconnection and packet loss;
(3) Database: alarms generated by, for example, the databases of various products and transaction account databases;
(4) Platform: alarms generated by, for example, message middleware platforms and service management platforms;
(5) Application program: alarms generated by, for example, account applications and payment applications.
When the monitoring platform collects multiple pieces of fault warning information, because these pieces of information are generated by the same failure cause, the same flag information (for example, field XX) is found in the various pieces of fault warning information, and the collected fault warning information is correlated according to that flag information.
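Correlating alarms from different source types by a shared flag field can be sketched as follows. The field name `trace_id` is a hypothetical stand-in for the unspecified "field XX", and the `"uncorrelated"` bucket for alarms lacking the field is an assumption of this example.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(alarms: List[dict],
              flag_field: str = "trace_id") -> Dict[str, List[dict]]:
    """Group alarms that carry the same value in the flag field."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for alarm in alarms:
        # Alarms without the flag field go to a separate bucket.
        groups[alarm.get(flag_field, "uncorrelated")].append(alarm)
    return dict(groups)


# Illustrative alarms from three source types.
sample_alarms = [
    {"source": "network", "metric": "packet_loss", "trace_id": "t1"},
    {"source": "application", "metric": "success_rate", "trace_id": "t1"},
    {"source": "host", "metric": "disk_usage"},
]
grouped = correlate(sample_alarms)
```

Here the network and application alarms land in the same group because they share `trace_id` `"t1"`, while the host alarm stays uncorrelated.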
Step S90, based on preset threshold judgment rules, fault warning information exceeding the preset thresholds is input to the rule engine.
In general, alarms differ in their impact on the system; an alarm that has little or almost no impact does not need to be reported. Therefore, in this embodiment, judgment thresholds are preset for the various alarms, and by configuring the corresponding judgment rules, abnormal fault information exceeding the thresholds is output to the rule engine within seconds.
For example, if the public network packet loss rate is greater than 50%, the warning information needs to be reported; if host disk usage reaches 50%, the warning information needs to be reported; if CPU utilization exceeds 70%, the warning information needs to be reported.
In the present embodiment, by preset threshold decision mechanism, to filter out the fault warning letter for being higher than preset threshold Breath carries out alarming processing, avoids all warning information and is all reported to rules engines processes, reduces the processing pressure of regulation engine Power.
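The threshold filter described above can be sketched as follows. The metric names and the table layout are assumptions made for illustration; only the three numeric thresholds and their wording ("greater than", "reaches", "more than") come from the text.

```python
# Hypothetical threshold table: metric -> (limit in percent, inclusive?).
# "reaches 50%" is read as inclusive; "greater than"/"more than" as strict.
THRESHOLDS = {
    "public_net_packet_loss": (50.0, False),
    "disk_usage": (50.0, True),
    "cpu_utilization": (70.0, False),
}

def exceeds_threshold(metric, value):
    """Return True if the sample should be reported to the rule engine."""
    if metric not in THRESHOLDS:
        return False  # assumption: metrics without a configured rule are not reported
    limit, inclusive = THRESHOLDS[metric]
    return value >= limit if inclusive else value > limit

def filter_alarms(samples):
    """Keep only the samples that cross their configured threshold."""
    return [s for s in samples if exceeds_threshold(s["metric"], s["value"])]
```

For instance, `filter_alarms` would drop a CPU sample at 10% and keep a packet-loss sample at 60%, so only the latter reaches the rule engine.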
The present invention also provides a fault warning processing system.
Referring to Fig. 5, Fig. 5 is a functional block diagram of the first embodiment of the fault warning processing system of the present invention. In this embodiment, the fault warning processing system includes:
A monitoring platform 10, configured to input multi-dimensional fault warning information to the rule engine;
This embodiment does not limit the specific implementation of the monitoring platform. Optionally, it is a monitoring platform produced by secondary development and configuration based on the Falcon framework, which can collect multiple kinds of alarm information and raise alarms within seconds.
Further optionally, to enrich the alarm source information so that the rule engine has more comprehensive information for locating the root of a fault, upon receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class alarm information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm.
The application-class alarm information to pull is determined by keywords in the application. It should be noted that, although both the monitoring platform and the alarm platform can obtain application-class alarm information from applications, the monitoring platform captures alarm information such as physical indicators of the running system, whereas the alarm platform pulls the system's running logs and alarm information such as transaction volume, latency, success rate, and reported exceptions.
It should be understood that when the monitoring platform collects alarm information from hosts, networks, databases, platforms, and the like, the alarm information generated by hosts, networks, databases, platforms, and the Java virtual machine is base-level alarm information obtained along relatively few dimensions. Although the monitoring platform can also pull some alarm information from applications, what it pulls is only some physical indicators. In this embodiment, the alarm information that the rule engine pulls from the alarm platform is application-level alarm information and is richer in type. Obtaining alarm information at these two levels improves the accuracy of locating the failure cause of the current alarm.
A rule engine 20, configured to perform logical judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm, and, based on the failure cause, to search an O&M database to obtain the troubleshooting scheme corresponding to the failure cause;
In this embodiment, the rule engine can perform logical judgment on input parameters based on built-in rules and output the judgment result. Optionally, an Experian rule engine is used to define functions, write scripts, and create rules for comprehensive computation over fault warning information, so that logical judgment can be performed on multi-dimensional fault warning information.
To judge the cause of an alarm accurately, this embodiment preferably performs a comprehensive fault alarm analysis over fault warning information of multiple dimensions. The multiple dimensions are reflected in that the monitoring platform collects alarm information generated by hosts, networks, databases, platforms, and various application programs so as to cover all possible failure causes, which improves the accuracy of fault alarm analysis, saves alarm analysis time, and improves the timeliness of alarm processing.
In this embodiment, the rule engine quickly performs logical judgment on the multi-dimensional fault warning information based on built-in alarm judgment rules. Specific implementations include:
1) Mode one: the multi-dimensional alarm information is compared with the configured rules to find, in the configured rules, information having the same fields as the multi-dimensional alarm information;
The failure cause corresponding to the found information is then determined according to the preset mapping between information and failure causes.
2) Mode two: from the multi-dimensional alarm information, the source of the fault is determined according to the alarm information of each dimension, the source being any one or more of host, network, database, platform, or application;
When there is one source of the fault, the failure cause generated by that source is taken as the failure cause of the current alarm;
When there are multiple sources of the fault, the alarm information generated by each submodule of each source is analyzed to determine a final score for the failure cause generated by each source, and the failure cause of the source with the highest final score is taken as the failure cause of the current alarm. The final score of a source is determined as follows: identify the weight and the actual value of the alarm information generated by each submodule of the source, and compute the final score of the source from those weights and actual values.
It will be appreciated that, because the alarm information spans many dimensions, logical judgment may yield multiple failure causes. Therefore, in this embodiment the rule engine preferably computes a comprehensive weighted score for the failure causes of the various submodules and ranks them, so as to output the most accurate failure cause for the current alarm.
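Mode one above amounts to a field match followed by a table lookup, which can be sketched as follows. The rule layout, the field names (`source`, `metric`), and the cause strings are hypothetical; the text specifies only that alarm fields are compared with configured rules and that a preset mapping yields the failure cause.

```python
# Hypothetical configured rules: field values to match, plus an info key.
RULES = [
    {"match": {"source": "network", "metric": "packet_loss"}, "info": "net_loss"},
    {"match": {"source": "database", "metric": "slow_query"}, "info": "db_slow"},
]

# Hypothetical preset mapping from matched information to failure cause.
CAUSE_BY_INFO = {
    "net_loss": "public network anomaly",
    "db_slow": "inefficient SQL",
}

def find_causes(alarms):
    """Return the distinct failure causes whose rule fields all match an alarm."""
    causes = []
    for alarm in alarms:
        for rule in RULES:
            if all(alarm.get(k) == v for k, v in rule["match"].items()):
                cause = CAUSE_BY_INFO[rule["info"]]
                if cause not in causes:
                    causes.append(cause)
    return causes
```

Because several rules can match, the function returns a list; choosing among several candidate causes is what the weighted scoring of mode two addresses.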
For example, the public network packet loss rate of system partner A exceeds 30% and its location is Guangzhou; at the same time, the public network packet loss rates of system partners B and C exceed 50%, and their location is also Guangzhou. Alarms for declining transaction success rates at the application's partners are present at the same moment. The rule engine therefore performs logical computation over the fault warning information of all dimensions and finds that the failure cause "public network packet loss in the Guangzhou area" scores highest, concluding that the most likely cause of this alarm is a public network anomaly in the Guangzhou area rather than a fault in the system itself causing the transaction success rate to drop. As another example, a business system executes SQL (Structured Query Language) inefficiently and returns large result sets, which produces slow database queries, abnormal database IO (Input/Output), and abnormal Java virtual machine indicators; this slows down overall business processing, the average latency of upstream and downstream systems rises, and the success rate falls. The rule engine identifies the source system of the problem and then weights each alarm indicator of the source system (for example, the alarm weights of database IO, CPU, JVM GC time, and queue depth decrease in that order, and all are higher than those of alarms such as business-system latency and success rate). Finally, the weighted average of the alarm indicators gives the final score of the source system, and the failure cause of the source system with the highest final score is taken as the failure cause of the current alarm.
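The weighted scoring of mode two can be sketched as follows. The text states only that each submodule alarm has a weight and an actual value and that the final score is a weighted average; the data layout, the normalisation of values to the range 0 to 1, and the sample numbers are assumptions made for illustration.

```python
def source_score(submodule_alarms):
    """Weighted average over (weight, value) pairs for one fault source."""
    total_weight = sum(weight for weight, _ in submodule_alarms)
    if total_weight == 0:
        return 0.0
    return sum(weight * value for weight, value in submodule_alarms) / total_weight

def pick_failure_cause(sources):
    """sources: {name: {"cause": str, "alarms": [(weight, value), ...]}}, non-empty."""
    if len(sources) == 1:
        # A single source: its cause is the cause of the current alarm.
        return next(iter(sources.values()))["cause"]
    best = max(sources, key=lambda name: source_score(sources[name]["alarms"]))
    return sources[best]["cause"]

# Hypothetical input: database alarms weighted above application alarms.
sources = {
    "database": {"cause": "inefficient SQL", "alarms": [(4, 0.9), (3, 0.8)]},
    "application": {"cause": "application fault", "alarms": [(2, 0.5), (1, 0.4)]},
}
likely_cause = pick_failure_cause(sources)
```

With these sample numbers the database source scores 6/7 and the application source 1.4/3, so the database cause is selected.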
In this embodiment, to clear alarm faults quickly, the rule engine returns the corresponding troubleshooting scheme at the same time as it determines the failure cause. Preferably, an O&M database is provided, and the O&M database stores processing schemes (O&M standard operating procedures, SOPs) covering all possible failure causes.
In this embodiment, the troubleshooting scheme is optionally a manual processing scheme or an automatic processing scheme. Processing with an automatic scheme works as follows: the corresponding fault point is handled automatically based on the corresponding automatic processing scheme, where automatic handling includes automatically recovering the fault point or automatically isolating the fault point. Specifically, the script corresponding to the determined troubleshooting scheme is called; the script isolates the node that produced the failure cause, then captures the on-site state of that node and analyzes the program code segments to determine the actual cause, where causes include a full disk, host failure, and the like. Finally, according to the mapping between actual causes and processing means, the processing means corresponding to the actual cause is determined and executed. For example, when the actual cause is a full disk and the processing means for a full disk is automatic cache clearing, automatic cache clearing is executed to clean up the disk. As another example, if a thread deadlock occurs while a Java program is running, the corresponding node can be isolated automatically once the problem is detected.
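The final step of automatic handling, mapping an actual cause to a processing means and executing it, can be sketched as a dispatch table. The cause keys, the handler functions, and their return strings are hypothetical placeholders; real handlers would call the O&M scripts the text refers to.

```python
def clear_cache(node):
    """Placeholder for the automatic cache-clearing script."""
    return f"cleared cache on {node}"

def isolate_node(node):
    """Placeholder for the automatic node-isolation script."""
    return f"isolated {node}"

# Hypothetical mapping between actual causes and processing means.
HANDLERS = {
    "disk_full": clear_cache,
    "thread_deadlock": isolate_node,
}

def handle_fault(actual_cause, node):
    """Run the processing means mapped to the cause, if one is configured."""
    handler = HANDLERS.get(actual_cause)
    if handler is None:
        # Assumption: causes without an automatic scheme fall back to manual handling.
        return f"no automatic handler for {actual_cause}; use manual processing"
    return handler(node)
```

Keeping the mapping in a table rather than in branching code means a new cause can be supported by adding one entry, which matches the configuration-driven style of the embodiment.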
By searching the O&M SOPs stored in the database, the rule engine obtains the troubleshooting scheme corresponding to the failure cause and outputs the combination of alarm information, failure cause, and troubleshooting scheme, which makes the end-to-end handling of the fault more timely. Traditional electronic SOPs are inconvenient to obtain because they are scattered across txt, Word, and Excel files in different directories, and even when the SOPs are stored in one place they are inconvenient to search when a fault occurs. By contrast, the end-to-end fault handling of this embodiment can greatly shorten fault recovery time and improve O&M efficiency. For example, when N system alarms exist at the same moment, the rule engine determines by logical judgment that the root failure cause of the alarms is a public network anomaly at carrier XX in a certain city, searches the O&M database, and matches the corresponding SOP by abnormal IP plus system name.
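Matching an SOP by abnormal IP plus system name can be sketched with an in-memory SQLite table. The table name `sop`, its columns, and the sample row are assumptions for illustration; the text does not describe the schema of the O&M database.

```python
import sqlite3

def build_sop_db():
    """Create a tiny in-memory O&M database with one hypothetical SOP row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sop ("
        "abnormal_ip TEXT, system_name TEXT, steps TEXT, "
        "PRIMARY KEY (abnormal_ip, system_name))"
    )
    conn.execute(
        "INSERT INTO sop VALUES (?, ?, ?)",
        ("203.0.113.10", "payment", "Switch traffic to the backup line"),
    )
    return conn

def find_sop(conn, abnormal_ip, system_name):
    """Look up the SOP keyed by abnormal IP plus system name; None if absent."""
    row = conn.execute(
        "SELECT steps FROM sop WHERE abnormal_ip = ? AND system_name = ?",
        (abnormal_ip, system_name),
    ).fetchone()
    return row[0] if row else None
```

A composite key of this kind gives a direct lookup at alarm time, in contrast to browsing SOP files spread over directories.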
In addition, the rule engine is further configured to, when the failure cause is a preset operation, call a robot to automatically generate an approval form for the preset operation, send the approval form to a preset node for approval, and automatically execute the operation flow corresponding to the approval form after the approval is completed.
The preset operations include high-risk operations such as deleting data, isolating multiple nodes, and restarting a core system. When the failure cause is a high-risk operation, the robot is called to automatically generate an approval form for the high-risk operation, the approval form is sent to a preset node (optionally a leader node) for approval, and the operation flow corresponding to the approval form is executed automatically after the approval is completed.
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine for a series of complex logical judgments, thereby computing the failure cause quickly and improving the timeliness of fault handling. This embodiment can also automatically retrieve the corresponding troubleshooting scheme while alarming, which improves the overall timeliness of alarms and shortens fault location time.
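The approval gate for high-risk operations can be sketched as follows. The operation names, the form fields, and the `approve` and `execute` callbacks are hypothetical; the behaviour on rejection is also an assumption, since the text describes only the approved path.

```python
# Hypothetical identifiers for the high-risk operations named in the text.
HIGH_RISK = {"delete_data", "isolate_multiple_nodes", "restart_core_system"}

def run_operation(operation, approve, execute):
    """Gate high-risk operations behind approve(form) -> bool before execute(op)."""
    if operation not in HIGH_RISK:
        return {"operation": operation, "status": "not_required",
                "result": execute(operation)}
    form = {"operation": operation, "status": "pending"}  # auto-generated form
    if not approve(form):
        form["status"] = "rejected"  # assumption: a rejected form is not executed
        return form
    form["status"] = "approved"
    form["result"] = execute(operation)
    return form
```

Passing the approver and executor as callbacks keeps the gate independent of how the approval form is actually delivered to the preset node.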
Further, in another embodiment of the fault warning processing system of the present invention, the rule engine 20 is further configured to:
generate the respective notification messages for the current alarm according to the fault warning information, the failure cause, and the troubleshooting scheme;
call the robot to push the notification messages to the notification groups of the corresponding partner and the O&M team, respectively.
In this embodiment, after obtaining the fault warning information, failure cause, and troubleshooting scheme, the rule engine needs to push this information to the partner and the O&M team. So that the partner and the O&M team can readily understand the fault details, the rule engine further arranges the fault warning information, failure cause, and troubleshooting scheme into the corresponding notification messages according to a set format.
In this embodiment, partner notification groups and O&M notification groups, such as WeChat groups, are first created (different products have different partner notification groups or different O&M notification groups). A robot (an application program), such as a WeChat robot, is then called through an interface; the robot identifies which notification group each notification message pushed by the rule engine for the current alarm belongs to and pushes the messages, by category, to the corresponding partner notification group and O&M notification group.
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine for a series of complex logical judgments, thereby quickly computing the failure cause and troubleshooting scheme and improving the timeliness of fault handling; finally, the fault information is pushed to the notification groups of the corresponding partner and O&M team. While alarming, this embodiment can further provide the troubleshooting scheme and give partners proactive early warning about transactions, which improves the overall timeliness of alarms, shortens fault location time, and improves the quality of service to partners.
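Formatting the notification messages and routing them to groups can be sketched as follows. The message templates, the audience keys, the group IDs, and the `send` callback are assumptions; no real WeChat robot API is used or implied here.

```python
def build_messages(alarm, cause, scheme):
    """Format one message per audience from the three pieces of information."""
    summary = f"Alarm: {alarm}; cause: {cause}; handling: {scheme}"
    return {
        "partner": f"[Partner notice] {summary}",
        "ops": f"[O&M notice] {summary}",
    }

def push_messages(messages, group_ids, send):
    """Push each message to its audience's group via send(group_id, text)."""
    for audience, text in messages.items():
        send(group_ids[audience], text)

# Hypothetical group IDs; `sent` stands in for the robot's delivery channel.
sent = []
push_messages(
    build_messages("success rate drop", "public network anomaly", "SOP-12"),
    {"partner": "group-partner-A", "ops": "group-ops-A"},
    lambda group_id, text: sent.append((group_id, text)),
)
```

Separating message construction from delivery lets the same formatted text be routed to whichever group a product's configuration names.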
Further optionally, in an embodiment of the fault warning processing system of the present invention, the rule engine 20 is further configured to:
cache identical fault warning information during fault recovery;
generate a fault recovery progress notification and, at preset intervals, push the fault recovery progress notification to the notification group of the corresponding partner.
In this embodiment, because fault alarms are usually collected and reported in real time, to prevent alarms from flooding the screen for a long time, the rule engine caches identical fault warning information during fault recovery, generates a fault recovery progress notification, and notifies the partner of the fault recovery progress at preset intervals. This avoids prolonged alarm flooding, reduces the processing load on the rule engine, improves the quality of service to partners, and raises customer satisfaction.
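Caching repeated alarms and pushing progress at a fixed interval can be sketched as follows. The class name, the alarm key, and the use of plain numeric timestamps are assumptions for illustration; the text does not specify how identical alarms are recognised or how the interval is measured.

```python
class RecoveryNotifier:
    """Suppress repeated alarms during recovery and pace progress notices."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.cached = {}       # alarm key -> number of suppressed repeats
        self.last_push = None  # timestamp of the last progress notice

    def on_alarm(self, key):
        """Return True for the first occurrence; cache and suppress repeats."""
        if key in self.cached:
            self.cached[key] += 1
            return False
        self.cached[key] = 0
        return True

    def progress_due(self, now):
        """Return True when a progress notice should be pushed at time `now`."""
        if self.last_push is None or now - self.last_push >= self.interval:
            self.last_push = now
            return True
        return False
```

A caller would forward an alarm only when `on_alarm` returns True and would push a progress notice whenever `progress_due` returns True, which keeps the partner informed without repeating the same alarm.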
Further optionally, in another embodiment of the fault warning processing system of the present invention, the monitoring platform 10 is further configured to:
collect the fault warning information reported by multiple alarm sources and correlate the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
based on preset threshold judgment rules, input fault warning information exceeding the preset threshold to the rule engine.
In this embodiment, to enable multi-dimensional, full-scope fault warning analysis, the monitoring platform collects fault warning information from multiple alarm sources, aggregates it, and reports it to the rule engine for processing.
In this embodiment, the alarm source types include:
(1) Host: alarms generated by computers, servers, and the like;
(2) Network: alarms occurring on the Internet, such as disconnection and packet loss;
(3) Database: alarms generated by, for example, product databases and transaction account databases;
(4) Platform: alarms generated by, for example, message middleware platforms and service governance platforms;
(5) Application program: alarms generated by, for example, account applications and payment applications.
When the monitoring platform collects multiple pieces of fault warning information, these pieces are generated by the same failure cause. The platform therefore looks for an identical flag field (for example, field XX) across the pieces of fault warning information and, based on that identical flag, correlates the collected fault warning information.
In general, alarms vary in their impact on the system; alarms with little or almost no impact need not be reported. Therefore, in this embodiment, judgment thresholds are preset for the various alarms, and by configuring the corresponding judgment rules, abnormal fault information exceeding a threshold is output to the rule engine within seconds.
For example, alarm information must be reported when the public network packet loss rate exceeds 50%, when host disk space usage reaches 50%, or when CPU utilization exceeds 70%.
In this embodiment, the preset threshold judgment mechanism filters out the fault warning information exceeding the preset thresholds for alarm processing, which avoids reporting all alarm information to the rule engine and reduces the processing load on the rule engine.
Referring to Fig. 6, Fig. 6 is a functional block diagram of the second embodiment of the fault warning processing system of the present invention. In this embodiment, the fault warning processing system further includes:
A robot 30, configured to, upon receiving a call request from the rule engine, push the notification messages output by the rule engine to the notification groups of the corresponding partner and O&M team, respectively.
In this embodiment, the robot 30 is an application program, such as a WeChat robot. The robot 30 enables categorized pushing of alarm notifications.
In this embodiment, communication groups are created in advance; for example, partner communication groups are created by product category, or O&M communication groups corresponding to different products are created, and the robot is then activated. For example, the robot is added to a notification group in advance, and sending an instruction containing the word "activate" activates the notification group in one step.
In this embodiment, the robot 30 can identify the various categories of alarm notifications pushed by the rule engine and push them, by category, to the corresponding partner communication group and the corresponding O&M communication group. For example, the robot pushes alarm notifications to the corresponding partner communication group and O&M communication group according to the corresponding group ID.
The present invention also provides a computer-readable storage medium.
In the present invention, a fault warning processing program is stored on the computer-readable storage medium, and when executed by a processor, the fault warning processing program implements the steps of the fault warning processing method described in any of the above embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes instructions that cause a terminal (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, falls within the protection of the present invention.

Claims (12)

1. A fault warning processing method, characterized in that the fault warning processing method comprises the following steps:
a rule engine receives multi-dimensional fault warning information input by a monitoring platform;
logical judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm;
based on the failure cause, an O&M database is searched to obtain the troubleshooting scheme corresponding to the failure cause.
2. The fault warning processing method of claim 1, characterized in that the step of performing logical judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm comprises:
comparing the multi-dimensional alarm information with the configured rules to find, in the configured rules, information having the same fields as the multi-dimensional alarm information;
determining the failure cause corresponding to the found information according to the preset mapping between information and failure causes.
3. The fault warning processing method of claim 1, characterized in that, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method comprises:
automatically handling the corresponding fault point based on the corresponding automatic processing scheme, wherein automatic handling includes automatically recovering the fault point or automatically isolating the fault point.
4. The fault warning processing method of claim 1, characterized in that, when the failure cause is a preset operation, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method comprises:
calling a robot to automatically generate an approval form for the preset operation, sending the approval form to a preset node for approval, and automatically executing the operation flow corresponding to the approval form after the approval is completed.
5. The fault warning processing method of claim 1, characterized in that, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method further comprises:
generating the respective notification messages for the current alarm according to the fault warning information, the failure cause, and the troubleshooting scheme;
calling a robot to push the notification messages to the notification groups of the corresponding partner and the O&M team, respectively.
6. The fault warning processing method of claim 5, characterized in that, after the step of calling the robot to push the notification messages to the notification groups of the corresponding partner and the O&M team, respectively, the fault warning processing method further comprises:
during fault recovery, the rule engine caches identical fault warning information;
a fault recovery progress notification is generated and, at preset intervals, pushed to the notification group of the corresponding partner.
7. The fault warning processing method of claim 1, characterized in that the fault warning processing method further comprises:
upon receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class alarm information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm.
8. The fault warning processing method of any one of claims 1-7, characterized in that, before the step in which the rule engine receives the multi-dimensional fault warning information input by the monitoring platform, the fault warning processing method further comprises:
the monitoring platform collects the fault warning information reported by multiple alarm sources and correlates the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
based on preset threshold judgment rules, fault warning information exceeding the preset threshold is input to the rule engine.
9. A fault warning processing system, characterized in that the fault warning processing system comprises:
a monitoring platform, configured to input multi-dimensional fault warning information to a rule engine;
the rule engine, configured to perform logical judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm, and, based on the failure cause, to search an O&M database to obtain the troubleshooting scheme corresponding to the failure cause.
10. The fault warning processing system of claim 9, characterized in that the monitoring platform is further configured to:
collect the fault warning information reported by multiple alarm sources and correlate the alarms, wherein the alarm source types include: host, network, database, platform, and application program;
based on preset threshold judgment rules, input fault warning information exceeding the preset threshold to the rule engine.
11. The fault warning processing system of claim 9 or 10, characterized in that the fault warning processing system further comprises:
a robot, configured to, upon receiving a call request from the rule engine, push the notification messages output by the rule engine to the notification groups of the corresponding partner and the O&M team, respectively.
12. A computer-readable storage medium, characterized in that a fault warning processing program is stored on the computer-readable storage medium, and when executed by a processor, the fault warning processing program implements the steps of the fault warning processing method of any one of claims 1-8.
CN201810979619.XA 2018-08-24 2018-08-24 Fault warning processing method, system and computer readable storage medium Pending CN108989132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810979619.XA CN108989132A (en) 2018-08-24 2018-08-24 Fault warning processing method, system and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN108989132A true CN108989132A (en) 2018-12-11

Family

ID=64547637

Country Status (1)

Country Link
CN (1) CN108989132A (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135602A (en) * 2019-05-17 2019-08-16 伍兴佳 Steel tower failure monitoring dispatching method and device
CN110166297A (en) * 2019-05-22 2019-08-23 平安信托有限责任公司 O&M method, system, equipment and computer readable storage medium
CN110415115A (en) * 2019-06-18 2019-11-05 平安证券股份有限公司 The O&M method, apparatus and computer readable storage medium of transaction system
CN110601894A (en) * 2019-09-18 2019-12-20 中国工商银行股份有限公司 Alarm processing method and device, electronic equipment and readable storage medium
CN110635954A (en) * 2019-10-21 2019-12-31 中国民航信息网络股份有限公司 Method and system for processing network fault of data center
CN110728498A (en) * 2019-10-21 2020-01-24 北京百度网讯科技有限公司 Information interaction method and device
CN111030857A (en) * 2019-12-06 2020-04-17 深圳前海微众银行股份有限公司 Network alarm method, device, system and computer readable storage medium
CN111343017A (en) * 2020-02-22 2020-06-26 苏州浪潮智能科技有限公司 Method, system, equipment and medium for cloud platform resource alarm
CN111628888A (en) * 2020-04-30 2020-09-04 中国移动通信集团江苏有限公司 Fault diagnosis method, device, equipment and computer storage medium
CN111814999A (en) * 2020-07-08 2020-10-23 上海燕汐软件信息科技有限公司 Fault work order generation method, device and equipment
CN111835566A (en) * 2020-07-08 2020-10-27 上海燕汐软件信息科技有限公司 System fault management method, device and system
CN111865673A (en) * 2020-07-08 2020-10-30 上海燕汐软件信息科技有限公司 Automatic fault management method, device and system
CN111844029A (en) * 2020-07-09 2020-10-30 上海有个机器人有限公司 Robot early warning monitoring method and device
CN111901140A (en) * 2020-06-11 2020-11-06 北京百度网讯科技有限公司 Exception handling method and device, electronic equipment and storage medium
CN112328372A (en) * 2020-11-27 2021-02-05 新华智云科技有限公司 Kubernetes node self-healing method and system
CN112447279A (en) * 2020-12-10 2021-03-05 上海联影医疗科技股份有限公司 Task processing method and device, electronic equipment and storage medium
CN112559569A (en) * 2020-12-11 2021-03-26 广东电力通信科技有限公司 Alarm rule processing method for composite condition
CN112711507A (en) * 2020-12-17 2021-04-27 浙江高速信息工程技术有限公司 Device alarm method, electronic device, and medium
CN113312200A (en) * 2021-06-01 2021-08-27 中国民航信息网络股份有限公司 Event processing method and device, computer equipment and storage medium
CN113553210A (en) * 2021-07-30 2021-10-26 平安普惠企业管理有限公司 Alarm data processing method, device, equipment and storage medium
CN113590370A (en) * 2021-08-06 2021-11-02 北京百度网讯科技有限公司 Fault processing method, device, equipment and storage medium
CN114490751A (en) * 2021-12-29 2022-05-13 深圳优地科技有限公司 Method, device and equipment for determining robot fault and readable storage medium
CN114567539A (en) * 2022-03-22 2022-05-31 中国农业银行股份有限公司 Method, device, equipment and medium for processing network system exception
CN114866400A (en) * 2022-04-29 2022-08-05 中国电子科技集团公司第五十四研究所 Alarm rule reasoning method based on cache space optimization
CN115827398A (en) * 2023-02-24 2023-03-21 天翼云科技有限公司 Method and device for calculating alarm information component value, electronic equipment and storage medium
CN115883330A (en) * 2023-02-08 2023-03-31 阿里云计算有限公司 Alarm event processing method, system, device, storage medium and program product
WO2024066346A1 (en) * 2022-09-27 2024-04-04 中兴通讯股份有限公司 Alarm processing method and apparatus, and storage medium and electronic apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105099783A (en) * 2015-08-20 2015-11-25 长威信息科技发展股份有限公司 Method and system for realizing automation of warning emergency disposal of business system
CN105262616A (en) * 2015-09-21 2016-01-20 浪潮集团有限公司 Failure repository-based automated failure processing system and method

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135602A (en) * 2019-05-17 2019-08-16 伍兴佳 Steel tower failure monitoring dispatching method and device
CN110166297A (en) * 2019-05-22 2019-08-23 平安信托有限责任公司 O&M method, system, equipment and computer readable storage medium
CN110415115A (en) * 2019-06-18 2019-11-05 平安证券股份有限公司 The O&M method, apparatus and computer readable storage medium of transaction system
CN110601894A (en) * 2019-09-18 2019-12-20 中国工商银行股份有限公司 Alarm processing method and device, electronic equipment and readable storage medium
CN110635954B (en) * 2019-10-21 2022-10-21 中国民航信息网络股份有限公司 Method and system for processing network fault of data center
CN110728498A (en) * 2019-10-21 2020-01-24 北京百度网讯科技有限公司 Information interaction method and device
CN110635954A (en) * 2019-10-21 2019-12-31 中国民航信息网络股份有限公司 Method and system for processing network fault of data center
CN111030857A (en) * 2019-12-06 2020-04-17 深圳前海微众银行股份有限公司 Network alarm method, device, system and computer readable storage medium
CN111343017A (en) * 2020-02-22 2020-06-26 苏州浪潮智能科技有限公司 Method, system, equipment and medium for cloud platform resource alarm
CN111343017B (en) * 2020-02-22 2022-12-09 苏州浪潮智能科技有限公司 Method, system, equipment and medium for cloud platform resource alarm
CN111628888A (en) * 2020-04-30 2020-09-04 中国移动通信集团江苏有限公司 Fault diagnosis method, device, equipment and computer storage medium
CN111628888B (en) * 2020-04-30 2022-08-12 中国移动通信集团江苏有限公司 Fault diagnosis method, device, equipment and computer storage medium
CN111901140A (en) * 2020-06-11 2020-11-06 北京百度网讯科技有限公司 Exception handling method and device, electronic equipment and storage medium
CN111835566A (en) * 2020-07-08 2020-10-27 上海燕汐软件信息科技有限公司 System fault management method, device and system
CN111814999B (en) * 2020-07-08 2024-01-16 上海燕汐软件信息科技有限公司 Fault work order generation method, device and equipment
CN111865673A (en) * 2020-07-08 2020-10-30 上海燕汐软件信息科技有限公司 Automatic fault management method, device and system
CN111814999A (en) * 2020-07-08 2020-10-23 上海燕汐软件信息科技有限公司 Fault work order generation method, device and equipment
CN111844029A (en) * 2020-07-09 2020-10-30 上海有个机器人有限公司 Robot early warning monitoring method and device
CN112328372A (en) * 2020-11-27 2021-02-05 新华智云科技有限公司 Kubernetes node self-healing method and system
CN112447279A (en) * 2020-12-10 2021-03-05 上海联影医疗科技股份有限公司 Task processing method and device, electronic equipment and storage medium
CN112559569A (en) * 2020-12-11 2021-03-26 广东电力通信科技有限公司 Alarm rule processing method for composite condition
CN112559569B (en) * 2020-12-11 2023-07-21 广东电力通信科技有限公司 Alarm rule processing method for composite condition
CN112711507A (en) * 2020-12-17 2021-04-27 浙江高速信息工程技术有限公司 Device alarm method, electronic device, and medium
CN113312200A (en) * 2021-06-01 2021-08-27 中国民航信息网络股份有限公司 Event processing method and device, computer equipment and storage medium
WO2022252860A1 (en) * 2021-06-01 2022-12-08 中国民航信息网络股份有限公司 Event processing method and apparatus, and computer device and storage medium
CN113553210A (en) * 2021-07-30 2021-10-26 平安普惠企业管理有限公司 Alarm data processing method, device, equipment and storage medium
CN113590370A (en) * 2021-08-06 2021-11-02 北京百度网讯科技有限公司 Fault processing method, device, equipment and storage medium
CN113590370B (en) * 2021-08-06 2022-06-21 北京百度网讯科技有限公司 Fault processing method, device, equipment and storage medium
WO2023011160A1 (en) * 2021-08-06 2023-02-09 北京百度网讯科技有限公司 Fault processing method and apparatus, device, and storage medium
CN114490751A (en) * 2021-12-29 2022-05-13 深圳优地科技有限公司 Method, device and equipment for determining robot fault and readable storage medium
CN114490751B (en) * 2021-12-29 2024-06-04 深圳优地科技有限公司 Method, device and equipment for determining robot faults and readable storage medium
CN114567539A (en) * 2022-03-22 2022-05-31 中国农业银行股份有限公司 Method, device, equipment and medium for processing network system exception
CN114567539B (en) * 2022-03-22 2024-04-12 中国农业银行股份有限公司 Network system exception handling method, device, equipment and medium
CN114866400A (en) * 2022-04-29 2022-08-05 中国电子科技集团公司第五十四研究所 Alarm rule reasoning method based on cache space optimization
CN114866400B (en) * 2022-04-29 2024-04-30 中国电子科技集团公司第五十四研究所 Alarm rule reasoning method based on buffer space optimization
WO2024066346A1 (en) * 2022-09-27 2024-04-04 中兴通讯股份有限公司 Alarm processing method and apparatus, and storage medium and electronic apparatus
CN115883330A (en) * 2023-02-08 2023-03-31 阿里云计算有限公司 Alarm event processing method, system, device, storage medium and program product
CN115827398A (en) * 2023-02-24 2023-03-21 天翼云科技有限公司 Method and device for calculating alarm information component value, electronic equipment and storage medium
WO2024174700A1 (en) * 2023-02-24 2024-08-29 天翼云科技有限公司 Method and apparatus for calculating component value of alarm information, and electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN108989132A (en) Fault warning processing method, system and computer readable storage medium
CN112653586B (en) Time-space big data platform application performance management method based on full link monitoring
US11442803B2 (en) Detecting and analyzing performance anomalies of client-server based applications
CN109783322A (en) A kind of monitoring analysis system and its method of enterprise information system operating status
CN111176879A (en) Fault repairing method and device for equipment
CN111162949A (en) Interface monitoring method based on Java byte code embedding technology
CN112965874A (en) Configurable monitoring alarm method and system
CN111756582A (en) Service chain monitoring method based on NFV log alarm
WO2007143943A1 (en) Method, system and network device of centralized maintenance of multiple devices
CN112559237B (en) Operation and maintenance system troubleshooting method and device, server and storage medium
CN108845912A (en) Service interface calls the alarm method of failure and calculates equipment
CN116755992B (en) Log analysis method and system based on OpenStack cloud computing
CN114528175A (en) Micro-service application system root cause positioning method, device, medium and equipment
WO2015187001A2 (en) System and method for managing resources failure using fast cause and effect analysis in a cloud computing system
CN117194142A (en) Integrated application performance diagnosis system and method based on link tracking
CN114374600A (en) Network operation and maintenance method, device, equipment and product based on big data
CN116662127A (en) Method, system, equipment and medium for classifying and early warning equipment alarm information
CN116795631A (en) Service system monitoring alarm method, device, equipment and medium
CN114138522A (en) Micro-service fault recovery method and device, electronic equipment and medium
CN115549953B (en) Network security alarm method and system
CN115174350B (en) Operation and maintenance alarm method, device, equipment and medium
CN114531338A (en) Monitoring alarm and tracing method and system based on call chain data
CN114510364A (en) Abnormal data root cause analysis method and device combining text clustering with link calling
CN114168371A (en) Intelligent automatic fault alarm system
CN113342596A (en) Distributed monitoring method, system and device for equipment indexes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-12-11