CN108989132A - Fault warning processing method, system and computer readable storage medium - Google Patents
- Publication number: CN108989132A
- Application number: CN201810979619.XA
- Authority: CN (China)
- Prior art keywords: fault warning, fault, failure cause, warning information, alarm
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/0677: Localisation of faults
- Hierarchy for both: H (Electricity) > H04 (Electric communication technique) > H04L (Transmission of digital information, e.g. telegraphic communication) > H04L41/00 (Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks) > H04L41/06 (Management of faults, events, alarms or notifications)
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a fault warning processing method comprising the following steps: a rule engine receives multi-dimensional fault warning information input by a monitoring platform; logical judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm; and, based on the failure cause, an operation and maintenance (O&M) database is searched to obtain the troubleshooting scheme corresponding to the failure cause. The invention also discloses a fault warning processing system and a computer-readable storage medium. The invention improves the timeliness and accuracy of fault warning processing and enhances fault O&M efficiency.
Description
Technical field
The present invention relates to the technical field of fault O&M, and in particular to a fault warning processing method, system and computer-readable storage medium.
Background art
When a network device or application fails, an alarm is usually raised promptly through online monitoring. When an alarm is raised, the prior art usually only indicates that a fault exists; O&M personnel then have to investigate to determine the failure cause, work out a remedy and debug again. The whole process can take a long time, so timeliness is poor.
Summary of the invention
The main purpose of the present invention is to provide a fault warning processing method, system and computer-readable storage medium, aiming to solve the technical problem of how to improve the timeliness of fault warning processing.
To achieve the above object, the present invention provides a fault warning processing method, comprising the following steps:
a rule engine receives multi-dimensional fault warning information input by a monitoring platform;
logical judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm;
based on the failure cause, an O&M database is searched to obtain the troubleshooting scheme corresponding to the failure cause.
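The three steps above can be sketched as a minimal Python pipeline. This is an illustration under stated assumptions, not the patent's implementation: the `Alarm` and `Rule` types, the first-match rule ordering, the dictionary standing in for the O&M database and the sample SOP text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alarm:
    """Hypothetical alarm record: one dimension plus its metric fields."""
    source: str   # alarm source type, e.g. "host", "network", "database"
    fields: dict  # metric name -> value


@dataclass
class Rule:
    """Hypothetical configured rule: a predicate over all alarms and the cause it implies."""
    cause: str
    predicate: Callable[[list], bool]


def process(alarms: list, rules: list, om_db: dict) -> Optional[tuple]:
    """S10: `alarms` are received; S20: rules are evaluated; S30: the SOP is looked up."""
    for rule in rules:  # assumption: rules are checked in order and the first match wins
        if rule.predicate(alarms):
            return rule.cause, om_db.get(rule.cause, "no SOP recorded")
    return None  # no configured rule explains these alarms


# Usage: blame the public network when any network alarm shows more than 30% packet loss.
rules = [
    Rule(
        cause="public network packet loss",
        predicate=lambda alarms: any(
            a.source == "network" and a.fields.get("packet_loss", 0) > 0.3 for a in alarms
        ),
    )
]
om_db = {"public network packet loss": "switch traffic to the backup public-network line"}
alarms = [Alarm("network", {"packet_loss": 0.5}), Alarm("application", {"success_rate": 0.9})]
print(process(alarms, rules, om_db))
```

Returning `None` when no rule fires leaves room for the manual handling path the description mentions later.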
Optionally, the step of performing logical judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm includes:
comparing the multi-dimensional warning information with the configured rules to find, in the configured rules, the information that has the same fields as the multi-dimensional warning information;
determining, according to a preset mapping between information and failure causes, the failure cause corresponding to the information found.
Optionally, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method includes:
automatically processing the corresponding fault point based on the corresponding automatic processing scheme, wherein automatic processing includes automatically recovering the fault point or automatically isolating the fault point.
Optionally, when the failure cause corresponds to a preset operation, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method includes:
calling a robot to automatically generate an approval ticket for the preset operation, sending the approval ticket to a preset node for approval, and automatically executing the operation procedure corresponding to the approval ticket once approval is complete.
Optionally, after the step of searching the O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause, the fault warning processing method further includes:
generating different notification messages for the current alarm according to the fault warning information, the failure cause and the troubleshooting scheme;
calling a robot to push the notification messages respectively to the notification groups of the corresponding partner and of the O&M side.
Optionally, after the step of calling the robot to push the notification messages respectively to the notification groups of the corresponding partner and of the O&M side, the fault warning processing method further includes:
during fault recovery, caching, by the rule engine, identical fault warning information;
generating a fault recovery progress notification and pushing it, once every preset interval, to the notification group of the corresponding partner.
Optionally, the fault warning processing method further includes:
when receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm.
Optionally, before the step of the rule engine receiving the multi-dimensional fault warning information input by the monitoring platform, the fault warning processing method further includes:
collecting, by the monitoring platform, the fault warning information reported by multiple alarm sources and correlating it into associated alarms, wherein the alarm source types include: host, network, database, platform and application program;
inputting, based on a preset threshold judgment rule, the fault warning information that exceeds a preset threshold into the rule engine.
Further, to achieve the above object, the present invention also provides a fault warning processing system, comprising:
a monitoring platform, configured to input multi-dimensional fault warning information to a rule engine;
a rule engine, configured to perform logical judgment on the fault warning information based on configured rules to determine the failure cause of the current alarm, and to search an O&M database based on the failure cause to obtain the troubleshooting scheme corresponding to the failure cause.
Optionally, the rule engine is further configured to automatically process the corresponding fault point based on the corresponding automatic processing scheme, wherein automatic processing includes automatically recovering the fault point or automatically isolating the fault point.
Optionally, when the failure cause corresponds to a preset operation, the rule engine is further configured to call a robot to automatically generate an approval ticket for the preset operation, send the approval ticket to a preset node for approval, and automatically execute the operation procedure corresponding to the approval ticket once approval is complete.
Optionally, the rule engine is further configured to generate different notification messages for the current alarm according to the fault warning information, the failure cause and the troubleshooting scheme, and to call a robot to push the notification messages respectively to the notification groups of the corresponding partner and of the O&M side.
Optionally, the rule engine is further configured to:
cache identical fault warning information during fault recovery;
generate a fault recovery progress notification and push it, once every preset interval, to the notification group of the corresponding partner.
Optionally, the rule engine is further configured to:
when receiving the multi-dimensional fault warning information input by the monitoring platform, also pull, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm.
Optionally, the monitoring platform is further configured to:
collect the fault warning information reported by multiple alarm sources and correlate it into associated alarms, wherein the alarm source types include: host, network, database, platform and application program;
input, based on a preset threshold judgment rule, the fault warning information that exceeds a preset threshold into the rule engine.
Optionally, the fault warning processing system further includes:
a robot, configured to, upon receiving a call request from the rule engine, push the notification messages output by the rule engine respectively to the notification groups of the corresponding partner and of the O&M side.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a fault warning processing program which, when executed by a processor, implements the steps of the fault warning processing method according to any one of the above.
The present invention inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine, which performs a series of complex logical judgments to compute the failure cause quickly, improving the timeliness of fault handling. While raising the alarm, the present invention can further retrieve the corresponding troubleshooting scheme automatically, improving the overall timeliness of alarms and shortening fault location time.
Brief description of the drawings
Fig. 1 is a flow diagram of a first embodiment of the fault warning processing method of the present invention;
Fig. 2 is a flow diagram of a second embodiment of the fault warning processing method of the present invention;
Fig. 3 is a flow diagram of a third embodiment of the fault warning processing method of the present invention;
Fig. 4 is a flow diagram of a fourth embodiment of the fault warning processing method of the present invention;
Fig. 5 is a functional block diagram of a first embodiment of the fault warning processing system of the present invention;
Fig. 6 is a functional block diagram of a second embodiment of the fault warning processing system of the present invention.
The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The present invention provides a fault warning processing method.
Referring to Fig. 1, Fig. 1 is a flow diagram of the first embodiment of the fault warning processing method of the present invention. In this embodiment, the fault warning processing method includes the following steps:
Step S10: the rule engine receives multi-dimensional fault warning information input by the monitoring platform.
In this embodiment, the rule engine performs logical judgment on the input parameters based on its built-in rules and outputs the judgment result. Optionally, the user-defined functions of the Experian rule engine can be used to write scripts that create comprehensive operation rules for fault warning information, so that logical judgment can be performed on the multi-dimensional fault warning information.
This embodiment does not limit the specific setup of the monitoring platform; optionally, it is a monitoring platform produced by secondary development and configuration based on the Falcon framework, which can collect many kinds of warning information and raise alarms within seconds.
To judge the cause of an alarm accurately, this embodiment preferably performs a comprehensive fault alarm analysis on fault warning information of multiple dimensions. "Multiple dimensions" means that the monitoring platform collects the warning information generated by hosts, networks, databases, platforms and various application programs, so as to cover all possible failure causes and improve the accuracy of fault warning analysis, which in turn saves alarm analysis time and improves the timeliness of alarm handling.
Further optionally, to enrich the alarm source information so that the rule engine has more comprehensive information for locating the root cause of the fault, when receiving the multi-dimensional fault warning information input by the monitoring platform, the rule engine also pulls, from a designated alarm platform, all application-class warning information of the alarm sources associated with the current alarm, so as to accurately locate the failure cause of the current alarm. The application-class warning information of the associated alarm sources is identified by pulling keywords from the applications. Note that although both the monitoring platform and the alarm platform can obtain application-class warning information, the monitoring platform mostly captures warning information such as physical indicators of the running system, whereas the alarm platform pulls the system's running logs together with warning information such as transaction volume, latency, success rate and reported exceptions.
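A minimal sketch of this enrichment step, assuming (hypothetically) that records pulled from the alarm platform are dictionaries with a `message` field and that the keywords are the names of the applications associated with the current alarm. The function names and sample records are illustrative only; the sketch merely tags each alarm with the level it came from.

```python
def pull_application_alarms(records: list, keywords: set) -> list:
    """Keep alarm-platform records whose message mentions a keyword tied to this alarm."""
    return [r for r in records if any(kw in r["message"] for kw in keywords)]


def enrich(base_alarms: list, records: list, keywords: set) -> list:
    """Combine base-level alarms (monitoring platform) with application-level ones (alarm platform)."""
    app_alarms = pull_application_alarms(records, keywords)
    return [dict(a, level="base") for a in base_alarms] + [
        dict(r, level="application") for r in app_alarms
    ]


# Usage with hypothetical records pulled from the alarm platform.
records = [
    {"message": "pay-service latency 800 ms"},
    {"message": "account-service success rate 99.9%"},
    {"message": "pay-service exception: timeout calling bank"},
]
base = [{"message": "host 10.0.0.5 CPU 95%"}]
combined = enrich(base, records, {"pay-service"})
for item in combined:
    print(item["level"], "|", item["message"])
```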
It should be understood that when the monitoring platform collects warning information from hosts, networks, databases, platforms and similar locations, the warning information generated at these locations and at the Java virtual machine is base-level warning information covering relatively few dimensions; although the monitoring platform can also pull some warning information from applications, what it pulls is only some physical indicators. In this embodiment, the warning information that the rule engine pulls from the alarm platform is application-level warning information of richer types, and acquiring warning information at these two different levels improves the accuracy of locating the failure cause of the current alarm.
Step S20: logical judgment is performed on the fault warning information based on configured rules to determine the failure cause of the current alarm.
In this embodiment, the rule engine quickly performs logical judgment on the multi-dimensional fault warning information based on its built-in alarm judgment rules. Step S20 can be implemented in either of the following modes:
1) Mode one: the multi-dimensional warning information is compared with the configured rules to find, in the configured rules, the information that has the same fields as the multi-dimensional warning information;
according to a preset mapping between information and failure causes, the failure cause corresponding to the information found is determined.
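Mode one can be sketched as a field-matching lookup. The rule entries, field names, information identifiers and failure causes below are hypothetical, and "same fields" is read here as "every field of the rule entry appears in the alarm with the same value", which is only one possible interpretation.

```python
# Hypothetical configured rules: each entry names an "information" id and the
# field values an alarm must share with it.
CONFIGURED_RULES = [
    {"info": "net-loss-gz", "match": {"source": "network", "region": "Guangzhou"}},
    {"info": "db-io", "match": {"source": "database", "metric": "io_wait"}},
]

# Hypothetical preset mapping from information id to failure cause.
INFO_TO_CAUSE = {
    "net-loss-gz": "public network packet loss in Guangzhou",
    "db-io": "abnormal database IO",
}


def find_causes(alarms: list) -> set:
    """Mode one: compare each alarm with the configured rules, then map matches to causes."""
    causes = set()
    for alarm in alarms:
        for entry in CONFIGURED_RULES:
            # An entry matches when every one of its fields appears in the alarm
            # with the same value.
            if all(alarm.get(k) == v for k, v in entry["match"].items()):
                cause = INFO_TO_CAUSE.get(entry["info"])
                if cause is not None:
                    causes.add(cause)
    return causes


alarms = [
    {"source": "network", "region": "Guangzhou", "packet_loss": 0.5},
    {"source": "application", "app": "payment", "success_rate": 0.9},
]
print(find_causes(alarms))
```

Returning a set rather than a single cause reflects the later remark that several causes may match, which is what mode two's scoring resolves.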
2) Mode two: from the multi-dimensional warning information, the source of the fault is determined according to the warning information of each dimension, the source being any one or more of a host, network, database, platform or application;
when there is only one source of the fault, the failure cause produced by that source is taken as the failure cause of the current alarm;
when there are multiple sources, the submodules that produce warning information at each source are analysed to determine the final score of the failure cause produced by each source, and the failure cause of the source with the highest final score is taken as the failure cause of the current alarm. Analysing the submodules of a source to determine its final score includes: identifying the weight and the actual value of the warning information produced by each submodule of the source, and calculating the final score of the source from these weights and actual values.
It will be appreciated that, because the warning information has many dimensions, logical judgment may yield several failure causes. Therefore, in this embodiment, the rule engine preferably performs comprehensive weighted scoring and ranking of the failure causes of the submodules, so as to output the most accurate failure cause for the current alarm. For example, the public-network packet loss rate of system partner A exceeds 30%, with the location being Guangzhou, while the public-network packet loss rates of system partners B and C exceed 50%, also in Guangzhou; at the same moment, application partners also raise alarms that transaction success rates have dropped. Performing logical operations over the fault warning information of all dimensions, the rule engine finds that the failure cause "public-network packet loss in the Guangzhou area" scores highest, and therefore concludes that the most likely cause of this alarm is a public-network anomaly in the Guangzhou area rather than a fault in the system itself lowering the transaction success rate. As another example, in a business system an inefficiently executed SQL (Structured Query Language) statement returns a large result set, which slows database queries and makes database IO (Input/Output) and Java virtual machine metrics abnormal; this drags down overall business processing efficiency, raises the average latency of upstream and downstream systems and lowers the success rate. The rule engine determines the source system that caused the problem and then weights each alarm metric of the source system (for example, the alarm weights decrease in the order database IO, CPU, JVM GC time and queue depth, and all are higher than business-system alarms such as latency and success rate); finally, the alarm metrics are weighted and averaged to obtain the final score of each source system, and the failure cause of the source system with the highest final score is taken as the failure cause of the current alarm.
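Mode two can be sketched in Python as follows. It assumes each alarm already carries a severity value normalised to [0, 1] and reads "final score" as the weighted average of the submodule values. The weight table follows the ordering in the example above (database IO > CPU > JVM GC time > queue depth > business-level latency and success rate), but the numbers, source names and causes are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def locate_cause(alarms: list, weights: dict, source_causes: dict) -> Optional[str]:
    """Mode two sketch: return the failure cause of the highest-scoring source.

    alarms        -- dicts with "source", "submodule" and "value" (severity assumed
                     normalised to [0, 1])
    weights       -- hypothetical submodule -> weight table (default weight 1.0)
    source_causes -- hypothetical source -> failure cause table
    """
    by_source = defaultdict(list)
    for alarm in alarms:
        by_source[alarm["source"]].append(alarm)

    if len(by_source) == 1:
        # A single fault source: its failure cause is the cause of this alarm.
        (source,) = by_source
        return source_causes[source]

    best_source, best_score = None, float("-inf")
    for source, items in by_source.items():
        total_weight = sum(weights.get(a["submodule"], 1.0) for a in items)
        if total_weight == 0:
            continue  # nothing weighted for this source; skip it
        # Final score = weighted average of the submodule values.
        score = sum(weights.get(a["submodule"], 1.0) * a["value"] for a in items) / total_weight
        if score > best_score:
            best_source, best_score = source, score
    return source_causes.get(best_source) if best_source is not None else None


weights = {"database_io": 4, "cpu": 3, "jvm_gc": 2, "queue_depth": 1,
           "latency": 0.5, "success_rate": 0.5}
source_causes = {
    "order-system": "inefficient SQL returning a large result set",
    "upstream-gateway": "upstream gateway degradation",
}
alarms = [
    {"source": "order-system", "submodule": "database_io", "value": 0.9},
    {"source": "order-system", "submodule": "cpu", "value": 0.7},
    {"source": "order-system", "submodule": "jvm_gc", "value": 0.8},
    {"source": "upstream-gateway", "submodule": "latency", "value": 0.8},
    {"source": "upstream-gateway", "submodule": "success_rate", "value": 0.6},
]
print(locate_cause(alarms, weights, source_causes))
```

Here the order system scores about 0.81 against 0.70 for the upstream gateway, so the slow-SQL cause wins even though the gateway's own latency alarm is severe; low weights on business-level metrics are what keep downstream symptoms from outranking the source.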
Step S30: based on the failure cause, the O&M database is searched to obtain the troubleshooting scheme corresponding to the failure cause.
In this embodiment, to eliminate the alarmed fault quickly, the rule engine returns the corresponding troubleshooting scheme at the same time as it determines the failure cause. Preferably, an O&M database is provided, storing processing schemes (O&M standard operating procedures, SOPs) that cover all possible failure causes.
In this embodiment, the troubleshooting scheme may be a manual processing scheme or an automatic processing scheme. An automatic processing scheme is applied as follows: the corresponding fault point is processed automatically according to the corresponding automatic processing scheme to restore the system, where automatic processing includes automatically recovering the fault point or automatically isolating it. Specifically, the corresponding script is called according to the determined troubleshooting scheme; the script isolates the node that produced the failure cause, takes the services on that node down, and then analyses the program on the machine to determine the actual reason, such as a full disk or a host being down; finally, the processing means corresponding to the actual reason is determined from a mapping between actual reasons and processing means, and that processing means is executed. For example, when the actual reason is a full disk and the processing means mapped to a full disk is automatic cache clearing, automatic cache clearing is executed to clean up the disk; as another example, when a Java program runs into a thread deadlock, the corresponding node can be isolated automatically once the problem is detected.
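A toy model of the isolate, diagnose and act sequence described above. `Node`, the 95% disk threshold, the diagnosis checks and the action table are hypothetical stand-ins for the real scripts. The sketch only shows the control flow: isolate first, determine the actual reason, then either recover the node or leave it isolated, and fall back to manual handling when no automatic action is mapped.

```python
class Node:
    """Hypothetical stand-in for a service node managed by the O&M scripts."""

    def __init__(self, name: str, disk_usage: float, deadlocked: bool):
        self.name = name
        self.disk_usage = disk_usage  # fraction of disk in use
        self.deadlocked = deadlocked  # e.g. a Java thread deadlock was detected
        self.in_rotation = True       # receiving traffic
        self.services_up = True


def clear_cache(node: Node) -> str:
    """Recovery action: free disk space (simulated) and put the node back."""
    node.disk_usage = 0.5  # toy model of what a cache cleanup achieves
    node.services_up = True
    node.in_rotation = True
    return "recovered"


def keep_isolated(node: Node) -> str:
    """Isolation action: leave the node out of rotation for later analysis."""
    return "isolated"


# Hypothetical mapping from the diagnosed actual reason to a processing means.
REASON_TO_ACTION = {"disk_full": clear_cache, "thread_deadlock": keep_isolated}


def diagnose(node: Node) -> str:
    """Toy diagnosis standing in for analysing the program on the machine."""
    if node.deadlocked:
        return "thread_deadlock"
    if node.disk_usage >= 0.95:
        return "disk_full"
    return "unknown"


def auto_process(node: Node) -> str:
    node.in_rotation = False  # 1. isolate the node that produced the failure cause
    node.services_up = False  # 2. take its services down before analysis
    reason = diagnose(node)   # 3. determine the actual reason
    action = REASON_TO_ACTION.get(reason)
    if action is None:
        return "manual"       # no automatic means mapped: hand over to the manual SOP
    return action(node)       # 4. execute the mapped processing means


full = Node("app-03", disk_usage=0.98, deadlocked=False)
print(auto_process(full), full.in_rotation)
```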
By retrieving the O&M SOPs stored in the database, the rule engine obtains the troubleshooting scheme corresponding to the failure cause and outputs the warning information, the failure cause and the troubleshooting scheme together, which improves the timeliness of the whole fault-handling process. Traditional electronic SOPs are kept as txt, Word or Excel files in different directories and are inconvenient to obtain; even when they are stored in one place, they are still hard to search when a fault occurs. The end-to-end fault handling of this embodiment can greatly shorten fault recovery time and improve O&M efficiency. For example, when N systems raise alarms at the same moment, the rule engine determines through logical judgment that the root cause of the alarms is a public-network anomaly at a machine room in city XX, then searches the O&M database and matches the corresponding SOP by abnormal IP + system name.
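Matching an SOP by abnormal IP plus system name can be sketched with Python's built-in `sqlite3` module and an in-memory table. The table schema, the IP addresses (taken from the documentation-only ranges) and the SOP texts are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def build_om_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory O&M database holding SOPs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sop (ip TEXT, system_name TEXT, cause TEXT, steps TEXT)")
    conn.executemany(
        "INSERT INTO sop VALUES (?, ?, ?, ?)",
        [
            ("203.0.113.10", "payment-gateway", "public network anomaly",
             "switch to the backup public-network line"),
            ("198.51.100.7", "order-db", "abnormal database IO",
             "terminate the slow SQL session and review its execution plan"),
        ],
    )
    return conn


def find_sop(conn: sqlite3.Connection, abnormal_ip: str,
             system_name: str) -> Optional[Tuple[str, str]]:
    """Match an SOP by abnormal IP + system name; returns (cause, steps) or None."""
    return conn.execute(
        "SELECT cause, steps FROM sop WHERE ip = ? AND system_name = ?",
        (abnormal_ip, system_name),
    ).fetchone()


conn = build_om_db()
print(find_sop(conn, "203.0.113.10", "payment-gateway"))
```

Parameterised queries (`?` placeholders) keep alarm fields from being interpolated into SQL text.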
In addition, when the failure cause corresponds to a preset operation, after step S30 the method further includes:
calling a robot to automatically generate an approval ticket for the preset operation, sending the approval ticket to a preset node for approval, and automatically executing the operation procedure corresponding to the approval ticket once approval is complete.
The preset operations include high-risk operations such as deleting data, isolating multiple nodes and restarting a core system. When the failure cause corresponds to a high-risk operation, the robot is called to automatically generate an approval ticket for it; the ticket is sent to a preset node (optionally a leader node) for approval, and the corresponding operation procedure is executed automatically once approval is complete.
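The approval flow for high-risk operations can be sketched as a small state machine. The `ApprovalTicket` type, the operation identifiers and the `execute` callback are hypothetical; in particular, running operations outside the high-risk set directly is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Preset (high-risk) operations named in the text; identifiers are hypothetical.
HIGH_RISK = {"delete_data", "isolate_multiple_nodes", "restart_core_system"}


@dataclass
class ApprovalTicket:
    operation: str
    reason: str
    approver: str           # the preset node, e.g. a team leader
    status: str = "pending"


def request_operation(operation: str, reason: str, approver: str,
                      execute: Callable[[str], None]) -> Optional[ApprovalTicket]:
    """Generate an approval ticket for a high-risk operation instead of running it."""
    if operation not in HIGH_RISK:
        execute(operation)  # assumption: ordinary operations run without approval
        return None
    return ApprovalTicket(operation, reason, approver)


def review(ticket: ApprovalTicket, approved: bool, execute: Callable[[str], None]) -> str:
    """Record the approver's decision; on approval, run the ticket's operation automatically."""
    ticket.status = "approved" if approved else "rejected"
    if approved:
        execute(ticket.operation)
    return ticket.status


executed = []
ticket = request_operation("restart_core_system", "JVM unresponsive on all nodes",
                           "ops-leader", executed.append)
print(ticket.status, executed)
print(review(ticket, True, executed.append), executed)
```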
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine, which performs a series of complex logical judgments to compute the failure cause quickly, improving the timeliness of fault handling. While raising the alarm, this embodiment can further retrieve the corresponding troubleshooting scheme automatically, improving the overall timeliness of alarms and shortening fault location time.
Referring to Fig. 2, Fig. 2 is a flow diagram of the second embodiment of the fault warning processing method of the present invention. Based on the first embodiment above, in this embodiment, after step S30 the fault warning processing method further includes:
Step S40: generating different notification messages for the current alarm according to the fault warning information, the failure cause and the troubleshooting scheme;
Step S50: calling a robot to push the notification messages respectively to the notification groups of the corresponding partner and of the O&M side.
In this embodiment, after obtaining the fault warning information, the failure cause and the troubleshooting scheme, the rule engine needs to push this information to the partner and the O&M side. So that the partner and the O&M side can easily understand the fault details, the rule engine further organizes the fault warning information, the failure cause and the troubleshooting scheme, in a set format, into the corresponding notification messages.
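Steps S40 and S50 can be sketched as building one message per audience and routing each to its group. The `send` callback stands in for the robot's push interface (for example a chat-group robot), whose real API the text does not specify. The group table, message templates and the choice to include the SOP only in the O&M message are hypothetical.

```python
from typing import Callable

# Hypothetical routing table: product -> notification group per audience.
GROUPS = {"payment": {"partner": "payment-partner-group", "om": "payment-om-group"}}


def build_messages(alarm: dict, cause: str, scheme: str) -> dict:
    """Format one alarm into a different message for each audience (templates are illustrative)."""
    header = f"[Alarm] {alarm['product']}: {alarm['summary']}"
    return {
        "partner": f"{header}\nCause: {cause}\nWe are handling it and will keep you updated.",
        "om": f"{header}\nCause: {cause}\nSOP: {scheme}",
    }


def push_all(alarm: dict, cause: str, scheme: str, send: Callable[[str, str], None]) -> None:
    """Route each message to the group of its audience; `send` stands in for the robot."""
    groups = GROUPS[alarm["product"]]
    for audience, text in build_messages(alarm, cause, scheme).items():
        send(groups[audience], text)


sent = []
alarm = {"product": "payment", "summary": "transaction success rate below 95%"}
push_all(alarm, "public network anomaly in Guangzhou", "switch to the backup line",
         lambda group, text: sent.append((group, text)))
for group, text in sent:
    print(group, "->", text.splitlines()[0])
```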
In this embodiment, a partner notification group and an O&M notification group, for example WeChat groups, are first created (different products have their own partner and O&M notification groups). A robot (an application program), such as a WeChat robot, is then called through an interface; the robot identifies the notification group that each notification message of the current alarm pushed by the rule engine corresponds to, and pushes the messages by category to the corresponding partner group and O&M group.
This embodiment inputs the multi-dimensional fault warning information output by multiple alarm sources into the rule engine, which performs a series of complex logical judgments to compute the failure cause and the troubleshooting scheme quickly, improving the timeliness of fault handling; the fault information is then pushed to the notification groups of the corresponding partner and of the O&M side. While raising the alarm, this embodiment further provides the troubleshooting scheme and proactive warnings about transactions with partners, improving the overall timeliness of alarms, shortening fault location time and improving service quality for partners.
Referring to Fig. 3, Fig. 3 is a flow diagram of the third embodiment of the fault warning processing method of the present invention. Based on the second embodiment above, in this embodiment, after step S50 the fault warning processing method further includes:
Step S60: during fault recovery, the rule engine caches identical fault warning information;
Step S70: generating a fault recovery progress notification and pushing it, once every preset interval, to the notification group of the corresponding partner.
In this embodiment, because fault alarms are usually collected and reported in real time, and to keep alarms from flooding the screen for a long time, the rule engine caches identical fault warning information during fault recovery, generates a fault recovery progress notification, and informs the partner of recovery progress once every preset interval. This prevents long-lasting alarm floods, reduces the processing load on the rule engine, further improves service quality for partners and raises customer satisfaction.
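The caching and periodic progress behaviour can be sketched as follows. Timestamps are passed in explicitly so the logic is deterministic (a caller might pass `time.monotonic()`); the alarm keys, interval and message text are hypothetical, and `send` stands in for the robot push to the partner group.

```python
from typing import Callable, Optional


class RecoveryNotifier:
    """Suppress identical alarms during recovery and push progress at a fixed interval."""

    def __init__(self, interval_s: float, send: Callable[[str], None]):
        self.interval_s = interval_s
        self.send = send                      # stands in for the robot push
        self.recovering = False
        self.cache: dict = {}                 # alarm key -> number of cached duplicates
        self.last_push: Optional[float] = None

    def start_recovery(self, alarm_key: str, now: float) -> None:
        self.recovering = True
        self.cache = {alarm_key: 0}
        self.last_push = now

    def on_alarm(self, alarm_key: str) -> bool:
        """Return True if the alarm should be forwarded, False if it was cached."""
        if self.recovering and alarm_key in self.cache:
            self.cache[alarm_key] += 1
            return False
        return True

    def tick(self, now: float, progress: str) -> bool:
        """Push a progress notification once every `interval_s` seconds during recovery."""
        if self.recovering and now - self.last_push >= self.interval_s:
            self.send(f"Fault recovery progress: {progress}")
            self.last_push = now
            return True
        return False

    def finish_recovery(self) -> int:
        """End recovery and report how many duplicate alarms were suppressed."""
        self.recovering = False
        suppressed = sum(self.cache.values())
        self.cache = {}
        return suppressed


sent = []
notifier = RecoveryNotifier(interval_s=300, send=sent.append)
notifier.start_recovery("gz-public-network-loss", now=0)
print(notifier.on_alarm("gz-public-network-loss"))
print(notifier.tick(now=120, progress="30% of traffic switched"))
print(notifier.tick(now=300, progress="80% of traffic switched"))
print(sent)
```

Different alarm keys still pass through during recovery, so a genuinely new fault is not hidden by the cache.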
Referring to Fig. 4, Fig. 4 is a flow diagram of the fourth embodiment of the fault warning processing method of the present invention. Based on the first embodiment above, in this embodiment, before step S10 the fault warning processing method further includes:
Step S80: the monitoring platform collects the fault warning information reported by multiple alarm sources and correlates it into associated alarms, wherein the alarm source types include: host, network, database, platform and application program;
In this embodiment, to achieve multi-dimensional, full-coverage fault warning analysis, the monitoring platform collects the fault warning information of multiple alarm sources, aggregates it and reports it to the rule engine for processing.
In this embodiment, the alarm source types include:
(1) hosts: alarms generated by, for example, computers and servers;
(2) networks: alarms occurring at the network layer, such as disconnection and packet loss;
(3) databases: alarms generated by, for example, the databases of various products and transaction account databases;
(4) platforms: alarms generated by, for example, message middleware platforms and service management platforms;
(5) application programs: alarms generated by, for example, account applications and payment applications.
When the monitoring platform collects several pieces of fault warning information produced by the same failure cause, identical flag information (for example, field XX) can be found in them, and the collected fault warning information is correlated into an associated alarm according to that identical flag information.
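Correlation by a shared flag field can be sketched as grouping on that field. The field name `link_id` stands in for the unspecified "field XX" and is hypothetical, as are the sample alarms; alarms lacking the field are kept as their own single-item groups.

```python
from collections import defaultdict


def correlate(alarms: list, flag_field: str) -> list:
    """Group alarms that share a value in `flag_field` into one associated alarm."""
    groups = defaultdict(list)
    singles = []
    for alarm in alarms:
        flag = alarm.get(flag_field)
        if flag is None:
            singles.append([alarm])  # nothing to correlate on: keep it as its own alarm
        else:
            groups[flag].append(alarm)
    return list(groups.values()) + singles


alarms = [
    {"source": "network", "link_id": "GZ-01", "detail": "packet loss 52%"},
    {"source": "application", "link_id": "GZ-01", "detail": "success rate dropped"},
    {"source": "host", "link_id": "SZ-02", "detail": "disk 91%"},
    {"source": "database", "detail": "slow query"},
]
for group in correlate(alarms, "link_id"):
    print([a["source"] for a in group])
```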
Step S90: based on a preset threshold judgment rule, the fault warning information that exceeds the preset threshold is input to the rule engine.
In general, alarms affect the system to different degrees; an alarm with little or practically no impact does not need to be reported. Therefore, in this embodiment, judgment thresholds are preset for the various alarms, and by configuring the corresponding judgment rules, abnormal fault information above the thresholds is output to the rule engine within seconds.
For example, if the public-network packet loss rate exceeds 50%, such warning information needs to be reported; if host disk space usage reaches 50%, such warning information needs to be reported; if CPU utilization exceeds 70%, such warning information needs to be reported.
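The threshold judgment rule can be sketched with the three example thresholds above ("exceeds" as a strict comparison, "reaches" as greater-or-equal). The metric names are hypothetical, and dropping metrics that have no configured threshold is an assumption.

```python
import operator

# Thresholds taken from the examples in the text; metric names are hypothetical.
THRESHOLDS = {
    "public_network_packet_loss": (operator.gt, 0.50),  # exceeds 50%
    "host_disk_usage": (operator.ge, 0.50),             # reaches 50%
    "cpu_utilization": (operator.gt, 0.70),             # exceeds 70%
}


def should_report(metric: str, value: float) -> bool:
    rule = THRESHOLDS.get(metric)
    if rule is None:
        return False  # assumption: metrics without a configured threshold are not reported
    compare, limit = rule
    return compare(value, limit)


def filter_for_rule_engine(samples: list) -> list:
    """Forward only the samples whose value crosses their preset threshold."""
    return [s for s in samples if should_report(s["metric"], s["value"])]


samples = [
    {"metric": "public_network_packet_loss", "value": 0.55},
    {"metric": "host_disk_usage", "value": 0.42},
    {"metric": "cpu_utilization", "value": 0.73},
]
print(filter_for_rule_engine(samples))
```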
In the present embodiment, by preset threshold decision mechanism, to filter out the fault warning letter for being higher than preset threshold
Breath carries out alarming processing, avoids all warning information and is all reported to rules engines processes, reduces the processing pressure of regulation engine
Power.
The present invention also provides a kind of fault warning processing systems.
It is the functional block diagram of fault warning processing system first embodiment of the present invention referring to Fig. 5, Fig. 5.This implementation
In example, the fault warning processing system includes:
a monitoring platform 10, configured to input multi-dimensional fault alarm information to the rule engine;
This embodiment does not limit the specific configuration of the monitoring platform. Optionally, it may be a monitoring platform produced by secondary development and configuration based on the Falcon framework, which can collect many kinds of alarm information and raise alarms within seconds.
Further optionally, to enrich the alarm source information so that the rule engine has more comprehensive information for locating the root cause of a fault, when the rule engine receives the multi-dimensional fault alarm information input by the monitoring platform, it also pulls, from a designated alarm platform, all application-class alarm information from the alarm sources associated with this alarm, so as to accurately locate the cause of this alarm.
The application-class alarm information from alarm sources associated with this alarm is identified, when pulling from the designated alarm platform, by keywords of the application. Note that although both the monitoring platform and the alarm platform can obtain application-class alarm information from applications, the monitoring platform captures alarm information about physical indicators of the running system, whereas the alarm platform pulls alarm information about the system's running logs, transaction volume, latency, and success rate, as well as reported exceptions.
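The keyword-based pull above can be sketched in Python as a filter over records from the alarm platform. The text says only that related application-class alarms are identified by keywords of the application; the `AppAlarm` record shape, plain substring matching, and case-insensitivity are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AppAlarm:
    app: str      # application that raised the alarm
    message: str  # log/transaction alarm text (volume, latency, success rate, exceptions)


def pull_related_app_alarms(platform_alarms: Iterable[AppAlarm], keywords: Iterable[str]) -> List[AppAlarm]:
    """Return alarms whose app name or message contains any keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords if k]
    return [
        a for a in platform_alarms
        if any(k in a.app.lower() or k in a.message.lower() for k in kws)
    ]
```

Naive substring matching can pull in unrelated records that merely mention a keyword; a real deployment would likely match on structured fields instead.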
It should be understood that when the monitoring platform collects alarm information from hosts, networks, databases, platforms, and the like, the alarm information generated at the host, network, database, platform, and Java virtual machine levels is infrastructure-level alarm information and covers relatively few dimensions. Although the monitoring platform can also pull some alarm information from applications, what it pulls is only physical indicators. In this embodiment, the alarm information that the rule engine pulls from the alarm platform is application-level alarm information, and the types of alarm information pulled are richer. Acquiring alarm information at these two different levels improves the accuracy of locating the cause of this alarm.
a rule engine 20, configured to perform logical judgment on the fault alarm information based on configured rules to determine the cause of this alarm, and to retrieve the O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to that cause;
In this embodiment, the rule engine performs logical judgment on its input parameters based on built-in rules and outputs the judgment result. Optionally, functions defined by the Experian rule engine are used to write scripts that create comprehensive operation rules for the fault alarm information, so that logical judgment can be performed on the multi-dimensional fault alarm information.
To judge the cause of an alarm accurately, this embodiment preferably performs a comprehensive fault alarm analysis using fault alarm information of multiple dimensions. "Multiple dimensions" means that the monitoring platform collects the alarm information generated in hosts, networks, databases, platforms, and various application programs, so as to cover all possible fault causes. This improves the accuracy of fault alarm analysis, saves alarm analysis time, and improves the timeliness of alarm processing.
In this embodiment, the rule engine quickly performs logical judgment on the multi-dimensional fault alarm information based on built-in alarm judgment rules. Specific implementations include:
1) Mode one: the multi-dimensional alarm information is compared with the configured rules to find, in the configured rules, the information that has the same fields as the multi-dimensional alarm information;
according to a preset mapping between information and fault causes, the fault cause corresponding to the found information is determined.
2) Mode two: among the multi-dimensional alarm information, the source of the fault is determined from the alarm information of each dimension; the source is any one or more of a host, network, database, platform, or application.
When there is one fault source, the fault cause produced by that source is taken as the cause of this alarm.
When there are multiple fault sources, each submodule in the alarm information generated by each source is analyzed to determine the final score of the fault cause produced by each source, and the fault cause corresponding to the source with the highest final score is taken as the cause of this alarm. Analyzing each submodule to determine a source's final score includes: identifying the weight and the actual value of the alarm information generated by each submodule of the source, and calculating the source's final score from these weights and actual values.
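Mode one can be sketched in Python as matching an alarm's fields against a configured rule table that maps to a fault cause. The rule table contents, field names, and first-match-wins policy are hypothetical; the text specifies only "same field" matching followed by a preset information-to-cause mapping.

```python
from typing import Dict, List, Optional

# Hypothetical configured rules: every field/value in "match" must appear in the
# alarm; "cause" is the fault cause that the matched information maps to.
RULES: List[Dict] = [
    {"match": {"source": "network", "metric": "packet_loss"}, "cause": "public network abnormal"},
    {"match": {"source": "database", "metric": "slow_query"}, "cause": "inefficient SQL"},
]


def match_cause(alarm: Dict[str, str], rules: List[Dict] = RULES) -> Optional[str]:
    """Return the cause of the first rule whose fields all equal the alarm's fields."""
    for rule in rules:
        if all(alarm.get(field) == value for field, value in rule["match"].items()):
            return rule["cause"]
    return None  # no rule matched; mode two (source scoring) could be tried instead
```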
It can be understood that, because the alarm information has many dimensions, logical judgment may yield multiple fault causes. Therefore, in this embodiment, the rule engine preferably performs a comprehensive weighted scoring of the fault causes of each submodule and ranks them, so as to output the most accurate cause for this alarm.
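One way to write the per-source final score of mode two, assuming it is the weighted average mentioned in the example that follows and assuming each indicator's actual value has first been normalized to a common scale (the text does not say how values with different units are made comparable):

$$
S_k = \frac{\sum_{i=1}^{n_k} w_{k,i}\, v_{k,i}}{\sum_{i=1}^{n_k} w_{k,i}},
\qquad k^{*} = \arg\max_{k} S_k
$$

Here $k$ indexes the fault sources, $i$ indexes the $n_k$ submodule alarm indicators of source $k$, $w_{k,i}$ is an indicator's weight, and $v_{k,i}$ is its normalized actual value; the fault cause produced by source $k^{*}$ is reported as the cause of this alarm. These symbols are introduced for illustration only.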
For example, the public-network packet loss rate of partner system A exceeds 30% and its location is Guangzhou; at the same time, the public-network packet loss rates of partner systems B and C exceed 50%, also in Guangzhou. At the same moment there is also an alarm that an application partner's transaction success rate has dropped. The rule engine therefore combines the fault alarm information of all dimensions in a logical operation and finds that the fault cause "public-network packet loss in the Guangzhou area" has the highest score, concluding that the most likely cause of this alarm is a public-network anomaly in Guangzhou rather than a fault of the system itself causing the drop in transaction success rate. As another example, a business system's SQL (Structured Query Language) executes inefficiently and returns a massive result set, producing slow database queries and abnormal database IO (Input/Output); the Java virtual machine metrics become abnormal and slow down the entire business processing, so the average latency of upstream and downstream systems increases and the success rate drops. The rule engine determines the source system where the problem arose and then weights each alarm indicator of that source system (for example, the alarm weights of database IO, CPU, JVM GC time, and queue depth decrease in that order, and all of them are weighted higher than business-system alarms such as latency and success rate). Finally, it takes a weighted average of the alarm indicators to obtain the source system's final score, and the fault cause of the source system with the highest final score is taken as the cause of this alarm.
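The source-scoring step in the second example can be sketched in Python. The numeric weights are hypothetical and only follow the ordering given in the text (database IO > CPU > JVM GC time > queue depth > business latency/success rate); indicator values are assumed to be pre-normalized to [0, 1].

```python
from typing import Dict, Tuple

# Hypothetical weights that respect the ordering described in the example.
WEIGHTS: Dict[str, float] = {
    "db_io": 5.0, "cpu": 4.0, "jvm_gc_time": 3.0, "queue_depth": 2.0,
    "latency": 1.0, "success_rate_drop": 1.0,
}


def source_score(indicators: Dict[str, float]) -> float:
    """Weighted average of one source's normalized indicator values (assumed in [0, 1])."""
    total_weight = sum(WEIGHTS[name] for name in indicators)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[name] * value for name, value in indicators.items()) / total_weight


def locate_root_source(sources: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return the source system with the highest final score (expects at least one source)."""
    scores = {name: source_score(ind) for name, ind in sources.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because a weighted average divides by each source's own weight total, the weights mainly shape the score within a source that reports mixed indicator types; a weighted sum would be an alternative reading of the text.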
In this embodiment, to clear alarmed faults quickly, the rule engine returns the corresponding troubleshooting scheme at the same time as it determines the fault cause. Preferably, an O&M database is provided that stores processing schemes covering all possible fault causes (O&M standard operating procedures, SOP).
In this embodiment, the troubleshooting scheme may be a manual processing scheme or an automatic processing scheme. An automatic processing scheme is handled as follows: the corresponding fault point is processed automatically based on the corresponding automatic processing scheme, where automatic processing includes automatically recovering the fault point or automatically isolating it. Specifically, a corresponding script is called according to the determined troubleshooting scheme; the script isolates the node that produced the fault, then sets the state of the node's service to down, and the program on the machine is analyzed to determine the actual cause, such as a full disk or a host being down. Finally, according to the mapping between actual causes and handling measures, the handling measure corresponding to the actual cause is determined and executed. For example, when the actual cause is a full disk and the handling measure mapped to a full disk is automatic cache clearing, automatic cache clearing is executed to clean up the disk; as another example, when a Java program encounters a thread deadlock during operation, the corresponding node can be isolated automatically once the problem is found.
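The automatic path above can be sketched in Python as "isolate the node, then dispatch on the diagnosed actual cause". The isolation and cache-clearing functions are hypothetical stand-ins that only record what they would do, and the diagnosis step (bringing the service down and analyzing the program) is assumed to have already produced `actual_cause`.

```python
from typing import Callable, Dict, List, Set

actions_log: List[str] = []   # records side effects instead of touching real machines
isolated_nodes: Set[str] = set()


def isolate_node(node: str) -> None:
    """Hypothetical isolation script; idempotent so a node is isolated only once."""
    if node not in isolated_nodes:
        isolated_nodes.add(node)
        actions_log.append(f"isolate {node}")


def clear_cache(node: str) -> None:
    """Hypothetical cache-clearing measure used for a full disk."""
    actions_log.append(f"clear cache on {node}")


# Hypothetical mapping from diagnosed actual cause to handling measure.
HANDLERS: Dict[str, Callable[[str], None]] = {
    "disk_full": clear_cache,
    "thread_deadlock": isolate_node,  # isolation itself is the measure
}


def auto_handle(node: str, actual_cause: str) -> bool:
    """Isolate the faulty node, then run the measure mapped to its actual cause.

    Returns False when no automatic measure is mapped (fall back to the manual scheme).
    """
    isolate_node(node)
    handler = HANDLERS.get(actual_cause)
    if handler is None:
        return False
    handler(node)
    return True
```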
By searching the O&M SOPs stored in the database, the rule engine obtains the troubleshooting scheme corresponding to the fault cause and outputs the combination of alarm information + fault cause + troubleshooting scheme, improving the timeliness of end-to-end fault handling. Conventional electronic SOPs are scattered across txt, Word, and Excel files in different directories and are inconvenient to obtain; even when all SOPs are stored in one place, they are hard to search when a fault occurs. The end-to-end fault handling of this embodiment can greatly shorten fault recovery time and improve O&M efficiency. For example, when N systems raise alarms at the same moment, the rule engine determines through logical judgment that the underlying cause is a carrier public-network anomaly in city XX, retrieves the O&M database, and matches the corresponding SOP by abnormal IP + system name.
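The SOP retrieval above can be sketched in Python as a lookup keyed by (abnormal IP, system name), returning the combined alarm + cause + scheme output. The in-memory `SOP_DB`, the IP address, the system name, and the SOP steps are all hypothetical placeholders for the O&M database.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Sop:
    title: str
    steps: Tuple[str, ...]


# Hypothetical O&M database: SOPs indexed by (abnormal IP, system name).
SOP_DB: Dict[Tuple[str, str], Sop] = {
    ("10.0.0.8", "payment-gateway"): Sop(
        "Carrier public network abnormal", ("switch to backup line", "notify carrier")
    ),
}


def build_alarm_report(alarm: str, cause: str, ip: str, system: str) -> Dict[str, Any]:
    """Combine alarm + cause + troubleshooting scheme; scheme is None when no SOP matches."""
    return {"alarm": alarm, "cause": cause, "scheme": SOP_DB.get((ip, system))}
```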
In addition, when the fault cause involves a preset operation, the rule engine is further configured to call a robot to automatically generate an approval form for the preset operation, send the approval form to a preset node for approval, and automatically execute the operation process corresponding to the approval form once approval is complete.
The preset operations include high-risk operations such as deleting data, isolating multiple nodes, and restarting core systems. When the fault cause involves such a high-risk operation, the robot is called to automatically generate an approval form for it; the form is sent to a preset node (optionally, a leader's node) for approval, and the corresponding operation process is executed automatically after approval is complete.
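The approval gate above can be sketched in Python. The operation identifiers, the default approver `"leader"`, and the choice to run non-high-risk operations immediately are assumptions; form creation stands in for the robot generating and sending the approval form.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# High-risk operations named in the text; the string identifiers are hypothetical.
HIGH_RISK = {"delete_data", "isolate_multiple_nodes", "restart_core_system"}


@dataclass
class ApprovalForm:
    operation: str
    approver: str
    approved: bool = False


@dataclass
class OperationGate:
    executed: List[str] = field(default_factory=list)  # operations actually run

    def submit(self, operation: str, approver: str = "leader") -> Optional[ApprovalForm]:
        """High-risk operations get an approval form; others run immediately (an assumption)."""
        if operation in HIGH_RISK:
            return ApprovalForm(operation, approver)  # stands in for the robot sending the form
        self.executed.append(operation)
        return None

    def approve(self, form: ApprovalForm) -> None:
        """Called when the approver signs off; the operation then runs exactly once."""
        if not form.approved:
            form.approved = True
            self.executed.append(form.operation)
```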
In this embodiment, the multi-dimensional fault alarm information output by multiple alarm sources is input into the rule engine for a series of complex logical judgments, enabling rapid calculation of the fault cause and improving the timeliness of fault handling. While raising an alarm, this embodiment can also automatically retrieve the corresponding troubleshooting scheme, improving alarm timeliness overall and shortening fault location time.
Further, in another embodiment of the fault alarm processing system of the present invention, the rule engine 20 is further configured to:
generate different notification messages for this alarm according to the fault alarm information, the fault cause, and the troubleshooting scheme;
call a robot to push the notification messages respectively to the notification groups of the corresponding partner and O&M team.
In this embodiment, after obtaining the fault alarm information, fault cause, and troubleshooting scheme, the rule engine needs to push this information to partners and the O&M team. So that partners and the O&M team can easily understand the fault details, the rule engine further organizes the fault alarm information, fault cause, and troubleshooting scheme into corresponding notification messages according to a set format.
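Generating different messages from the same three inputs can be sketched in Python. The text says only that a set format is used; the two templates below, including the choice to keep the internal SOP out of the partner-facing message, are hypothetical.

```python
from typing import Dict


def build_notifications(alarm: str, cause: str, scheme: str) -> Dict[str, str]:
    """Produce audience-specific messages from alarm, cause, and troubleshooting scheme.

    Templates are illustrative assumptions, not the patent's format.
    """
    partner_msg = (
        f"[Service notice] We detected: {alarm}. "
        f"Preliminary cause: {cause}. We are handling it and will keep you updated."
    )
    ops_msg = f"[ALARM] {alarm}\nCause: {cause}\nSOP: {scheme}"
    return {"partner": partner_msg, "ops": ops_msg}
```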
In this embodiment, a partner notification group and an O&M notification group are built first (different products may have different partner notification groups or O&M notification groups), for example as WeChat groups. A robot (an application program), such as a WeChat robot, is then called through an interface; the robot identifies the specific notification group corresponding to each notification message of this alarm pushed by the rule engine, and then pushes the messages by category to the corresponding partner notification group and O&M notification group.
In this embodiment, the multi-dimensional fault alarm information output by multiple alarm sources is input into the rule engine for a series of complex logical judgments, enabling rapid calculation of the fault cause and troubleshooting scheme and improving the timeliness of fault handling; finally, the fault information is pushed to the respective notification groups of the corresponding partner and O&M team. While raising an alarm, this embodiment further provides the troubleshooting scheme and enables proactive early warning for transactions with partners, thereby improving alarm timeliness overall, shortening fault location time, and improving service quality for partners.
Further optionally, in yet another embodiment of the fault alarm processing system of the present invention, the rule engine 20 is further configured to:
cache identical fault alarm information during fault recovery;
generate a fault recovery progress notification, and push it at every preset interval to the notification group of the corresponding partner.
In this embodiment, because fault alarms are usually collected and reported in real time, the rule engine caches identical fault alarm information during fault recovery to keep alarms from flooding the screen for a long time. At the same time, it generates a fault recovery progress notification and informs partners of recovery progress at every preset interval. This avoids prolonged alarm flooding, reduces the rule engine's processing load, further improves service quality for partners, and increases customer satisfaction.
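The caching and periodic progress notification above can be sketched in Python. Treating the exact alarm text as alarm identity, forwarding the first occurrence, and driving pushes from a scheduler-called `tick` are assumptions; the injected `clock` makes the interval logic testable without real time.

```python
from typing import Callable, Dict, List


class RecoveryNotifier:
    """Caches repeated identical alarms during recovery and pushes progress at most once per interval.

    Assumptions: alarm identity is the exact alarm text, and `clock` returns seconds.
    """

    def __init__(self, interval_s: float, clock: Callable[[], float]) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self.cache: Dict[str, int] = {}  # alarm text -> times seen during recovery
        self.last_push = float("-inf")
        self.pushed: List[str] = []      # stands in for messages sent to the partner group

    def on_alarm(self, alarm: str) -> bool:
        """Return True for a new alarm (forward it), False for a cached duplicate."""
        is_new = alarm not in self.cache
        self.cache[alarm] = self.cache.get(alarm, 0) + 1
        return is_new

    def tick(self, progress: str) -> None:
        """Call from a scheduler; pushes a progress notice if the interval has elapsed."""
        now = self.clock()
        if now - self.last_push >= self.interval_s:
            self.pushed.append(f"Recovery in progress: {progress}")
            self.last_push = now
```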
Further optionally, in another embodiment of the fault alarm processing system of the present invention, the monitoring platform 10 is further configured to:
collect the fault alarm information reported by multiple alarm sources and correlate the alarms, where the alarm source types include: hosts, networks, databases, platforms, and application programs;
input the fault alarm information above a preset threshold into the rule engine based on a preset threshold judgment rule.
In this embodiment, to achieve multi-dimensional, full-range fault alarm analysis, the monitoring platform collects the fault alarm information of multiple alarm sources, aggregates it, and reports it to the rule engine for processing.
In this embodiment, the alarm source types include:
(1) Alarms generated by hosts, such as computers and servers;
(2) Alarms generated by the network layer, such as disconnection and packet loss;
(3) Alarms generated by databases, such as product databases and trading account databases;
(4) Alarms generated by platforms, such as message-oriented middleware platforms and service management platforms;
(5) Alarms generated by application programs, such as account applications and payment applications.
When the monitoring platform collects multiple pieces of fault alarm information that were produced by the same fault cause, these pieces carry identical flag information (for example, field XX). The monitoring platform finds this identical flag information in the collected alarms and uses it to correlate them.
In general, alarms differ in their impact on the system, and alarms with little or almost no impact do not need to be reported. Therefore, in this embodiment, a judgment threshold is preset for each kind of alarm, and by configuring the corresponding judgment rules, abnormal fault information above the threshold is output to the rule engine within seconds.
For example: if the public-network packet loss rate exceeds 50%, the alarm needs to be reported; if host disk space usage reaches 50%, the alarm needs to be reported; if CPU utilization exceeds 70%, the alarm needs to be reported.
In this embodiment, the preset threshold judgment mechanism selects only the fault alarm information above the preset threshold for alarm processing. This avoids reporting every alarm to the rule engine and reduces the rule engine's processing load.
Referring to Fig. 6, Fig. 6 is a functional block diagram of a second embodiment of the fault alarm processing system of the present invention. In this embodiment, the fault alarm processing system further includes:
a robot 30, configured to, upon receiving a call request from the rule engine, push the notification messages output by the rule engine respectively to the notification groups of the corresponding partner and O&M team.
In this embodiment, the robot 30 is an application program, for example a WeChat robot. The robot 30 pushes alarm notifications by category.
In this embodiment, communication groups are built in advance, for example partner communication groups established by product category, or O&M communication groups for different products, and the robot is then activated. For example, after the robot has been added to a notification group, sending an instruction containing the word "activate" activates the notification group with one click.
In this embodiment, the robot 30 identifies the various categories of alarm notifications pushed by the rule engine and pushes them by category to the corresponding partner communication group and O&M communication group. For example, the robot pushes each alarm notification to the corresponding partner communication group and O&M communication group according to the corresponding group ID.
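The robot's activation and group-ID routing can be sketched in Python. `NotifyRobot` is a hypothetical in-memory stand-in, not a WeChat API: a group becomes active when the robot sees a message equal to "activate" (case-insensitive, an assumption), and notifications for groups that were never activated are returned as skipped instead of being delivered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NotifyRobot:
    """Hypothetical chat robot that routes notifications to activated groups by group ID."""
    active_groups: Set[str] = field(default_factory=set)
    outbox: Dict[str, List[str]] = field(default_factory=dict)  # group ID -> delivered messages

    def on_group_message(self, group_id: str, text: str) -> None:
        # One-click activation: an "activate" message in the group enables pushes to it.
        if text.strip().lower() == "activate":
            self.active_groups.add(group_id)

    def push(self, routes: Dict[str, str]) -> List[str]:
        """routes maps group ID -> message. Returns IDs skipped because they are not activated."""
        skipped = []
        for group_id, message in routes.items():
            if group_id in self.active_groups:
                self.outbox.setdefault(group_id, []).append(message)
            else:
                skipped.append(group_id)
        return skipped
```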
The present invention also provides a computer-readable storage medium.
The computer-readable storage medium of the present invention stores a fault alarm processing program which, when executed by a processor, implements the steps of the fault alarm processing method described in any of the above embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those skilled in the art can devise many other forms without departing from the purpose of the present invention and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, falls within the protection scope of the present invention.
Claims (12)
1. A fault alarm processing method, characterized in that the fault alarm processing method comprises the following steps:
a rule engine receiving multi-dimensional fault alarm information input by a monitoring platform;
performing logical judgment on the fault alarm information based on configured rules to determine the fault cause of this alarm;
retrieving an O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to the fault cause.
2. The fault alarm processing method according to claim 1, characterized in that the step of performing logical judgment on the fault alarm information based on configured rules to determine the fault cause of this alarm comprises:
comparing the multi-dimensional alarm information with the configured rules to find, in the configured rules, information having the same fields as the multi-dimensional alarm information;
determining, according to a preset mapping between information and fault causes, the fault cause corresponding to the found information.
3. The fault alarm processing method according to claim 1, characterized in that after the step of retrieving the O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to the fault cause, the fault alarm processing method comprises:
automatically processing the corresponding fault point based on the corresponding automatic processing scheme, wherein the automatic processing includes automatically recovering the fault point or automatically isolating the fault point.
4. The fault alarm processing method according to claim 1, characterized in that, when the fault cause involves a preset operation, after the step of retrieving the O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to the fault cause, the fault alarm processing method comprises:
calling a robot to automatically generate an approval form for the preset operation, sending the approval form to a preset node for approval, and automatically executing the operation process corresponding to the approval form after approval is complete.
5. The fault alarm processing method according to claim 1, characterized in that after the step of retrieving the O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to the fault cause, the fault alarm processing method further comprises:
generating different notification messages for this alarm according to the fault alarm information, the fault cause, and the troubleshooting scheme;
calling a robot to push the notification messages respectively to the notification groups of the corresponding partner and O&M team.
6. The fault alarm processing method according to claim 5, characterized in that after the step of calling the robot to push the notification messages respectively to the notification groups of the corresponding partner and O&M team, the fault alarm processing method further comprises:
the rule engine caching identical fault alarm information during fault recovery;
generating a fault recovery progress notification, and pushing the fault recovery progress notification at every preset interval to the notification group of the corresponding partner.
7. The fault alarm processing method according to claim 1, characterized in that the fault alarm processing method further comprises:
when the multi-dimensional fault alarm information input by the monitoring platform is received, the rule engine further pulling, from a designated alarm platform, all application-class alarm information from alarm sources associated with this alarm, so as to accurately locate the fault cause of this alarm.
8. The fault alarm processing method according to any one of claims 1-7, characterized in that before the step of the rule engine receiving the multi-dimensional fault alarm information input by the monitoring platform, the fault alarm processing method further comprises:
the monitoring platform collecting the fault alarm information reported by multiple alarm sources and correlating the alarms, wherein the alarm source types include: hosts, networks, databases, platforms, and application programs;
inputting, based on a preset threshold judgment rule, the fault alarm information above a preset threshold into the rule engine.
9. A fault alarm processing system, characterized in that the fault alarm processing system comprises:
a monitoring platform, configured to input multi-dimensional fault alarm information to a rule engine;
the rule engine, configured to perform logical judgment on the fault alarm information based on configured rules to determine the fault cause of this alarm, and to retrieve an O&M database based on the fault cause to obtain the troubleshooting scheme corresponding to the fault cause.
10. The fault alarm processing system according to claim 9, characterized in that the monitoring platform is further configured to:
collect the fault alarm information reported by multiple alarm sources and correlate the alarms, wherein the alarm source types include: hosts, networks, databases, platforms, and application programs;
input, based on a preset threshold judgment rule, the fault alarm information above a preset threshold into the rule engine.
11. The fault alarm processing system according to claim 9 or 10, characterized in that the fault alarm processing system further comprises:
a robot, configured to, upon receiving a call request from the rule engine, push the notification messages output by the rule engine respectively to the notification groups of the corresponding partner and O&M team.
12. A computer-readable storage medium, characterized in that a fault alarm processing program is stored on the computer-readable storage medium, and when the fault alarm processing program is executed by a processor, the steps of the fault alarm processing method according to any one of claims 1-8 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810979619.XA CN108989132A (en) | 2018-08-24 | 2018-08-24 | Fault warning processing method, system and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810979619.XA CN108989132A (en) | 2018-08-24 | 2018-08-24 | Fault warning processing method, system and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108989132A | 2018-12-11 |
Family ID: 64547637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810979619.XA Pending CN108989132A (en) | 2018-08-24 | 2018-08-24 | Fault warning processing method, system and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108989132A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135602A (en) * | 2019-05-17 | 2019-08-16 | 伍兴佳 | Steel tower failure monitoring dispatching method and device |
CN110166297A (en) * | 2019-05-22 | 2019-08-23 | 平安信托有限责任公司 | O&M method, system, equipment and computer readable storage medium |
CN110415115A (en) * | 2019-06-18 | 2019-11-05 | 平安证券股份有限公司 | The O&M method, apparatus and computer readable storage medium of transaction system |
CN110601894A (en) * | 2019-09-18 | 2019-12-20 | 中国工商银行股份有限公司 | Alarm processing method and device, electronic equipment and readable storage medium |
CN110635954A (en) * | 2019-10-21 | 2019-12-31 | 中国民航信息网络股份有限公司 | Method and system for processing network fault of data center |
CN110728498A (en) * | 2019-10-21 | 2020-01-24 | 北京百度网讯科技有限公司 | Information interaction method and device |
CN111030857A (en) * | 2019-12-06 | 2020-04-17 | 深圳前海微众银行股份有限公司 | Network alarm method, device, system and computer readable storage medium |
CN111343017A (en) * | 2020-02-22 | 2020-06-26 | 苏州浪潮智能科技有限公司 | Method, system, equipment and medium for cloud platform resource alarm |
CN111628888A (en) * | 2020-04-30 | 2020-09-04 | 中国移动通信集团江苏有限公司 | Fault diagnosis method, device, equipment and computer storage medium |
CN111814999A (en) * | 2020-07-08 | 2020-10-23 | 上海燕汐软件信息科技有限公司 | Fault work order generation method, device and equipment |
CN111835566A (en) * | 2020-07-08 | 2020-10-27 | 上海燕汐软件信息科技有限公司 | System fault management method, device and system |
CN111865673A (en) * | 2020-07-08 | 2020-10-30 | 上海燕汐软件信息科技有限公司 | Automatic fault management method, device and system |
CN111844029A (en) * | 2020-07-09 | 2020-10-30 | 上海有个机器人有限公司 | Robot early warning monitoring method and device |
CN111901140A (en) * | 2020-06-11 | 2020-11-06 | 北京百度网讯科技有限公司 | Exception handling method and device, electronic equipment and storage medium |
CN112328372A (en) * | 2020-11-27 | 2021-02-05 | 新华智云科技有限公司 | Kubernetes node self-healing method and system |
CN112447279A (en) * | 2020-12-10 | 2021-03-05 | 上海联影医疗科技股份有限公司 | Task processing method and device, electronic equipment and storage medium |
CN112559569A (en) * | 2020-12-11 | 2021-03-26 | 广东电力通信科技有限公司 | Alarm rule processing method for composite condition |
CN112711507A (en) * | 2020-12-17 | 2021-04-27 | 浙江高速信息工程技术有限公司 | Device alarm method, electronic device, and medium |
CN113312200A (en) * | 2021-06-01 | 2021-08-27 | 中国民航信息网络股份有限公司 | Event processing method and device, computer equipment and storage medium |
CN113553210A (en) * | 2021-07-30 | 2021-10-26 | 平安普惠企业管理有限公司 | Alarm data processing method, device, equipment and storage medium |
CN113590370A (en) * | 2021-08-06 | 2021-11-02 | 北京百度网讯科技有限公司 | Fault processing method, device, equipment and storage medium |
CN114490751A (en) * | 2021-12-29 | 2022-05-13 | 深圳优地科技有限公司 | Method, device and equipment for determining robot fault and readable storage medium |
CN114567539A (en) * | 2022-03-22 | 2022-05-31 | 中国农业银行股份有限公司 | Method, device, equipment and medium for processing network system exception |
CN114866400A (en) * | 2022-04-29 | 2022-08-05 | 中国电子科技集团公司第五十四研究所 | Alarm rule reasoning method based on cache space optimization |
CN115827398A (en) * | 2023-02-24 | 2023-03-21 | 天翼云科技有限公司 | Method and device for calculating alarm information component value, electronic equipment and storage medium |
CN115883330A (en) * | 2023-02-08 | 2023-03-31 | 阿里云计算有限公司 | Alarm event processing method, system, device, storage medium and program product |
WO2024066346A1 (en) * | 2022-09-27 | 2024-04-04 | 中兴通讯股份有限公司 | Alarm processing method and apparatus, and storage medium and electronic apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105099783A (en) * | 2015-08-20 | 2015-11-25 | 长威信息科技发展股份有限公司 | Method and system for realizing automation of warning emergency disposal of business system |
CN105262616A (en) * | 2015-09-21 | 2016-01-20 | 浪潮集团有限公司 | Failure repository-based automated failure processing system and method |
- 2018-08-24: CN application CN201810979619.XA (publication CN108989132A), status: active, Pending
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135602A (en) * | 2019-05-17 | 2019-08-16 | 伍兴佳 | Steel tower failure monitoring dispatching method and device |
CN110166297A (en) * | 2019-05-22 | 2019-08-23 | 平安信托有限责任公司 | O&M method, system, equipment and computer readable storage medium |
CN110415115A (en) * | 2019-06-18 | 2019-11-05 | 平安证券股份有限公司 | O&M method and apparatus for a transaction system, and computer readable storage medium |
CN110601894A (en) * | 2019-09-18 | 2019-12-20 | 中国工商银行股份有限公司 | Alarm processing method and device, electronic equipment and readable storage medium |
CN110635954B (en) * | 2019-10-21 | 2022-10-21 | 中国民航信息网络股份有限公司 | Method and system for processing network fault of data center |
CN110728498A (en) * | 2019-10-21 | 2020-01-24 | 北京百度网讯科技有限公司 | Information interaction method and device |
CN110635954A (en) * | 2019-10-21 | 2019-12-31 | 中国民航信息网络股份有限公司 | Method and system for processing network fault of data center |
CN111030857A (en) * | 2019-12-06 | 2020-04-17 | 深圳前海微众银行股份有限公司 | Network alarm method, device, system and computer readable storage medium |
CN111343017A (en) * | 2020-02-22 | 2020-06-26 | 苏州浪潮智能科技有限公司 | Method, system, equipment and medium for cloud platform resource alarm |
CN111343017B (en) * | 2020-02-22 | 2022-12-09 | 苏州浪潮智能科技有限公司 | Method, system, equipment and medium for cloud platform resource alarm |
CN111628888A (en) * | 2020-04-30 | 2020-09-04 | 中国移动通信集团江苏有限公司 | Fault diagnosis method, device, equipment and computer storage medium |
CN111628888B (en) * | 2020-04-30 | 2022-08-12 | 中国移动通信集团江苏有限公司 | Fault diagnosis method, device, equipment and computer storage medium |
CN111901140A (en) * | 2020-06-11 | 2020-11-06 | 北京百度网讯科技有限公司 | Exception handling method and device, electronic equipment and storage medium |
CN111835566A (en) * | 2020-07-08 | 2020-10-27 | 上海燕汐软件信息科技有限公司 | System fault management method, device and system |
CN111814999B (en) * | 2020-07-08 | 2024-01-16 | 上海燕汐软件信息科技有限公司 | Fault work order generation method, device and equipment |
CN111865673A (en) * | 2020-07-08 | 2020-10-30 | 上海燕汐软件信息科技有限公司 | Automatic fault management method, device and system |
CN111814999A (en) * | 2020-07-08 | 2020-10-23 | 上海燕汐软件信息科技有限公司 | Fault work order generation method, device and equipment |
CN111844029A (en) * | 2020-07-09 | 2020-10-30 | 上海有个机器人有限公司 | Robot early warning monitoring method and device |
CN112328372A (en) * | 2020-11-27 | 2021-02-05 | 新华智云科技有限公司 | Kubernetes node self-healing method and system |
CN112447279A (en) * | 2020-12-10 | 2021-03-05 | 上海联影医疗科技股份有限公司 | Task processing method and device, electronic equipment and storage medium |
CN112559569A (en) * | 2020-12-11 | 2021-03-26 | 广东电力通信科技有限公司 | Alarm rule processing method for composite condition |
CN112559569B (en) * | 2020-12-11 | 2023-07-21 | 广东电力通信科技有限公司 | Alarm rule processing method for composite condition |
CN112711507A (en) * | 2020-12-17 | 2021-04-27 | 浙江高速信息工程技术有限公司 | Device alarm method, electronic device, and medium |
CN113312200A (en) * | 2021-06-01 | 2021-08-27 | 中国民航信息网络股份有限公司 | Event processing method and device, computer equipment and storage medium |
WO2022252860A1 (en) * | 2021-06-01 | 2022-12-08 | 中国民航信息网络股份有限公司 | Event processing method and apparatus, and computer device and storage medium |
CN113553210A (en) * | 2021-07-30 | 2021-10-26 | 平安普惠企业管理有限公司 | Alarm data processing method, device, equipment and storage medium |
CN113590370A (en) * | 2021-08-06 | 2021-11-02 | 北京百度网讯科技有限公司 | Fault processing method, device, equipment and storage medium |
CN113590370B (en) * | 2021-08-06 | 2022-06-21 | 北京百度网讯科技有限公司 | Fault processing method, device, equipment and storage medium |
WO2023011160A1 (en) * | 2021-08-06 | 2023-02-09 | 北京百度网讯科技有限公司 | Fault processing method and apparatus, device, and storage medium |
CN114490751A (en) * | 2021-12-29 | 2022-05-13 | 深圳优地科技有限公司 | Method, device and equipment for determining robot fault and readable storage medium |
CN114490751B (en) * | 2021-12-29 | 2024-06-04 | 深圳优地科技有限公司 | Method, device and equipment for determining robot faults and readable storage medium |
CN114567539A (en) * | 2022-03-22 | 2022-05-31 | 中国农业银行股份有限公司 | Method, device, equipment and medium for processing network system exception |
CN114567539B (en) * | 2022-03-22 | 2024-04-12 | 中国农业银行股份有限公司 | Network system exception handling method, device, equipment and medium |
CN114866400A (en) * | 2022-04-29 | 2022-08-05 | 中国电子科技集团公司第五十四研究所 | Alarm rule reasoning method based on cache space optimization |
CN114866400B (en) * | 2022-04-29 | 2024-04-30 | 中国电子科技集团公司第五十四研究所 | Alarm rule reasoning method based on buffer space optimization |
WO2024066346A1 (en) * | 2022-09-27 | 2024-04-04 | 中兴通讯股份有限公司 | Alarm processing method and apparatus, and storage medium and electronic apparatus |
CN115883330A (en) * | 2023-02-08 | 2023-03-31 | 阿里云计算有限公司 | Alarm event processing method, system, device, storage medium and program product |
CN115827398A (en) * | 2023-02-24 | 2023-03-21 | 天翼云科技有限公司 | Method and device for calculating alarm information component value, electronic equipment and storage medium |
WO2024174700A1 (en) * | 2023-02-24 | 2024-08-29 | 天翼云科技有限公司 | Method and apparatus for calculating component value of alarm information, and electronic device and storage medium |
Similar Documents
Publication | Title |
---|---|
CN108989132A (en) | Fault warning processing method, system and computer readable storage medium |
CN112653586B (en) | Time-space big data platform application performance management method based on full link monitoring |
US11442803B2 (en) | Detecting and analyzing performance anomalies of client-server based applications |
CN109783322A (en) | Monitoring and analysis system and method for the operating status of an enterprise information system |
CN111176879A (en) | Fault repairing method and device for equipment |
CN111162949A (en) | Interface monitoring method based on Java byte code embedding technology |
CN112965874A (en) | Configurable monitoring alarm method and system |
CN111756582A (en) | Service chain monitoring method based on NFV log alarm |
WO2007143943A1 (en) | Method, system and network device of centralized maintenance of multiple devices |
CN112559237B (en) | Operation and maintenance system troubleshooting method and device, server and storage medium |
CN108845912A (en) | Alarm method for service interface call failures, and computing device |
CN116755992B (en) | Log analysis method and system based on OpenStack cloud computing |
CN114528175A (en) | Micro-service application system root cause positioning method, device, medium and equipment |
WO2015187001A2 (en) | System and method for managing resources failure using fast cause and effect analysis in a cloud computing system |
CN117194142A (en) | Integrated application performance diagnosis system and method based on link tracking |
CN114374600A (en) | Network operation and maintenance method, device, equipment and product based on big data |
CN116662127A (en) | Method, system, equipment and medium for classifying and early warning equipment alarm information |
CN116795631A (en) | Service system monitoring alarm method, device, equipment and medium |
CN114138522A (en) | Micro-service fault recovery method and device, electronic equipment and medium |
CN115549953B (en) | Network security alarm method and system |
CN115174350B (en) | Operation and maintenance alarm method, device, equipment and medium |
CN114531338A (en) | Monitoring alarm and tracing method and system based on call chain data |
CN114510364A (en) | Abnormal data root cause analysis method and device combining text clustering with link calling |
CN114168371A (en) | Intelligent automatic fault alarm system |
CN113342596A (en) | Distributed monitoring method, system and device for equipment indexes |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |