CN107562556A - Restoration methods, recovery device and the storage medium of failure - Google Patents

Fault recovery method, recovery device, and storage medium

Info

Publication number
CN107562556A
Authority
CN
China
Prior art keywords
node
alarm
failure
alarm parameter
recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710691358.7A
Other languages
Chinese (zh)
Other versions
CN107562556B (en)
Inventor
陈薪
袁佳
秦涛
雷教敏
朱志武
赵志辉
刘光华
付惠
田盈盈
杨文兵
陈雷
王正迪
党受辉
刘章雄
王建学
杨继宁
梅璠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710691358.7A
Publication of CN107562556A
Application granted
Publication of CN107562556B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a fault recovery method, a recovery device, and a storage medium. The recovery method includes: obtaining fault information; matching the fault information to a corresponding alarm parameter and troubleshooting process according to a preset alarm matching rule; performing convergence processing on the alarm parameter to generate a converged alarm parameter; and executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery. By matching fault information to its alarm parameter and troubleshooting process through a preset alarm matching rule, and by performing convergence processing on the alarm parameter, the invention offers strong compatibility, a simple handling flow, low system resource usage, and high processing efficiency.

Description

Fault recovery method, recovery device, and storage medium
Technical field
The invention belongs to the field of data processing, and in particular relates to a fault recovery method, a recovery device, and a storage medium.
Background
With the rise of all kinds of applications, software faults of all types occur frequently. They not only interrupt services but also give the affected merchants and users a poor experience. Automatic fault recovery has therefore become a particularly urgent and prominent problem.
Traditional automatic fault recovery systems each target a specific fault. The developers of the system configure a handling flow for that specific fault, and a flow system then executes it. The handling flow is usually configured through a description file written in a domain-specific language (DSL); in addition, parts of the flow may even require developers to write dedicated code.
Traditional automatic fault recovery systems have certain drawbacks. For example, the barrier to user customization is very high, so they can usually cover only some basic, common troubleshooting processes. In terms of integration and maintenance, recovery schemes are poorly compatible with one another, and the handling flows are numerous and complicated, which leads to technical problems such as high system resource consumption and slow feedback.
Summary of the invention
Embodiments of the present invention provide a fault recovery method, a recovery device, and a storage medium that offer advantages such as strong compatibility and a simple handling flow.
To solve the above technical problems, embodiments of the present invention provide the following technical solution:
A fault recovery method, including:
obtaining fault information;
matching the fault information to a corresponding alarm parameter and troubleshooting process according to a preset alarm matching rule;
performing convergence processing on the alarm parameter to generate a converged alarm parameter; and
executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
To solve the above technical problems, embodiments of the present invention also provide the following technical solution:
A fault recovery device, including:
an acquisition module, configured to obtain fault information;
a matching module, configured to match the fault information to a corresponding alarm parameter and troubleshooting process according to a preset alarm matching rule;
a convergence module, configured to perform convergence processing on the alarm parameter to generate a converged alarm parameter; and
a recovery module, configured to execute the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
To solve the above technical problems, embodiments of the present invention also provide the following technical solution:
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the fault recovery method described above.
The fault recovery method, recovery device, and storage medium provided by embodiments of the present invention match fault information to its corresponding alarm parameter and troubleshooting process through a preset alarm matching rule and perform convergence processing on the alarm parameter. They offer strong compatibility, a simple handling flow, low system resource usage, and high processing efficiency.
Brief description of the drawings
The technical solutions and other beneficial effects of the present invention will become apparent from the following detailed description of embodiments of the invention, given with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the fault recovery method provided by an embodiment of the present invention;
Fig. 2 is another schematic flowchart of the fault recovery method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the user configuration interface for alarm matching provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the interface of the tree-shaped troubleshooting process provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the processing logic of the tree-shaped troubleshooting process provided by an embodiment of the present invention;
Fig. 6 is a technical-side schematic diagram of the user-defined alarm settings provided by an embodiment of the present invention;
Fig. 7 is a schematic module diagram of the fault recovery device provided by an embodiment of the present invention;
Fig. 8 is another schematic module diagram of the fault recovery device provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of the hardware environment of the fault recovery method, recovery device, and storage medium provided by an embodiment of the present invention.
Detailed description
Referring to the drawings, in which like reference numerals denote like components, the principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be regarded as limiting other specific embodiments not detailed herein.
In the following description, specific embodiments of the invention are described with reference to steps and symbols executed by one or more computers, unless otherwise stated. These steps and operations are therefore referred to several times as being computer-executed. Computer execution, as used herein, includes operations by a computer processing unit on electronic signals that represent data in a structured form. Such an operation transforms the data or maintains it at a location in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. Although the principles of the invention are described in the above terms, this is not meant as a limitation; those skilled in the art will appreciate that many of the steps and operations described below may also be implemented in hardware.
The terms "module" and "unit" as used herein may be regarded as software objects executed on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The devices and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the fault recovery method provided by an embodiment of the present invention. The fault recovery method can be applied to a server that has operation and maintenance functions.
The fault recovery method includes the following steps:
In step S101, fault information is obtained.
The fault information includes, but is not limited to, abnormality alarms and health warnings.
An abnormality alarm refers to a service abnormality or pseudo service abnormality arising from various causes. Common causes include: service abnormalities caused by network or Internet data center (IDC) abnormalities, service abnormalities caused by performance problems in key modules, service abnormalities caused by host hardware or system abnormalities, and pseudo service abnormalities triggered by invalid false alarms. Of these, service abnormalities caused by host hardware or system abnormalities occur most often.
A health warning refers to the various system metrics that are collected and used to assess and detect faults in the system. The health warning can be understood as a system health-check report: each metric is compared against its threshold to find abnormal points, and an abnormal point can be regarded as fault information.
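As a non-limiting illustration, the threshold comparison behind a health warning might be sketched in Python as follows. The function name, the metric names, and the assumption that a metric is abnormal when it exceeds an upper threshold are all hypothetical; the embodiments do not specify them.

```python
def find_abnormal_points(metrics, thresholds):
    """Return the metrics whose value exceeds its configured upper threshold.

    Assumption: "abnormal" means value > threshold. Metrics without a
    configured threshold are not checked.
    """
    abnormal = {}
    for name, value in metrics.items():
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            # Each abnormal point can be treated as a piece of fault information.
            abnormal[name] = {"value": value, "threshold": limit}
    return abnormal
```

Each entry returned by this sketch would then be handed to the matching step as fault information.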
In step S102, the fault information is matched to a corresponding alarm parameter and troubleshooting process according to a preset alarm matching rule.
Referring to Fig. 3, which shows a schematic diagram of the user configuration interface for alarm matching. The alarm matching rule includes the alarm parameter 310 and the troubleshooting process 320 corresponding to the fault information. Both the alarm parameter 310 and the troubleshooting process 320 support user customization, so that the attributes of the fault information and the corresponding strategy can be adjusted.
The alarm parameter 310 may be defined or displayed as a "self-healing scenario" in the actual operation interface, that is, the parameters that correspond to the scenario to be recovered. The alarm parameter 310 includes: the fault type 31, fault event number 32, text description 33, environment attribute 34, service partition 35, and service module 36 corresponding to the fault information.
Specifically, the fault type 31, also called the alarm type, is used to customize the attribute of the fault. The fault event number 32 is the fault event number looked up and generated under the fault type, for example: 973469Hostsrv update failure alert (alarm type). The text description 33 supports filtering by content: only the alarms whose content matches are selected, and matching uses a regular expression. The text description 33 is optional; leaving it empty means no filtering. The environment attribute 34, which may also be displayed as the SET attribute as in the figure, is used to configure the application environment type. Application environment types include, but are not limited to: test environment, trial environment, and production environment. The service partition 35 is used for virtual partitioning by service content, for example: 318 (Guangdong Zone 1). It can be understood that Guangdong Zone 1 is usually the zone recommended as the preferred choice for users in Guangdong Province, but this does not prevent users from other regions from entering. The service module 36 refers to a logical module of the service, such as a logic module on a host, a game module, or a chat module.
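By way of example only, an alarm matching rule of the kind described above might be represented as follows. This is a minimal sketch: the class `AlarmRule`, its field names, and the dictionary layout of the fault information are hypothetical and are not taken from the embodiments. Optional fields left empty are treated as "do not filter", in line with the text description 33.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmRule:
    """Hypothetical alarm matching rule; all names are illustrative."""
    fault_type: str                            # fault type 31 (alarm type)
    flow_name: str                             # troubleshooting process to run
    description_pattern: Optional[str] = None  # text description 33 (regex)
    environment: Optional[str] = None          # environment attribute 34
    partition: Optional[str] = None            # service partition 35
    module: Optional[str] = None               # service module 36

    def matches(self, fault: dict) -> bool:
        if fault.get("fault_type") != self.fault_type:
            return False
        # An empty optional field means "do not filter on this field".
        for key in ("environment", "partition", "module"):
            expected = getattr(self, key)
            if expected is not None and fault.get(key) != expected:
                return False
        if self.description_pattern is not None:
            text = fault.get("description", "")
            if re.search(self.description_pattern, text) is None:
                return False
        return True


def match_rule(rules, fault):
    """Return the first rule that matches the fault information, or None."""
    for rule in rules:
        if rule.matches(fault):
            return rule
    return None
```

Returning the first matching rule is one possible design choice; a real system might instead rank rules by specificity.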
The troubleshooting process 320 may be defined or displayed as "self-healing processing" in the user configuration interface. It provides the options related to fault flow handling for the user to customize. The troubleshooting process 320 includes: the tree-shaped troubleshooting process 37, the notification 38, and the enable control 39.
Specifically, the enable control 39 is used to determine whether the corresponding troubleshooting process is enabled.
The notification 38 includes: the notification trigger, the notification channel, and the notification recipient. Specifically, the notification trigger includes: trigger at start, trigger on success, and trigger on failure. The notification channel includes: SMS, e-mail, social software (such as WeChat and Tencent RTX), and telephone. The notification recipient includes: the responsible person (such as the host owner) and peripheral responsible persons (additional recipients can be added).
For the tree-shaped troubleshooting process 37, refer further to Fig. 4, which shows a schematic diagram of the interface of the tree-shaped troubleshooting process. The tree-shaped troubleshooting process 37 includes: one root node 371, multiple child nodes 372, and their association relationships 373. The child nodes 372 further include leaf nodes 3720, which mark the end of the corresponding troubleshooting process.
It can be understood that each node (371, 372, or 3720) includes a corresponding recovery instruction, which is used to recover from the fault when the node is triggered.
It can be understood that each tree-shaped troubleshooting process 37 corresponds to a unique root node 371. This step can provide multiple trigger nodes for the user to choose from. Trigger nodes are indicated by a label or text; taking "Quick" in the figure as an example, it marks nodes 371 and 374 as trigger nodes. In addition, multiple tree-shaped troubleshooting processes can be combined, for example by using the root node of one tree-shaped troubleshooting process as a child node of another tree-shaped troubleshooting process.
In step S103, convergence processing is performed on the alarm parameter to generate a converged alarm parameter.
It can be understood that when a single alarm parameter is received, the subsequent steps can be carried out directly. When multiple alarm parameters are received, it is determined whether they are associated with the same event; if so, the multiple alarm parameters are converged to generate a converged alarm parameter. Specifically, convergence means that when multiple alarm parameters represent the same event, only one of them is executed, and the others are stored without being executed.
For example, when the system crashes, multiple alarm parameters may be received separately from the communication module, the storage module, and the processing process. These can then be converged to generate a single converged alarm parameter.
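The convergence described above might be sketched as follows, purely for illustration. The sketch assumes that each alarm parameter carries a hypothetical `event_id` key identifying the event it belongs to; how alarms are actually associated with an event is not specified in the embodiments. The text allows any one alarm of an event to be executed; this sketch simply picks the first one received.

```python
from collections import OrderedDict


def converge(alarms):
    """Split alarms into (to_execute, stored).

    For every event, the first alarm received is kept for execution and the
    remaining alarms of the same event are stored without being executed.
    """
    first_by_event = OrderedDict()
    stored = []
    for alarm in alarms:
        event_id = alarm["event_id"]  # hypothetical association key
        if event_id not in first_by_event:
            first_by_event[event_id] = alarm
        else:
            stored.append(alarm)
    return list(first_by_event.values()), stored
```

For the crash example, three alarms from the communication module, the storage module, and the processing process that share one `event_id` would yield one alarm to execute and two stored alarms.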
In step S104, the corresponding troubleshooting process is executed according to the converged alarm parameter to carry out fault recovery.
Specifically, this step includes:
(1) Obtain the service logic of the current node in the troubleshooting process. The service logic includes calling, over a network protocol, the corresponding function of a custom enterprise service bus or of the operating system; the corresponding function includes a restart function, a transfer function, and/or a script execution function.
(2) Execute the service logic of the troubleshooting process according to the converged alarm parameter.
If recovery succeeds, an alarm handling ticket is generated, the handling process is recorded on the ticket, and the ticket is saved to the alarm record database. If recovery fails or times out, the alarm information is tracked and the troubleshooting process is further optimized.
The fault recovery method provided by this embodiment of the present invention matches fault information to its corresponding alarm parameter and troubleshooting process through a preset alarm matching rule and performs convergence processing on the alarm parameter. No separate coding is needed for each specific fault, so the method is highly compatible; at the same time, the configuration and recovery handling flows are simple, which improves the efficiency of fault recovery.
Referring to Fig. 2, Fig. 2 is another schematic flowchart of the fault recovery method provided by an embodiment of the present invention. The fault recovery method can be applied to a server that has operation and maintenance functions.
The fault recovery method includes the following steps:
In step S201, a tree-shaped troubleshooting process is configured.
Referring also to the interface schematic diagram of the tree-shaped troubleshooting process shown in Fig. 4, this step specifically includes:
(1) Set a trigger node as the root node 371.
It can be understood that each tree-shaped troubleshooting process 37 corresponds to a unique root node 371.
This step may: obtain multiple trigger nodes and mark them with a label or text (taking "Quick" in the figure as an example, trigger node 371 and trigger node 374 are shown); then select one trigger node from the multiple trigger nodes as the unique root node.
(2) According to the handling logic, obtain the association options of the root node in turn to generate child nodes 372.
It can be understood that "tree-shaped" usually refers to a binary tree: each node can branch into a left subtree and a right subtree, which correspond to the handling logic for recovery success and recovery failure respectively.
Further, this step may be performed as follows: (2.1) starting from the root node 371, set the recovery instruction of the current node in turn, where the recovery instruction is used to perform fault recovery on the fault information; (2.2) according to the handling logic, take recovery success and recovery failure as the association options of the current node and generate a child node 372 of the current node for each; (2.3) record the association relationship 373 between the current node and its parent node.
(3) Generate the tree-shaped troubleshooting process 37 from the root node 371, the child nodes 372, and the association relationships 373.
Multiple tree-shaped troubleshooting processes can also be combined. That is, when there are multiple trigger nodes, the root node, the unselected trigger nodes, and the child nodes are combined into one tree-shaped logical handling flow. It can be understood as follows: when there are multiple trigger nodes, the root node of one tree-shaped troubleshooting process is used as a child node of another tree-shaped troubleshooting process; see node 374.
The child nodes 372 further include leaf nodes 3720, which mark the end of the corresponding troubleshooting process.
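As one possible illustration of the configuration in step S201, the binary tree may be modelled with a node type that holds a recovery instruction and two optional children, one for recovery success and one for recovery failure. The class `FlowNode`, the instruction strings, and the helper functions are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowNode:
    """Hypothetical node of a tree-shaped troubleshooting process."""
    instruction: str                          # recovery instruction of the node
    on_success: Optional["FlowNode"] = None   # child taken when recovery succeeds
    on_failure: Optional["FlowNode"] = None   # child taken when recovery fails

    def is_leaf(self) -> bool:
        # A leaf node marks the end of the troubleshooting process.
        return self.on_success is None and self.on_failure is None


def build_restart_flow() -> FlowNode:
    """Root: restart the process; on failure escalate to a host reboot."""
    reboot = FlowNode(
        "reboot_host",
        on_success=FlowNode("close_ticket"),
        on_failure=FlowNode("notify_owner"),
    )
    return FlowNode("restart_process",
                    on_success=FlowNode("close_ticket"),
                    on_failure=reboot)


def count_nodes(node: Optional[FlowNode]) -> int:
    """Count the nodes of a flow, including its leaf nodes."""
    if node is None:
        return 0
    return 1 + count_nodes(node.on_success) + count_nodes(node.on_failure)
```

In this sketch the `reboot` subtree is itself a complete tree whose root is attached as the failure child of another root, which mirrors the combination of two tree-shaped troubleshooting processes.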
In step S202, fault information is obtained.
This step can obtain fault information from an associated alarm system. The fault information includes, but is not limited to, abnormality alarms and health warnings.
In step S203, the fault information is matched to a corresponding alarm parameter and tree-shaped troubleshooting process according to a preset alarm matching rule.
Referring to Fig. 3, which shows a schematic diagram of the user configuration interface for alarm matching. The alarm matching rule includes the alarm parameter 310 and the troubleshooting process 320 corresponding to the fault information. Both the alarm parameter 310 and the troubleshooting process 320 support user customization, so that the attributes of the fault information and the corresponding strategy can be adjusted.
The alarm parameter 310 may be defined or displayed as a "self-healing scenario" in the actual operation interface, that is, the parameters that correspond to the scenario to be recovered. The alarm parameter 310 includes: the fault type 31, fault event number 32, text description 33, environment attribute 34, service partition 35, and service module 36 corresponding to the fault information.
The troubleshooting process 320 may be defined or displayed as "self-healing processing" in the user configuration interface. It provides the options related to fault flow handling for the user to customize. The troubleshooting process 320 includes: the tree-shaped troubleshooting process 37, the notification 38, and the enable control 39.
In step S204, convergence processing is performed on the alarm parameter according to whether the alarm event corresponding to the alarm parameter occurs for the first time, to generate a converged alarm parameter.
Specifically, this step includes:
(1) Determine whether the alarm event corresponding to the alarm parameter occurs for the first time, where the alarm event corresponds to at least one alarm parameter. If it is the first occurrence, perform step (2); otherwise, perform step (3).
For example, when the system crashes, multiple alarm parameters may be received separately from the communication module, the storage module, the processing process, and so on. These alarm parameters can then be converged to generate a single converged alarm parameter.
(2) Take the alarm parameter as the converged alarm parameter and send it to the troubleshooting process.
(3) Record the alarm parameter under the alarm event in the alarm record database, and mark the alarm parameter as converged.
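Steps (1) to (3) might be sketched as follows. This is only an assumption-laden illustration: a plain dictionary stands in for the alarm record database, `event_id` is a hypothetical key that identifies the alarm event, and the sketch also records the first alarm so that later alarms of the same event can be recognised, which the embodiments do not state explicitly.

```python
def converge_on_arrival(alarm, record_db):
    """Return the alarm if its event occurs for the first time, else None.

    record_db maps an event id to the list of alarm records of that event
    and is a stand-in for the alarm record database.
    """
    event_id = alarm["event_id"]  # hypothetical event key
    if event_id not in record_db:
        # Step (2): first occurrence, forward to the troubleshooting process.
        record_db[event_id] = [dict(alarm, status="dispatched")]
        return alarm
    # Step (3): repeat occurrence, record it and mark it as converged.
    record_db[event_id].append(dict(alarm, status="converged"))
    return None
```

A caller would pass every non-`None` result on to the root node of the matched troubleshooting process.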
In step S205, the troubleshooting process is executed according to the converged alarm parameter to carry out fault recovery.
Specifically, this step includes:
(1) send the converged alarm parameter to the root node of the troubleshooting process;
(2) obtain the recovery instruction corresponding to the root node;
(3) perform fault recovery on the fault information according to the recovery instruction;
(4) determine whether the current node is a leaf node;
(5) when the current node is a leaf node, end the fault recovery;
(6) when the current node is not a leaf node, obtain, according to the fault recovery result, the recovery instruction of the corresponding child node of the root node in order to carry out fault recovery, where the fault recovery result is either recovery success or recovery failure.
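The steps above can be sketched as a simple loop over the tree. The sketch generalises step (6) from the root node to whichever node is current, which is an interpretation rather than something the embodiments state. `Node`, `run_flow`, and the callable used as the recovery instruction are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str
    recover: Callable[[dict], bool]      # recovery instruction; True on success
    on_success: Optional["Node"] = None
    on_failure: Optional["Node"] = None


def run_flow(root: Node, alarm: dict):
    """Execute a tree-shaped flow and return the (node name, result) trace."""
    trace = []
    node = root
    while node is not None:
        ok = node.recover(alarm)         # steps (2) and (3)
        trace.append((node.name, ok))
        if node.on_success is None and node.on_failure is None:
            break                        # steps (4) and (5): leaf node reached
        # Step (6): choose the child according to the recovery result.
        node = node.on_success if ok else node.on_failure
    return trace
```

If a non-leaf node has no child for the observed result, the loop ends there; a real system might treat that case as a configuration error instead.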
To further explain the above steps, refer also to Fig. 5, which shows a schematic diagram of the processing logic of the tree-shaped troubleshooting process provided by an embodiment of the present invention. The processing logic of the tree-shaped troubleshooting process includes:
Alarm matching module 51: obtains fault information and, according to the preset alarm matching rule, matches the fault information to its corresponding alarm parameter and tree-shaped troubleshooting process.
Alarm convergence module 52: performs convergence processing on the fault information and writes the convergence result to the database 58.
Pending flow queue 53: receives the converged alarm parameter and determines the current node;
Process control module 54: obtains the next node to execute according to the current node;
Unit task queue 55: if there is no next node to execute, the current node is judged to be a leaf node, the fault recovery ends, and the process control module 54 saves the recovery process to the database 58; if the current node is not a leaf node, the recovery instruction corresponding to the current node is obtained and cached;
Task execution module 56: reads the recovery instructions from the unit task queue 55 in turn and performs fault recovery on the fault information according to each recovery instruction;
It can be understood that nodes in the unit task queue 55 are taken out by the task execution module 56, which executes the specific service logic according to the task configuration of the node and, when the unit task completes, notifies the pending flow queue to update the current node. The service logic here is invoked over the HTTP protocol and calls the enterprise's own enterprise service bus, or other systems that perform execution-type operations, such as a system that can restart hosts or a system that can transfer files and execute scripts. The interface call templates for each execution-type system are completed by developers; users only need to use the configuration options provided on the Web page to customize the specific calling logic.
Polling control and callback control module 57: some nodes require polling control and callback control, and may cycle repeatedly among the unit task queue, the task execution module, and the polling/callback control module until the node finishes executing.
The flow corresponding to a finished unit task is pushed back into the pending flow queue 53 by the task execution module 56, and the process control module 54 computes its next node to execute. This cycle repeats until the whole flow ends.
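The interplay of the two queues might be sketched, in a single-threaded and heavily simplified form, as follows. Everything here is an assumption made for illustration: the flow state is a dictionary, the tree is given as a `graph` mapping from a node name to its successors, `execute` stands in for the HTTP call to an execution-type system, and polling and callback control (module 57) is omitted.

```python
from collections import deque


def run_engine(flow_states, graph, execute):
    """Drive flows through a pending flow queue and a unit task queue.

    flow_states: dicts with keys "alarm", "next" (node name or None), "history".
    graph: node name -> {"ok": next node or None, "fail": next node or None}.
    execute: callable(node_name, alarm) -> bool, the unit task itself.
    """
    pending_flows = deque(flow_states)  # pending flow queue 53
    unit_tasks = deque()                # unit task queue 55
    finished = []                       # stand-in for database 58

    while pending_flows or unit_tasks:
        # Process control: decide what each pending flow does next.
        while pending_flows:
            flow = pending_flows.popleft()
            if flow["next"] is None:
                finished.append(flow)   # no next node: the flow has ended
            else:
                unit_tasks.append(flow)
        # Task execution: run one unit task, then push the flow back.
        if unit_tasks:
            flow = unit_tasks.popleft()
            node = flow["next"]
            ok = execute(node, flow["alarm"])
            flow["history"].append((node, ok))
            flow["next"] = graph[node]["ok" if ok else "fail"]
            pending_flows.append(flow)
    return finished
```

Separating the two queues in this way lets the component that decides the next node stay independent of the component that performs the calls, which is one plausible reading of why Fig. 5 distinguishes modules 54 and 56.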
In step S206, after the fault recovery ends, the fault recovery process is saved in the alarm record database.
If recovery succeeds, an alarm handling ticket is generated, the handling process is recorded on the ticket, and the ticket is saved to the alarm record database.
If recovery fails or times out, the alarm information is tracked and the troubleshooting process is further optimized.
In step S207, the alarm record database is queried according to a query instruction, and the query result is output.
The alarm record database mainly supports queries on processing status and on statistical counts. The processing status refers to the current state of a given piece of fault information, such as: converged, recovery succeeded, or recovery failed. The statistical count refers to the number of pieces of fault information within a certain period of time, in a certain partition, or of a certain type.
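The two kinds of query might look as follows when the alarm record database is modelled, for illustration only, as an in-memory SQLite table. The table name `alarm_record`, its columns, and the status strings are hypothetical; the embodiments do not describe a schema.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the alarm record database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alarm_record ("
        "id INTEGER PRIMARY KEY, fault_type TEXT, service_partition TEXT, "
        "status TEXT, created_at TEXT)"
    )
    return conn


def status_of(conn, record_id):
    """Processing-status query: current state of one piece of fault information."""
    row = conn.execute(
        "SELECT status FROM alarm_record WHERE id = ?", (record_id,)
    ).fetchone()
    return row[0] if row else None


def count_faults(conn, start, end, service_partition=None, fault_type=None):
    """Statistical query: number of faults in [start, end), optionally filtered."""
    sql = ("SELECT COUNT(*) FROM alarm_record "
           "WHERE created_at >= ? AND created_at < ?")
    params = [start, end]
    if service_partition is not None:
        sql += " AND service_partition = ?"
        params.append(service_partition)
    if fault_type is not None:
        sql += " AND fault_type = ?"
        params.append(fault_type)
    return conn.execute(sql, params).fetchone()[0]
```

Timestamps are stored as ISO 8601 strings in this sketch so that plain string comparison orders them correctly.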
The fault recovery method provided by this embodiment of the present invention matches fault information to its corresponding alarm parameter and troubleshooting process through a preset alarm matching rule and performs convergence processing on the alarm parameter. No separate coding is needed for each specific fault, so the method is highly compatible; at the same time, the configuration and recovery handling flows are simple, so the method uses few system resources at run time and processes faults efficiently. In addition, the fault recovery result can be saved in the alarm record database so that the alarm information can be tracked and the handling flow optimized.
Fig. 6 is a technical-side schematic diagram of the user-defined alarm settings provided by an embodiment of the present invention. The user 61 can interact with the alarm system 62 and the fault recovery device 63 to handle all kinds of fault information.
The alarm system 62 includes: an alarm configuration module 621, an alarm generation module 622, an alarm storage module 623, and an alarm-pull application interface (API) 624.
The alarm configuration module 621 is used to configure the types of alarm information, such as abnormality alarms and health warnings.
The alarm generation module 622 is connected to the alarm configuration module 621 and is used to generate fault information during operation.
The alarm storage module 623 is connected to the alarm generation module 622 and is used to store the fault information within the alarm system 62.
The alarm-pull application interface 624 is connected to the alarm storage module 623. It reads fault information from the alarm storage module 623 and sends it to the fault recovery device 63, supporting the acquisition of fault information in the embodiments of the present invention.
The fault recovery device 63 includes: back-end logic 631, a database 632, and a user interface 633.
The back-end logic 631 includes four parts: an alarm acquisition module 6311, an alarm matching module 6312, an alarm convergence module 6313, and a fault handling module 6314. For details, see Fig. 1 or Fig. 2.
The database 632 is connected to the back-end logic 631 and includes: a configuration storage module 6321 and a processing record module 6322. The configuration storage module 6321 is used to store the user's configuration. The processing record module 6322 is used to store the handling process of the alarm information.
The user interface 633 is connected to the database 632 and includes: a handling flow configuration module 6331, an alarm and handling flow association module 6332, a processing status query module 6333, and a statistics query module 6334, which provide the corresponding human-machine interfaces. The handling flow configuration module 6331 is used to configure the unit handling logic and generate the corresponding tree-shaped combined flow; for details, see Fig. 4. The alarm and handling flow association module 6332 is used to configure the alarm matching rule and specify the alarm handling flow; for details, see Fig. 3. The processing status query module 6333 is used to query the processing status based on the database 632. The processing status refers to the current state of a given piece of fault information, such as: converged, recovery succeeded, or recovery failed. The statistics query module 6334 is used to query statistical counts based on the database 632. The statistical count refers to the number of pieces of fault information within a certain period of time, in a certain partition, or of a certain type.
Referring to Fig. 7, Fig. 7 is a module diagram of the failure recovery device provided by an embodiment of the present invention. The failure recovery device can be applied to a server with operation and maintenance functions.
The failure recovery device 700 includes an acquisition module 71, a matching module 72, a convergence module 73, and a recovery module 74.
The acquisition module 71 obtains fault information. The fault information includes, but is not limited to, abnormality alarms and health early warnings.
The matching module 72 is connected to the acquisition module 71 and, according to a preset alarm matching rule, matches the corresponding alarm parameter and troubleshooting process for the fault information.
Fig. 3 shows a schematic diagram of the user configuration interface for alarm matching. The alarm matching rule includes the alarm parameter 310 and the troubleshooting process 320 corresponding to the fault information. Both the alarm parameter 310 and the troubleshooting process 320 support user customization, so that the attributes of the fault information and the corresponding handling strategy can be adjusted.
The alarm parameter 310 may be defined or displayed in the actual operation interface as the "self-healing scenario", that is, the parameters that describe the scenario to be recovered. The alarm parameter 310 includes the fault type 31, fault event number 32, text description 33, environment attribute 34, business partition 35, and business module 36 corresponding to the fault information.
The troubleshooting process 320 may be defined or displayed in the user configuration interface as "self-healing handling". It provides the options related to fault flow handling for the user to customize. The troubleshooting process 320 includes a tree-shaped troubleshooting process 37, a notification 38, and an enable control 39.
Specifically, the enable control 39 determines whether the corresponding troubleshooting process is enabled.
The notification 38 includes the notification trigger, the notification channel, and the notification recipient. The notification trigger includes triggering at start, triggering on success, and triggering on failure. The notification channel includes SMS, email, social software (such as WeChat or Tencent RTX), and telephone. The notification recipient includes the relevant responsible person (such as the host owner) and peripheral responsible persons (additional recipients can be added).
For the tree-shaped troubleshooting process 37, refer further to Fig. 4, which shows a schematic diagram of its interface. The tree-shaped troubleshooting process 37 includes one root node 371, multiple child nodes 372, and their association relationships 373. The child nodes 372 further include leaf nodes 3720, each of which marks the end of the corresponding troubleshooting process.
Each node (371, 372, or 3720) contains a corresponding recovery instruction; when the node is triggered, the fault is recovered according to that recovery instruction. It can be understood that each tree-shaped troubleshooting process 37 corresponds to a unique root node 371. Multiple trigger nodes can be provided for the user to choose from. Trigger nodes are marked with a label or text; in the figure, taking "quick" as an example, the marked nodes are trigger nodes 371 and 374. In addition, multiple tree-shaped troubleshooting processes can be combined, for example by using the root node of one tree-shaped troubleshooting process as a child node of another tree-shaped troubleshooting process.
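The alarm matching rule described above can be pictured as a small data structure plus a lookup. The following is only a minimal sketch under stated assumptions: the class names `AlarmParameter` and `MatchRule`, the function `match`, the exact-equality comparison, and all sample values are hypothetical and are not taken from the patent, which does not specify how a rule is evaluated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmParameter:
    # Hypothetical mirror of fields 31 to 36 of the alarm parameter 310.
    fault_type: str = ""
    event_id: str = ""
    description: str = ""
    environment: str = ""
    partition: str = ""
    module: str = ""


@dataclass
class MatchRule:
    conditions: dict                  # field name -> required value (exact match is an assumption)
    flow_name: str                    # name of the tree-shaped troubleshooting process to run
    enabled: bool = True              # corresponds to the enable control 39
    notify_on: tuple = ("start", "success", "failure")  # notification triggers
    channels: tuple = ("mail",)       # notification channels
    recipients: tuple = ()            # notification recipients


def match(fault: dict, rules: list) -> Optional[tuple]:
    """Return (alarm parameter, rule) for the first enabled rule that fits."""
    for rule in rules:
        if not rule.enabled:
            continue  # a disabled rule never selects a troubleshooting process
        if all(fault.get(key) == value for key, value in rule.conditions.items()):
            known = AlarmParameter.__dataclass_fields__
            param = AlarmParameter(**{k: v for k, v in fault.items() if k in known})
            return param, rule
    return None


rules = [
    MatchRule({"fault_type": "disk_full"}, "clean_disk_flow", enabled=False),
    MatchRule({"fault_type": "ping_unreachable", "environment": "production"},
              "restart_host_flow", channels=("sms", "mail")),
]
fault = {"fault_type": "ping_unreachable", "event_id": "E-1001",
         "environment": "production", "partition": "zone-1", "module": "gateway"}
result = match(fault, rules)
```

Keeping the rule as plain data is what lets a user adjust the scenario and the handling strategy from a configuration interface without writing fault-specific code.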
The convergence module 73 is connected to the matching module 72 and performs convergence processing on the alarm parameter to generate a converged alarm parameter.
It can be understood that when a single alarm parameter is received, the subsequent steps can be carried out directly. When multiple alarm parameters are received, the module judges whether they are associated with the same event; if they are, the multiple alarm parameters are converged to generate the converged alarm parameter.
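The convergence step above amounts to grouping alarm parameters by the event they belong to. The sketch below assumes, purely for illustration, that two alarm parameters belong to the same event when they share a host and a time window; the patent does not define this criterion, and the function name `converge` and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def converge(alarm_params, event_key=lambda p: (p["host"], p["time_window"])):
    """Merge alarm parameters that share an event key into one converged parameter."""
    groups = defaultdict(list)
    for param in alarm_params:
        groups[event_key(param)].append(param)

    converged = []
    for members in groups.values():
        if len(members) == 1:
            # A single alarm parameter passes through unchanged.
            converged.append(members[0])
        else:
            # Keep the first parameter and remember which sources were merged into it.
            merged = dict(members[0])
            merged["sources"] = [m["source"] for m in members]
            converged.append(merged)
    return converged


alarms = [
    {"host": "10.0.0.5", "time_window": "12:00", "source": "communication"},
    {"host": "10.0.0.5", "time_window": "12:00", "source": "storage"},
    {"host": "10.0.0.9", "time_window": "12:00", "source": "process"},
]
converged_alarms = converge(alarms)
```

Here the two alarms from host 10.0.0.5 collapse into one converged parameter, so the troubleshooting process runs once for that event rather than once per alarm.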
The recovery module 74 is connected to the convergence module 73 and executes the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
The recovery module 74 includes a logic unit 741 and a parameter unit 742.
Specifically, the logic unit 741 obtains the service logic of the troubleshooting process.
The parameter unit 742 executes the service logic of the troubleshooting process according to the converged alarm parameter. The service logic includes calling, over a network protocol, the corresponding function of a customized ESB or of the operating system. The corresponding function includes a restart function, a transfer function, and/or a script execution function.
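One way to read the split between the logic unit and the parameter unit is a lookup of named actions followed by a call that passes in the converged alarm parameter. The sketch below is an assumption-laden illustration: the registry `ACTIONS`, the decorator `action`, and the handlers are hypothetical, and each handler only returns a description of the request instead of performing a real network call to an ESB or operating system.

```python
from typing import Callable, Dict

# Hypothetical registry: service logic name -> handler.
ACTIONS: Dict[str, Callable[[dict], str]] = {}


def action(name: str):
    """Register a handler under a service logic name."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("restart")
def restart(param: dict) -> str:
    return f"restart {param['module']} on {param['host']}"


@action("transfer")
def transfer(param: dict) -> str:
    return f"transfer {param['module']} away from {param['host']}"


@action("script")
def run_script(param: dict) -> str:
    return f"run {param['script']} on {param['host']}"


def execute(service_logic: str, converged_param: dict) -> str:
    """Look up the service logic, then run it with the converged alarm parameter."""
    try:
        handler = ACTIONS[service_logic]
    except KeyError:
        raise ValueError(f"unknown service logic: {service_logic}") from None
    return handler(converged_param)


request = execute("restart", {"module": "gateway", "host": "10.0.0.5"})
```

In a real deployment each handler would issue the call over whatever protocol the ESB or operating system exposes; that part is deliberately left out here.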
If recovery succeeds, an alarm handling ticket is generated, the handling procedure is recorded on the ticket, and the ticket is saved to the alarm record database. If recovery fails or times out, the alarm information is tracked and the troubleshooting process is further optimized.
The failure recovery device provided by this embodiment of the present invention matches the corresponding alarm parameter and troubleshooting process for the fault information through a preset alarm matching rule, and performs convergence processing on the alarm parameter. Because no separate code has to be written for each specific fault, the device has strong compatibility; because configuration and recovery handling are simple, it improves the efficiency of fault recovery.
Referring to Fig. 8, Fig. 8 is another module diagram of the failure recovery device provided by an embodiment of the present invention. The failure recovery device can be applied to a server with operation and maintenance functions.
The failure recovery device 800 includes a configuration module 81, an acquisition module 82, a matching module 83, a convergence module 84, and a recovery module 85.
The configuration module 81 configures the tree-shaped troubleshooting process.
The configuration module 81 includes a root configuration unit 811, a sub-configuration unit 812, a tree configuration unit 813, and a trigger marking unit 814. Refer also to Fig. 4, which shows the interface of the tree-shaped troubleshooting process.
Specifically, the root configuration unit 811 sets a trigger node as the root node 371.
It can be understood that each tree-shaped troubleshooting process 37 corresponds to a unique root node 371.
The failure recovery device 800 can provide multiple trigger nodes for the root configuration unit 811 to choose from. When multiple trigger nodes exist, the trigger marking unit 814 obtains them and marks them with a label or text; in the figure, taking "quick" as an example, the marked nodes are trigger node 371 and trigger node 374. The root configuration unit 811 is further used to select one trigger node from the multiple trigger nodes as the unique root node.
The sub-configuration unit 812 is connected to the root configuration unit 811. Starting from the root node, it sets the recovery instruction of each current node in turn, where the recovery instruction is used to carry out fault recovery for the fault information. According to the processing logic, it takes recovery success and recovery failure as the association options of the current node and generates the child nodes 372 of the current node from them.
It can be understood that "tree-shaped" usually refers to a binary tree: each node can be split into a left subtree and a right subtree, which correspond to the processing logic for recovery success and for recovery failure respectively.
Further, with reference to Fig. 4, the sub-configuration unit 812 starts from the root node 371 and sets the recovery instruction of each current node in turn, where the recovery instruction is used to carry out fault recovery for the fault information. Then, according to the processing logic, it takes recovery success and recovery failure as the association options of the current node and generates the child nodes 372 of the current node. It also records the association relationship 373 between the current node and its parent node.
The tree configuration unit 813 generates the tree-shaped troubleshooting process 37 according to the root node 371, the child nodes 372, and the association relationships 373.
The tree configuration unit is further used to combine multiple tree-shaped troubleshooting processes. That is, when multiple trigger nodes are included, the root node, the unselected trigger nodes, and the child nodes are combined into a tree-shaped logical processing flow. In other words, when multiple trigger nodes exist, the root node of one tree-shaped troubleshooting process is used as a child node of another tree-shaped troubleshooting process; see node 374.
The child nodes 372 further include leaf nodes 3720, each of which marks the end of the corresponding troubleshooting process.
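The configuration steps above (set a root, attach a child for each of the two outcomes, record the parent link, optionally graft one tree under another) can be sketched as a binary tree builder. The names `Node` and `attach`, the node names, and the instruction strings are hypothetical; treating a leaf as a node with no instruction is also an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    instruction: Optional[str] = None        # leaf nodes carry no instruction in this sketch
    on_success: Optional["Node"] = None      # branch taken when recovery succeeds
    on_failure: Optional["Node"] = None      # branch taken when recovery fails
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def is_leaf(self) -> bool:
        return self.on_success is None and self.on_failure is None


def attach(parent: Node, outcome: str, child: Node) -> Node:
    """Link child under parent for the given outcome and record the association."""
    if outcome not in ("success", "failure"):
        raise ValueError("outcome must be 'success' or 'failure'")
    setattr(parent, "on_" + outcome, child)
    child.parent = parent
    return child


# One tree: restart the process, and reboot the host if the restart fails.
root = Node("restart_process", "restart")
attach(root, "success", Node("done"))
reboot = attach(root, "failure", Node("reboot_host", "script"))
attach(reboot, "success", Node("done_after_reboot"))
attach(reboot, "failure", Node("escalate"))

# Combination: the root of the first tree becomes a child of a second tree.
other_root = Node("clean_disk", "script")
attach(other_root, "success", Node("disk_ok"))
attach(other_root, "failure", root)
```

The `parent` field is excluded from comparison and from the printed representation so that the back-link does not cause recursive equality checks.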
The acquisition module 82 obtains fault information.
The acquisition module 82 can obtain the fault information from an associated alarm system. The fault information includes, but is not limited to, abnormality alarms and health early warnings.
The matching module 83 is connected to the acquisition module 82 and, according to a preset alarm matching rule, matches the corresponding alarm parameter and troubleshooting process for the fault information. For the matching process, refer to the alarm matching shown in Fig. 3 and the tree-shaped troubleshooting process shown in Fig. 4.
The convergence module 84 is connected to the matching module 83 and performs convergence processing on the alarm parameter, according to whether the alarm event corresponding to the alarm parameter occurs for the first time, to generate a converged alarm parameter.
The convergence module 84 includes an event judging unit 841, a parameter sending unit 842, and a parameter convergence unit 843.
Specifically, the event judging unit 841 judges whether the alarm event corresponding to the alarm parameter occurs for the first time, where one alarm event corresponds to at least one alarm parameter. For example, when a system crashes, alarm parameters may be received separately from the communication module, the storage module, the processing process, and so on; these can be converged to generate a single converged alarm parameter.
The parameter sending unit 842 is used, if the alarm event occurs for the first time, to take the alarm parameter as the converged alarm parameter and send it to the troubleshooting process.
The parameter convergence unit 843 is used, if the alarm event does not occur for the first time, to record the alarm parameter under that alarm event in the alarm record database and to mark the alarm parameter as converged.
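The three units of the convergence module can be read as one decision: forward the first alarm parameter of an event, record every later one as converged. The sketch below uses an in-memory dictionary as a stand-in for the alarm record database; the class name `ConvergenceModule`, the method `handle`, and the status strings are hypothetical.

```python
class ConvergenceModule:
    """First-occurrence convergence; a dict stands in for the alarm record database."""

    def __init__(self):
        self.records = {}  # event key -> list of recorded alarm parameters

    def handle(self, event_key, alarm_param):
        """Return the parameter to forward, or None if it was converged."""
        if event_key not in self.records:
            # First occurrence: this parameter becomes the converged alarm parameter.
            self.records[event_key] = [dict(alarm_param, status="forwarded")]
            return alarm_param
        # Later occurrence: record it under the same event and mark it as converged.
        self.records[event_key].append(dict(alarm_param, status="converged"))
        return None


module = ConvergenceModule()
first = module.handle("host-5-crash", {"source": "communication"})
second = module.handle("host-5-crash", {"source": "storage"})
third = module.handle("host-9-crash", {"source": "process"})
```

Only `first` and `third` would reach the troubleshooting process; `second` is kept in the records so that it can still be queried later as a converged alarm.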
The recovery module 85 executes the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
The recovery module 85 includes a parameter processing unit 851, an instruction processing unit 852, a recovery processing unit 853, a node processing unit 854, a leaf judging unit 855, and an end processing unit 856.
Specifically, the parameter processing unit 851 sends the converged alarm parameter to the root node corresponding to the troubleshooting process.
The instruction processing unit 852 is connected to the parameter processing unit 851 and obtains the recovery instruction corresponding to the root node.
The recovery processing unit 853 is connected to the instruction processing unit 852 and carries out fault recovery for the fault information according to the recovery instruction.
The node processing unit 854 is connected to the recovery processing unit 853 and obtains the corresponding current node according to the fault recovery result. The fault recovery result is either recovery success or recovery failure.
The leaf judging unit 855 is connected to the node processing unit 854 and judges whether the current node is a leaf node.
The end processing unit 856 is connected to the leaf judging unit 855. When the current node is a leaf node, it ends the fault recovery and stores the fault recovery procedure in the alarm record database.
The node processing unit 854 is also connected to the leaf judging unit 855. When the current node is not a leaf node, it obtains the recovery instruction of the current node according to the fault recovery result, so that fault recovery continues.
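Taken together, the units of the recovery module describe a walk down the tree: run the current node's recovery instruction, branch on the result, and stop at a leaf. The following sketch represents nodes as plain dictionaries; the keys, the function names `is_leaf` and `run_flow`, and the callback `execute` are hypothetical, and the choice not to execute anything at a leaf is an assumption drawn from the description of a leaf as the end of the flow.

```python
def is_leaf(node):
    """A node with neither a success branch nor a failure branch ends the flow."""
    return node.get("on_success") is None and node.get("on_failure") is None


def run_flow(root, execute):
    """Walk the tree from the root; return (trace, name of the leaf that ended it)."""
    trace = []
    node = root
    while True:
        ok = execute(node["instruction"])            # carry out the recovery instruction
        trace.append((node["name"], "success" if ok else "failure"))
        nxt = node.get("on_success" if ok else "on_failure")
        if nxt is None or is_leaf(nxt):
            # Reaching a leaf ends the fault recovery; the trace would then be stored.
            return trace, (nxt["name"] if nxt is not None else None)
        node = nxt


flow = {
    "name": "restart_process", "instruction": "restart",
    "on_success": {"name": "done"},
    "on_failure": {
        "name": "reboot_host", "instruction": "script",
        "on_success": {"name": "done_after_reboot"},
        "on_failure": {"name": "escalate"},
    },
}
outcomes = {"restart": False, "script": True}        # simulated recovery results
trace, end = run_flow(flow, lambda instruction: outcomes[instruction])
```

In this run the restart fails, the host reboot succeeds, and the flow ends at the `done_after_reboot` leaf.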
The alarm record database 86 is connected to the recovery module 85 and saves the fault recovery procedure after the fault recovery ends.
If recovery succeeds, an alarm handling ticket is generated, the successful handling procedure is saved on the ticket, and the ticket is then saved to the alarm record database 86.
If recovery fails or times out, the alarm information is tracked, the failed or timed-out handling procedure is saved on the alarm handling ticket, and the ticket is then saved to the alarm record database 86. The troubleshooting process that failed or timed out is then further optimized.
The query module 87 is connected to the alarm record database 86 and outputs query results according to query instructions.
The alarm record database 86 mainly supports queries of processing status and queries of statistical counts. The processing status is the current state of a given piece of fault information, for example: converged, recovery succeeded, or recovery failed. A statistical count is the number of pieces of fault information within a certain period, in a certain partition, or of a certain type.
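The two kinds of query supported by the alarm record database can be sketched with an in-memory SQLite table. The table name `alarm_record`, its columns, the status strings, and the sample rows are all hypothetical; the patent does not describe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alarm_record ("
    "event_id TEXT, fault_type TEXT, partition_name TEXT, status TEXT, handled_at TEXT)"
)
conn.executemany(
    "INSERT INTO alarm_record VALUES (?, ?, ?, ?, ?)",
    [
        ("E-1", "ping_unreachable", "zone-1", "recovery_succeeded", "2017-08-01T10:00:00"),
        ("E-2", "disk_full", "zone-1", "recovery_failed", "2017-08-02T09:30:00"),
        ("E-3", "ping_unreachable", "zone-2", "converged", "2017-08-02T11:00:00"),
    ],
)


def query_status(conn, event_id):
    """Processing status query: latest recorded state of one fault event."""
    row = conn.execute(
        "SELECT status FROM alarm_record WHERE event_id = ? "
        "ORDER BY handled_at DESC LIMIT 1",
        (event_id,),
    ).fetchone()
    return row[0] if row else None


def count_faults(conn, start, end, partition=None, fault_type=None):
    """Statistical count query: faults in [start, end), optionally filtered."""
    sql = "SELECT COUNT(*) FROM alarm_record WHERE handled_at >= ? AND handled_at < ?"
    args = [start, end]
    if partition is not None:
        sql += " AND partition_name = ?"
        args.append(partition)
    if fault_type is not None:
        sql += " AND fault_type = ?"
        args.append(fault_type)
    return conn.execute(sql, args).fetchone()[0]
```

Timestamps are stored as ISO 8601 strings so that plain string comparison orders them correctly; a production schema would likely add indexes on the filtered columns.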
The failure recovery device provided by this embodiment of the present invention matches the corresponding alarm parameter and troubleshooting process for the fault information through a preset alarm matching rule, and performs convergence processing on the alarm parameter. Because no separate code has to be written for each specific fault, the device has strong compatibility. Because configuration and recovery handling are simple, it occupies few system resources at run time and handles faults efficiently. In addition, the fault recovery result can be stored in the alarm record database, so that the alarm information can be tracked and the handling flow optimized.
Accordingly, an embodiment of the present invention also provides a server, which illustrates the hardware environment of the failure recovery method and the failure recovery device. As shown in Fig. 9, the server is used to execute the failure recovery method of Figs. 1-2 or to run the failure recovery device of Figs. 7-8. The server 900 includes a processor 901 with one or more processing cores, a memory 902 comprising one or more computer-readable storage media, an input unit 903, a short-range wireless transmission (WiFi) module 904, a display screen 905, a power supply 906, and other components, and is used to execute the failure recovery method and/or run the failure recovery device 907.
Those skilled in the art will understand that the above structure does not limit the server 900, which may include more or fewer components than described, combine certain components, or arrange the components differently. Specifically:
In this embodiment, in the server 900, the processor 901 loads the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions, and runs the application programs stored in the memory 902 to implement the following functions. A failure recovery method includes: obtaining fault information; matching, according to a preset alarm matching rule, the corresponding alarm parameter and troubleshooting process for the fault information; performing convergence processing on the alarm parameter to generate a converged alarm parameter; and executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
Preferably, the processor 901 may also be used to: set a trigger node as the root node; obtain, according to the processing logic, the association options of the root node in turn to generate child nodes; and generate the tree-shaped troubleshooting process according to the root node, the child nodes, and the association relationships.
Preferably, the processor 901 may also be used to: starting from the root node, set the recovery instruction of each current node in turn, where the recovery instruction is used to carry out fault recovery for the fault information; according to the processing logic, take recovery success and recovery failure as the association options of the current node and generate the child nodes of the current node; and record the association relationship between the current node and its parent node.
Preferably, the processor 901 may also be used to: judge whether the alarm event corresponding to the alarm parameter occurs for the first time, where one alarm event corresponds to at least one alarm parameter; if it occurs for the first time, take the alarm parameter as the converged alarm parameter and send it to the troubleshooting process; and/or, if it does not occur for the first time, record the alarm parameter under that alarm event in the alarm record database and mark the alarm parameter as converged.
Preferably, the processor 901 may also be used to: send the converged alarm parameter to the root node corresponding to the troubleshooting process; obtain the recovery instruction corresponding to the root node; carry out fault recovery for the fault information according to the recovery instruction; and, according to the fault recovery result, obtain the recovery instruction of the child node corresponding to the root node to carry out fault recovery, where the fault recovery result is either recovery success or recovery failure.
Preferably, the processor 901 may also be used to: judge whether the current node is a leaf node; when the current node is a leaf node, end the fault recovery; and when the current node is not a leaf node, perform the step of obtaining, according to recovery success or recovery failure, the recovery instruction of the child node corresponding to the root node to carry out fault recovery.
Preferably, the processor 901 may also be used to: obtain the service logic of the troubleshooting process; and execute the service logic of the troubleshooting process according to the converged alarm parameter, where the service logic includes calling, over a network protocol, the corresponding function of a customized ESB or of the operating system, and the corresponding function includes a restart function, a transfer function, and/or a script execution function.
Preferably, the processor 901 may also be used to: after the fault recovery ends, store the fault recovery procedure in the alarm record database.
The server provided by this embodiment of the present invention matches the corresponding alarm parameter and troubleshooting process for the fault information through a preset alarm matching rule, and performs convergence processing on the alarm parameter. It has the advantages of strong compatibility, a simple handling flow, low system resource usage, and high handling efficiency.
The server provided by this embodiment of the present invention belongs to the same concept as the failure recovery method and the failure recovery device in the foregoing embodiments.
It should be noted that, for the failure recovery method of the present invention, a person of ordinary skill in the art can understand that all or part of the flows in the embodiments of the present invention can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, such as the memory of a terminal device, and executed by at least one processor in the terminal device; its execution may include the flows of the embodiments of the failure recovery method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), and so on.
For the failure recovery device of the embodiments of the present invention, the functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The failure recovery method, recovery device, storage medium, and server provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. For those skilled in the art, the specific implementation and scope of application may change according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (13)

  1. A failure recovery method, characterized by comprising:
    obtaining fault information;
    matching, according to a preset alarm matching rule, a corresponding alarm parameter and troubleshooting process for the fault information;
    performing convergence processing on the alarm parameter to generate a converged alarm parameter; and
    executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
  2. The failure recovery method of claim 1, characterized in that, before obtaining the fault information, the method further comprises:
    setting a trigger node as a root node;
    obtaining, according to processing logic, the association options of the root node in turn to generate child nodes;
    generating a tree-shaped troubleshooting process according to the root node, the child nodes, and the association relationships;
    wherein matching a corresponding alarm parameter and troubleshooting process for the fault information comprises: matching a corresponding alarm parameter and the tree-shaped troubleshooting process for the fault information.
  3. The failure recovery method of claim 2, characterized in that obtaining, according to processing logic, the association relationship of the root node in turn to generate child nodes comprises:
    starting from the root node, setting the recovery instruction of each current node in turn, the recovery instruction being used to carry out fault recovery for the fault information;
    according to the processing logic, taking recovery success and recovery failure as the association options of the current node and generating the child nodes of the current node; and
    recording the association relationship between the current node and the parent node of the current node.
  4. The failure recovery method of claim 1, characterized in that performing convergence processing on the alarm parameter to generate a converged alarm parameter comprises:
    judging whether the alarm event corresponding to the alarm parameter occurs for the first time, wherein the alarm event corresponds to at least one alarm parameter;
    if it occurs for the first time, taking the alarm parameter as the converged alarm parameter and sending it to the troubleshooting process; and/or
    if it does not occur for the first time, recording the alarm parameter under the alarm event in an alarm record database and marking the alarm parameter as converged.
  5. The failure recovery method of claim 2, characterized in that executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery comprises:
    sending the converged alarm parameter to the root node corresponding to the troubleshooting process;
    obtaining the recovery instruction corresponding to the root node;
    carrying out preliminary fault recovery for the fault information according to the recovery instruction;
    obtaining, according to the fault recovery result, the recovery instruction of the child node corresponding to the root node to carry out successive fault recovery, wherein the fault recovery result comprises recovery success or recovery failure.
  6. The failure recovery method of claim 5, characterized in that, before obtaining, according to recovery success or recovery failure, the recovery instruction of the child node corresponding to the root node to carry out fault recovery, the method further comprises:
    judging whether the current node is a leaf node; and
    when the current node is a leaf node, ending the fault recovery;
    when the current node is not a leaf node, performing the step of obtaining, according to recovery success or recovery failure, the recovery instruction of the child node corresponding to the root node to carry out fault recovery.
  7. The failure recovery method of claim 2, characterized in that generating a tree-shaped troubleshooting process according to the root node, the child nodes, and the association relationships further comprises:
    obtaining multiple trigger nodes and marking them with a label or text;
    selecting one trigger node from the multiple trigger nodes as the root node;
    combining, according to the processing logic, the root node, the unselected trigger nodes, and the child nodes into a tree-shaped logical processing flow.
  8. A failure recovery device, characterized by comprising:
    an acquisition module for obtaining fault information;
    a matching module for matching, according to a preset alarm matching rule, a corresponding alarm parameter and troubleshooting process for the fault information;
    a convergence module for performing convergence processing on the alarm parameter to generate a converged alarm parameter; and
    a recovery module for executing the troubleshooting process according to the converged alarm parameter to carry out fault recovery.
  9. The failure recovery device of claim 8, characterized by further comprising a configuration module, the configuration module comprising:
    a root configuration unit for setting a trigger node as a root node;
    a sub-configuration unit for setting, starting from the root node, the recovery instruction of each current node in turn, the recovery instruction being used to carry out fault recovery for the fault information, and for taking, according to processing logic, recovery success and recovery failure as the association options of the current node and generating the child nodes of the current node;
    a tree configuration unit for generating a tree-shaped troubleshooting process according to the root node, the child nodes, and the association relationships;
    the matching module being further used to match a corresponding alarm parameter and the tree-shaped troubleshooting process for the fault information.
  10. The failure recovery device of claim 8, characterized in that the convergence module comprises:
    an event judging unit for judging whether the alarm event corresponding to the alarm parameter occurs for the first time, wherein the alarm event corresponds to at least one alarm parameter;
    a parameter sending unit for, if it occurs for the first time, taking the alarm parameter as the converged alarm parameter and sending it to the troubleshooting process; and
    a parameter convergence unit for, if it does not occur for the first time, recording the alarm parameter under the alarm event in an alarm record database and marking the alarm parameter as converged.
  11. The failure recovery device of claim 9, characterized in that the recovery module comprises:
    a parameter processing unit for sending the converged alarm parameter to the root node corresponding to the troubleshooting process;
    an instruction processing unit for obtaining the recovery instruction corresponding to the root node;
    a recovery processing unit for carrying out preliminary fault recovery for the fault information according to the recovery instruction;
    a node processing unit for obtaining the corresponding current node according to the fault recovery result, wherein the fault recovery result comprises recovery success or recovery failure;
    a leaf judging unit for judging whether the current node is a leaf node;
    an end processing unit for, when the current node is a leaf node, ending the fault recovery and storing the fault recovery procedure in an alarm record database;
    the node processing unit being further used, when the current node is not a leaf node, to obtain the recovery instruction of the current node according to the fault recovery result to carry out successive fault recovery.
  12. The failure recovery device of claim 9, characterized in that, in the configuration module:
    the configuration module further comprises a trigger marking unit for obtaining multiple trigger nodes and marking them with a label or text;
    the root configuration unit is further used to select one trigger node from the multiple trigger nodes as the root node;
    the tree configuration unit is further used to combine, according to the processing logic, the root node, the unselected trigger nodes, and the child nodes into a tree-shaped logical processing flow.
  13. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the failure recovery method of any one of claims 1 to 7 is implemented.
CN201710691358.7A 2017-08-14 2017-08-14 Failure recovery method, recovery device and storage medium Active CN107562556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710691358.7A CN107562556B (en) 2017-08-14 2017-08-14 Failure recovery method, recovery device and storage medium

Publications (2)

Publication Number Publication Date
CN107562556A true CN107562556A (en) 2018-01-09
CN107562556B CN107562556B (en) 2020-05-12

Family

ID=60974496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710691358.7A Active CN107562556B (en) 2017-08-14 2017-08-14 Failure recovery method, recovery device and storage medium

Country Status (1)

Country Link
CN (1) CN107562556B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144829A (en) * 2018-08-30 2019-01-04 深圳供电局有限公司 Fault processing method and device, computer equipment and storage medium
CN109558300A (en) * 2018-11-29 2019-04-02 郑州云海信息技术有限公司 A kind of whole machine cabinet alert processing method, device, terminal and storage medium
CN111092758A (en) * 2019-12-06 2020-05-01 上海上讯信息技术股份有限公司 Method and device for reducing alarm and recovering false alarm and electronic equipment
CN111722986A (en) * 2020-07-24 2020-09-29 杭州迪普科技股份有限公司 Software performance monitoring method and device
CN111953541A (en) * 2020-08-10 2020-11-17 腾讯科技(深圳)有限公司 Alarm information processing method and device, computer equipment and storage medium
CN112148463A (en) * 2020-10-23 2020-12-29 新华三大数据技术有限公司 Business process control method and device
CN112612929A (en) * 2020-12-29 2021-04-06 珠海金山网络游戏科技有限公司 Data processing method and device
CN112650642A (en) * 2020-12-07 2021-04-13 深圳前海微众银行股份有限公司 Alarm processing method and device, equipment and storage medium
CN113381874A (en) * 2020-03-10 2021-09-10 上海杰之能软件科技有限公司 Fault signal processing method, storage medium and terminal
CN114826877A (en) * 2022-02-24 2022-07-29 苏州浪潮智能科技有限公司 Asset alarm processing method and device, computer equipment and storage medium
CN115328094A (en) * 2022-08-27 2022-11-11 南京芯传汇电子科技有限公司 Redundancy fault recovery method and system for redundancy remote control terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101562340A (en) * 2009-06-04 2009-10-21 湖南省电力公司调度通信局 Method for solving critical approximate power flow after failure of electric system
CN103200027A (en) * 2013-03-01 2013-07-10 中国工商银行股份有限公司 Method, device and system for locating network failure
CN106528723A (en) * 2016-10-27 2017-03-22 重庆大学 Fault tree-based numerical control machine tool fault removal scheme judgment indication method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144829A (en) * 2018-08-30 2019-01-04 深圳供电局有限公司 Fault processing method and device, computer equipment and storage medium
CN109144829B (en) * 2018-08-30 2022-03-22 深圳供电局有限公司 Fault processing method and device, computer equipment and storage medium
CN109558300A (en) * 2018-11-29 2019-04-02 郑州云海信息技术有限公司 A kind of whole machine cabinet alert processing method, device, terminal and storage medium
CN109558300B (en) * 2018-11-29 2023-01-06 郑州云海信息技术有限公司 Whole cabinet alarm processing method and device, terminal and storage medium
CN111092758A (en) * 2019-12-06 2020-05-01 上海上讯信息技术股份有限公司 Method and device for reducing alarm and recovering false alarm and electronic equipment
CN113381874A (en) * 2020-03-10 2021-09-10 上海杰之能软件科技有限公司 Fault signal processing method, storage medium and terminal
CN111722986A (en) * 2020-07-24 2020-09-29 杭州迪普科技股份有限公司 Software performance monitoring method and device
CN111953541A (en) * 2020-08-10 2020-11-17 腾讯科技(深圳)有限公司 Alarm information processing method and device, computer equipment and storage medium
CN111953541B (en) * 2020-08-10 2023-12-05 腾讯科技(深圳)有限公司 Alarm information processing method, device, computer equipment and storage medium
CN112148463A (en) * 2020-10-23 2020-12-29 新华三大数据技术有限公司 Business process control method and device
CN112148463B (en) * 2020-10-23 2023-07-21 新华三大数据技术有限公司 Business process control method and device
CN112650642A (en) * 2020-12-07 2021-04-13 深圳前海微众银行股份有限公司 Alarm processing method and device, equipment and storage medium
CN112612929A (en) * 2020-12-29 2021-04-06 珠海金山网络游戏科技有限公司 Data processing method and device
CN114826877A (en) * 2022-02-24 2022-07-29 苏州浪潮智能科技有限公司 Asset alarm processing method and device, computer equipment and storage medium
CN114826877B (en) * 2022-02-24 2023-07-14 苏州浪潮智能科技有限公司 Asset alarm processing method, device, computer equipment and storage medium
CN115328094A (en) * 2022-08-27 2022-11-11 南京芯传汇电子科技有限公司 Redundancy fault recovery method and system for redundancy remote control terminal

Also Published As

Publication number Publication date
CN107562556B (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN107562556A (en) Failure recovery method, recovery device and storage medium
US12086150B2 (en) Generating files for visualizing query results
CN103605722B (en) Database monitoring method and device, equipment
CN108462750A (en) Distribution calls method for tracing, operation system, monitoring system and storage medium
US8149725B2 (en) Methods, systems, and computer program products for a hierarchical, redundant OAM&P architecture for use in an IP multimedia subsystem (IMS) network
US9929998B1 (en) Tagged messages to facilitate administration of a virtualization infrastructure
CN108289034B (en) A kind of fault discovery method and apparatus
WO2019223062A1 (en) Method and system for processing system exceptions
JP2005251191A (en) Method and system for troubleshooting misconfiguration of computer system based on configuration of other computer system
CN103716356B (en) Storing process operating method, device and system based on web
CN109800098A (en) Service exception node positioning method, device, computer equipment and storage medium
US20130227568A1 (en) Systems and methods involving virtual machine host isolation over a network
US20150095432A1 (en) Graphing relative health of virtualization servers
US20200076707A1 (en) Autonomic or AI-assisted validation, decision making, troubleshooting and/or performance enhancement within a telecommunications network
CN112799741A (en) Application program differentiation method and device, electronic equipment and storage medium
CN106484459A (en) It is applied to flow control method and the device of JavaScript
US11178080B2 (en) Mobile dashboard for automated contact center testing
JP5649840B2 (en) SIP servlet application cohosting
CN103561089B (en) Virtual machine desktop log-in, Apparatus and system
CN108039956A (en) Using monitoring method, system and computer-readable recording medium
US9329960B2 (en) Methods, systems, and computer readable media for utilizing abstracted user-defined data to conduct network protocol testing
CN106713014B (en) Monitored host in monitoring system, monitoring system and monitoring method
CN106445479A (en) Information pushing method and apparatus
US11057320B2 (en) Operation for multiple chat bots operation in organization
CN116009985A (en) Interface calling method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant