CN105988886A - Fault processing method and device in operation and maintenance process - Google Patents

Info

Publication number
CN105988886A
Authority
CN
China
Prior art keywords
configuration item
event list
phenomenon
failure
emergency scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510192122.XA
Other languages
Chinese (zh)
Other versions
CN105988886B (en)
Inventor
王中军
覃非
陈根
徐鸣亮
柴亚东
何剑华
吴正中
赵樑
施锦玲
徐景良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd

Abstract

The embodiments of the invention disclose a fault processing method and device in an operation and maintenance process. The method comprises the following steps: after a first fault phenomenon is monitored, determining a first configuration item corresponding to the first fault phenomenon; calculating the associated configuration items of the first configuration item according to preset relations between configuration items; then determining whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item, and if they do not, generating a first event list for the first configuration item, which reduces the number of event lists generated and saves resource overhead, wherein the first event list comprises the first configuration item and the associated configuration items of the first configuration item; and finally, processing the generated first event list in a targeted manner to eliminate the fault. Because fewer event lists are generated, faults can be handled more promptly through the event lists, and the efficiency of resolving faults is improved.

Description

Fault processing method and device in an operation and maintenance process
Technical field
The present invention relates to the technical field of IT (Information Technology) system operation and maintenance, and in particular to a fault processing method and device in an operation and maintenance (O&M) process.
Background
IT is the general term for the various technologies used to manage and process information. It mainly applies computer science and communication technology to design, develop, install and implement information systems and application software. An IT O&M management system is the general term for a series of IT management products; the products it comprises are powerful, easy to use and complete in the solutions they offer, and can meet a user's various IT management needs in one stop. Such a system offers stable performance, a friendly user interface, cross-platform support, easy deployment and easy integration. It can greatly simplify the monitoring and management of IT facilities and business systems, improve the user's IT management efficiency, and ensure that the user's network devices and business systems run normally.
An effective IT O&M system should be able to fully understand how business resources are used and promptly discover hidden dangers that may cause system faults; this is the key to guaranteeing system operation. At present, more and more customers are considering or adopting business-centralization schemes. After business systems are centralized, however, the IT O&M management system maintains a huge number of network devices, host groups and application groups. With so many hardware devices, not only does the O&M workload increase, but managing the centralized system also becomes more complicated. The problem of how to handle alarms from such a large group of devices arises accordingly. In existing O&M systems, so that O&M personnel can handle alarms promptly, an event list is generated for each alarm, which ensures that each alarm has a corresponding responsible person and makes it easy to monitor how the alarm is handled. Because there are many alarms, many event lists are generated from them. The large number of event lists not only consumes resources but is also troublesome to handle, so processing efficiency is low.
Summary of the invention
Embodiments of the present invention provide a fault processing method and device in an O&M process, to solve the technical problem in the prior art that too many event lists are generated during IT O&M, which makes O&M personnel inefficient when handling faults.
An embodiment of the present invention provides a fault processing method in an O&M process, including:
After a first fault phenomenon is monitored, determining a first configuration item corresponding to the first fault phenomenon;
Calculating the associated configuration items of the first configuration item according to preset relations between configuration items; determining whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item, and if not, generating a first event list for the first configuration item, where the first event list includes the first configuration item and the associated configuration items of the first configuration item;
Processing the first event list.
Preferably, after the first fault phenomenon is monitored and the first configuration item corresponding to the first fault phenomenon is determined, the method further includes:
Querying an existing fault record table, where each fault record in the fault record table includes at least a configuration item, the fault phenomenon corresponding to the configuration item, and the occurrence time of the fault phenomenon;
Determining whether the fault record table contains a record of the first configuration item and the first fault phenomenon; if not, storing the first configuration item, the first fault phenomenon and the occurrence time of this first fault phenomenon in the fault record table as a new record; if so, updating the occurrence time of the first fault phenomenon already in the fault record table to the occurrence time of this first fault phenomenon.
Preferably, processing the first event list includes:
Obtaining, according to the first event list, the emergency scenes of the first configuration item and of the associated configuration items of the first configuration item, where each emergency scene includes a configuration item and one or more fault phenomena corresponding to that configuration item;
Obtaining from the fault record table, as fault matching parameters, the fault phenomena that occurred on the first configuration item and its associated configuration items within a first set time range;
Calculating the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and its associated configuration items;
Taking the emergency scene whose matching degree meets a set threshold as the target emergency scene, and executing the fault resolution policy corresponding to the target emergency scene to process the first event list.
Preferably, after the first event list of the first configuration item is generated, the method further includes:
Obtaining information on the responsible person corresponding to the first event list;
Synthesizing the first event list into voice information and announcing it to the responsible person.
Preferably, after the fault resolution policy corresponding to the target emergency scene is executed to process the first event list, the method further includes:
Feeding the execution result back to the responsible person;
After the matching degree between the fault matching parameters and each emergency scene is calculated according to the fault phenomena corresponding to each emergency scene of the first configuration item and its associated configuration items, the method further includes:
If none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, feeding the matching degrees between the fault matching parameters and each emergency scene back to the responsible person, so that the responsible person processes the first event list according to the matching degrees.
Preferably, processing the first event list further includes:
If the interval between the occurrence time of the second fault phenomenon of the second configuration item in a second event list and the occurrence time of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, merging the second event list with the first event list for processing.
An embodiment of the present invention provides a fault processing device in an O&M process, including:
A monitoring module, configured to determine, after a first fault phenomenon is monitored, a first configuration item corresponding to the first fault phenomenon;
An event list generation module, configured to calculate the associated configuration items of the first configuration item according to preset relations between configuration items; determine whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item, and if not, generate a first event list for the first configuration item, where the first event list includes the first configuration item and the associated configuration items of the first configuration item;
A processing module, configured to process the first event list.
Preferably, the event list generation module is further configured to:
Query an existing fault record table, where each fault record in the fault record table includes at least a configuration item, the fault phenomenon corresponding to the configuration item, and the occurrence time of the fault phenomenon;
Determine whether the fault record table contains a record of the first configuration item and the first fault phenomenon; if not, store the first configuration item, the first fault phenomenon and the current occurrence time of the first fault phenomenon in the fault record table as a new record; if so, update the occurrence time of the first fault phenomenon already in the fault record table to the current occurrence time of the first fault phenomenon.
Preferably, the processing module is specifically configured to:
Obtain, according to the first event list, the emergency scenes of the first configuration item and of the associated configuration items of the first configuration item, where each emergency scene includes a configuration item and one or more fault phenomena corresponding to that configuration item;
Obtain from the fault record table, as fault matching parameters, the fault phenomena that occurred on the first configuration item and its associated configuration items within a first set time range;
Calculate the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and its associated configuration items;
Take the emergency scene whose matching degree meets a set threshold as the target emergency scene, and execute the fault resolution policy corresponding to the target emergency scene to process the first event list.
Preferably, the processing module is further configured to:
Obtain information on the responsible person corresponding to the first event list;
Synthesize the first event list into voice information and announce it to the responsible person.
Preferably, the processing module is further configured to:
After executing the task corresponding to the target emergency scene to resolve the event list, feed the execution result back to the responsible person;
If none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, feed the matching degrees between the fault matching parameters and each emergency scene back to the responsible person, so that the responsible person processes the first event list according to the matching degrees.
Preferably, the processing module is further configured to:
If the interval between the occurrence time of the second fault phenomenon of the second configuration item in a second event list and the occurrence time of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, merge the second event list with the first event list for processing.
In the embodiments of the present invention, after a first fault phenomenon is monitored, the first configuration item corresponding to the first fault phenomenon is determined, and the associated configuration items of the first configuration item are calculated according to preset relations between configuration items. It is then determined whether the unprocessed event lists contain the first configuration item and its associated configuration items; if not, a first event list is generated for the first configuration item. This greatly reduces the number of event lists generated and saves resource overhead. In other words, when a fault phenomenon occurs, the configuration item corresponding to the fault phenomenon and its associated configuration items are obtained, and the event lists that have not yet been processed are searched for an event list of the configuration item corresponding to this fault phenomenon. If one exists, no new event list is generated, which ensures that no duplicate event list is submitted. If the unprocessed event lists contain an event list of an associated configuration item of the configuration item corresponding to this fault phenomenon, no new event list is needed either: because of the association between the two, the fault phenomenon can be resolved through the existing event list of the associated configuration item, which further avoids generating unnecessary event lists. Because fewer event lists are generated, faults are handled more promptly through the event lists, and the efficiency of resolving faults is improved.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a fault processing method in an O&M process provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a model assumed in an embodiment of the present invention;
Fig. 3 is a schematic diagram of a fault processing device in an O&M process provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a fault processing method in an O&M process provided by an embodiment of the present invention. The method includes:
Step 101: after a first fault phenomenon is monitored, determine a first configuration item corresponding to the first fault phenomenon;
Step 102: calculate the associated configuration items of the first configuration item according to preset relations between configuration items; determine whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item, and if not, generate a first event list for the first configuration item, where the first event list includes the first configuration item and the associated configuration items of the first configuration item;
Step 103: process the first event list.
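The existence check in step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `EventList` class, the `handle_fault` function and the `associated_of` mapping are hypothetical names, and the check is interpreted here as "the faulty item already appears in an open event list, either as its main item or among its associated items", which is an assumption about the intended reading.

```python
from dataclasses import dataclass, field


@dataclass
class EventList:
    list_id: int
    item: str                                      # configuration item that raised the fault
    associated: set = field(default_factory=set)   # its associated configuration items


def covers(event_list, item):
    """True if the event list already names `item`, directly or as an associated item."""
    return item == event_list.item or item in event_list.associated


def handle_fault(item, associated_of, open_lists):
    """Steps 101-102: return a new EventList, or None if an open one already covers `item`."""
    if any(covers(e, item) for e in open_lists):
        return None
    new_list = EventList(len(open_lists) + 1, item, set(associated_of.get(item, ())))
    open_lists.append(new_list)
    return new_list


# Illustrative relation data (assumed): a fault on X affects Y and Z.
associated_of = {"X": {"Y", "Z"}}
open_lists = []
created = [handle_fault(i, associated_of, open_lists) for i in ("Y", "X", "X", "Z")]
```

With this input, only the first fault on Y and the first fault on X open an event list; the repeated fault on X and the fault on Z are absorbed by the list already opened for X.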
In the embodiments of the present invention, to further improve the efficiency of processing event lists, two or more unprocessed event lists may be merged for processing in step 103.
Taking the merging of two event lists as an example: if the interval between the occurrence time of the second fault phenomenon of the second configuration item in a second event list and the occurrence time of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, the second event list and the first event list are merged for processing. The second set time range is configured by IT O&M personnel in the field according to long-term O&M experience; for example, it may be 1 minute.
During O&M, after a configuration item raises an alarm, its associated configuration items may raise alarms as well; alternatively, an associated configuration item of the configuration item raises an alarm first, and the configuration item then raises its own. In these cases, after separate event lists have been opened, the event lists can still be merged for processing because the alarms occurred close together in time. While the fault phenomena of the configuration item and its associated configuration items are resolved, the efficiency of processing event lists is improved and application resources are saved.
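The merge condition can be expressed as a small predicate. The sketch below assumes each event list is a dictionary with hypothetical keys `item` and `time` (occurrence time in seconds) and that `associated_of` maps a configuration item to its associated items; the 60-second default mirrors the 1-minute example above.

```python
def should_merge(first, second, associated_of, window_seconds=60):
    """Return True if `second` can be merged into `first`.

    Both conditions from the description must hold: the two fault phenomena
    occurred within the second set time range, and the item of `second` is an
    associated configuration item of the item of `first`.
    """
    within_window = abs(second["time"] - first["time"]) <= window_seconds
    is_associated = second["item"] in associated_of.get(first["item"], set())
    return within_window and is_associated


associated_of = {"X": {"Y", "Z"}}          # illustrative relation data (assumed)
list_x = {"item": "X", "time": 1}
list_y = {"item": "Y", "time": 0}
merge_now = should_merge(list_x, list_y, associated_of)
merge_late = should_merge(list_x, {"item": "Y", "time": 500}, associated_of)
```

Here `merge_now` is true because Y is associated with X and the two faults are one second apart, while `merge_late` is false because the interval exceeds the window.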
In the embodiments of the present invention, the relations between configuration items may be impact relations, transitive relations and topological relations. The impact relations include the direct relation, component relation, load-balancing relation, active/standby-primary relation and active/standby-standby relation; the topological relations include the connection relation and the high-availability relation. Together with the configuration items, these relations form a huge configuration item network. Each relation has a corresponding impact algorithm for calculating impact. Specifically, for any configuration items with an impact-direct relation, impact-component relation or impact-active/standby-primary relation, the upper-level configuration item is the affected configuration item, that is, the associated configuration item. For any configuration items with an impact-load-balancing relation, if all configuration items under the upper-level item that are in the load-balancing relation have failed, the upper-level configuration item is the affected configuration item, that is, the associated configuration item. For any configuration items with an impact-active/standby-standby relation, a transitive relation or a topology-connection relation, no impact is calculated. For any configuration items with a topology-high-availability relation, if all configuration items that have a high-availability relation with a given configuration item have failed, the upper-level configuration item is the affected configuration item, that is, the associated configuration item. Using these rules together with a recursive algorithm and breadth-first graph traversal, all configuration items affected by a given configuration item, that is, all of its associated configuration items, can be obtained.
Example one: configuration item A has an impact-direct relation with configuration items B and C, and configuration item B has an impact-direct relation with configuration items D and E. Configuration items B and C are upper-level configuration items of A, and configuration items D and E are upper-level configuration items of B. When configuration item A fails, the calculated associated configuration items of A are configuration items B, C, D and E.
Example two: configuration items P, Q and R are in an impact-load-balancing relation, with configuration item R as the upper-level configuration item of P and Q. Configuration items H and I have an impact-direct relation with configuration item R and are upper-level configuration items of R. When configuration item P fails, it is determined whether the other configuration items in the load-balancing relation with R (that is, configuration item Q) have also failed. If configuration item Q has failed, the calculated associated configuration items of P are configuration items R, H and I.
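The two examples can be reproduced with a breadth-first traversal over typed relations. The following is a minimal sketch under stated assumptions: relations are stored as `(lower, upper, relation)` tuples, the relation names are hypothetical labels for the relation types described above, and an upper-level item reached during the traversal is not itself counted as failed when evaluating later load-balancing or high-availability groups.

```python
from collections import deque

ALWAYS = {"direct", "component", "primary"}            # upper item is always affected
ALL_FAILED = {"load_balancing", "high_availability"}   # affected only if every peer failed


def associated_items(start, edges, failed=()):
    """Breadth-first search over (lower, upper, relation) edges starting at `start`."""
    down = set(failed) | {start}          # items known to have failed
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for lower, upper, relation in edges:
            if lower != current or upper in seen or upper == start:
                continue
            if relation in ALWAYS:
                affected = True
            elif relation in ALL_FAILED:
                peers = {l for l, u, r in edges if u == upper and r == relation}
                affected = peers <= down
            else:                         # standby, transitive, connection: no impact
                affected = False
            if affected:
                seen.add(upper)
                queue.append(upper)
    return seen


example_one = [("A", "B", "direct"), ("A", "C", "direct"),
               ("B", "D", "direct"), ("B", "E", "direct")]
example_two = [("P", "R", "load_balancing"), ("Q", "R", "load_balancing"),
               ("R", "H", "direct"), ("R", "I", "direct")]

result_one = associated_items("A", example_one)
result_two = associated_items("P", example_two, failed={"Q"})
result_two_q_up = associated_items("P", example_two)
```

In this sketch `result_one` contains B, C, D and E, `result_two` contains R, H and I, and `result_two_q_up` is empty because Q is still running and R is therefore not treated as affected.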
Further, in the embodiments of the present invention, after the first configuration item corresponding to the first fault phenomenon is determined in step 101, the method further includes:
Querying an existing fault record table, where each fault record in the fault record table includes at least a configuration item, the fault phenomenon corresponding to the configuration item, and the occurrence time of the fault phenomenon;
Determining whether the fault record table contains a record of the first configuration item and the first fault phenomenon; if not, storing the first configuration item, the first fault phenomenon and the occurrence time of this first fault phenomenon in the fault record table as a new record; if so, updating the occurrence time of the first fault phenomenon already in the fault record table to the occurrence time of this first fault phenomenon.
In the embodiments of the present invention, the fault record table records the occurrence of fault phenomena, which makes them easy to query and provides a basis for selecting the corresponding fault resolution policy in subsequent steps. Furthermore, when the fault record table already contains a record of the first configuration item and the first fault phenomenon, no new record is stored; only the occurrence time of the first fault phenomenon is updated, which saves storage resources and improves query efficiency.
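The insert-or-update behaviour of the fault record table can be illustrated with a dictionary keyed by the (configuration item, fault phenomenon) pair. This is only a sketch: `record_fault` and the in-memory dictionary are assumptions for illustration, and a real system would more likely use a database table.

```python
def record_fault(fault_table, item, phenomenon, occurred_at):
    """Insert a new record, or refresh the occurrence time of an existing one.

    Returns True when a new record was created, False when one was updated.
    """
    key = (item, phenomenon)
    is_new = key not in fault_table
    fault_table[key] = occurred_at     # same statement covers insert and update
    return is_new


fault_table = {}
first = record_fault(fault_table, "Y", "C", 0)     # new record
repeat = record_fault(fault_table, "Y", "C", 30)   # same pair: only the time changes
```

Because the pair is the key, a repeated fault never adds a row, so the table stays at one record per configuration item and fault phenomenon.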
In step 103, the emergency scenes of the first configuration item and of its associated configuration items are obtained according to the first event list generated in step 102, where each emergency scene includes a configuration item and one or more fault phenomena corresponding to that configuration item;
The fault phenomena that occurred on the first configuration item and its associated configuration items within a first set time range are obtained from the fault record table as fault matching parameters, where the first set time range is configured by IT O&M personnel in the field according to long-term O&M experience;
The matching degree between the fault matching parameters and each emergency scene is calculated according to the fault phenomena corresponding to each emergency scene of the first configuration item and its associated configuration items;
The emergency scene whose matching degree meets a set threshold is taken as the target emergency scene, and the fault resolution policy corresponding to the target emergency scene is executed to process the first event list. So that the measures taken accurately resolve the fault phenomenon, the set threshold in the embodiments of the present invention may be set to 100%.
Through the above process, the embodiments of the present invention search for and execute fault resolution policies automatically. Faults that occur frequently in the system can be resolved by automatically executing the corresponding fault resolution policy, without manual handling. This not only improves the efficiency of resolving faults but also saves manpower and reduces the effort required of personnel.
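The description does not define how the matching degree is computed. The sketch below assumes one plausible metric, the share of an emergency scene's fault phenomena that appear among the fault matching parameters, and uses hypothetical scene names and functions throughout.

```python
def recent_faults(fault_table, items, now, window):
    """Fault matching parameters: (item, phenomenon) pairs seen within `window` of `now`."""
    return {
        (item, phenomenon)
        for (item, phenomenon), occurred_at in fault_table.items()
        if item in items and now - occurred_at <= window
    }


def matching_degree(observed, scene):
    """Assumed metric: share of the scene's phenomena that were observed."""
    return len(observed & scene) / len(scene) if scene else 0.0


def pick_targets(observed, scenes, threshold=1.0):
    """Return the scenes that meet the threshold, plus all degrees for feedback."""
    degrees = {name: matching_degree(observed, s) for name, s in scenes.items()}
    targets = [name for name, d in degrees.items() if d >= threshold]
    return targets, degrees


fault_table = {("Y", "C"): 0, ("X", "A"): 1, ("X", "B"): 2}
scenes = {                                   # illustrative emergency scenes (assumed)
    "lpar_down": {("X", "A"), ("X", "B")},
    "app2_down": {("X", "A"), ("Z", "D")},
}
observed = recent_faults(fault_table, {"X", "Y", "Z"}, now=2, window=60)
targets, degrees = pick_targets(observed, scenes)
```

With a threshold of 100%, only `lpar_down` qualifies as a target emergency scene. Returning every degree alongside the targets keeps the values available for the case where no scene meets the threshold and the degrees are reported to the responsible person instead.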
Preferably, after the first event list of the first configuration item is generated in step 102, the method further includes:
Obtaining information on the responsible person corresponding to the first event list;
Synthesizing the first event list into voice information and announcing it to the responsible person.
In the embodiments of the present invention, after an event list is successfully generated, the function of notifying the responsible person of the event list by telephone can be triggered automatically according to preset notification rules. Specifically, the information on the responsible person corresponding to the generated first event list (including name, telephone number and so on) is obtained through an interface call. The responsible person is then called automatically at that telephone number; the event handling level, event description and other details of the first event list are converted by TTS (Text To Speech) synthesis and announced to the responsible person by voice once the call is connected. If the first automatic call is not answered, further attempts are made automatically according to a preset number of redials. Meanwhile, information such as the call time, whether the call succeeded and the call duration is recorded for later query.
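The redial behaviour can be sketched independently of any telephony or TTS product. In the sketch below, `dial` is a hypothetical callable supplied by the caller that places one call, plays the message and returns whether the call was answered; no real telephony API is implied, and only the attempt number and outcome are logged here.

```python
def notify_owner(dial, number, message, max_redials=3):
    """Call `number` until answered or until the redial budget is used up.

    Returns a log of attempts so that the outcome can be queried later.
    """
    log = []
    for attempt in range(1, max_redials + 2):   # first call plus the allowed redials
        connected = dial(number, message)
        log.append({"attempt": attempt, "connected": connected})
        if connected:
            break
    return log


# Stand-in for a real dialer: unanswered twice, then answered.
answers = iter([False, False, True])
log = notify_owner(lambda number, message: next(answers),
                   "000-0000", "Event list 1: level 2, LPAR down")
```

Passing the dialer in as a parameter keeps the retry policy testable without placing real calls.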
In the embodiments of the present invention, after the task corresponding to the target emergency scene is executed to resolve the event list, the execution result is fed back to the responsible person, so that the responsible person knows that the fault has been resolved.
In the embodiments of the present invention, if none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, the matching degrees between the fault matching parameters and each emergency scene are fed back to the responsible person. After receiving the matching degrees, the responsible person can take a corresponding strategy to process the first event list based on the emergency scenes with high matching degrees.
In the embodiments of the present invention, after a first fault phenomenon is monitored, the first configuration item corresponding to the first fault phenomenon is determined, so that the configuration item on which the fault phenomenon occurred is located quickly and accurately. The associated configuration items of the first configuration item are calculated according to preset relations between configuration items. It is then determined whether the unprocessed event lists contain the first configuration item and its associated configuration items; if not, a first event list is generated for the first configuration item, which greatly reduces the number of event lists generated and saves resource overhead. The first event list includes the first configuration item and its associated configuration items. Finally, the generated first event list is processed in a targeted manner to resolve the fault. Because fewer event lists are generated, faults are handled more promptly through the event lists, and the efficiency of resolving faults is improved.
For a clearer understanding of the present invention, the process of generating event lists in the embodiments of the present invention is described in detail below with reference to a concrete scenario.
In a concrete IT O&M process, when the configuration item of a key node fails, many associated configuration items can become abnormal. Fig. 2 is a schematic diagram of a model assumed in an embodiment of the present invention. The model includes an LPAR (logical partition) and two applications on the LPAR, APP-1 and APP-2. The LPAR corresponds to configuration item X, with fault phenomena A and B; APP-1 corresponds to configuration item Y, with fault phenomenon C; APP-2 corresponds to configuration item Z, with fault phenomenon D. Configuration items Y and Z are upper-level configuration items of configuration item X; there is a direct relation between configuration item X and configuration items Y and Z, and no association between configuration item Y and configuration item Z.
Assume that the LPAR goes down and the fault phenomena occur in the order C, A, B, D.
(1) Fault phenomenon C occurs. Query the fault record table to determine whether a Y-C record exists. The result is that none exists, so a new record is inserted, as shown in Table A1:
Table A1
Configuration item | Fault phenomenon | Time of occurrence
Y | C | T
Query the unprocessed event lists to determine whether configuration item Y, or an associated configuration item of configuration item Y, is present. The result is that none is present, so event list 1 is generated. Calculate the associated configuration items of configuration item Y; if there are any, insert them into event list 1, and if there are none, leave the field empty, as shown in Table B1:
Table B1
Configuration item | Event list ID | Associated configuration items
Y | Event list 1 | (empty)
(2) Fault phenomenon A occurs. Query the fault record table to determine whether an X-A record exists. The result is that none exists, so a new record is inserted, as shown in Table A2:
Table A2
Configuration item | Fault phenomenon | Time of occurrence
Y | C | T
X | A | T + 1 s
Query the unprocessed event lists to determine whether configuration item X, or an associated configuration item of configuration item X, is present. The result is that none is present, so event list 2 is generated. Calculate the associated configuration items of configuration item X, which yields associated configuration items Y and Z, as shown in Table B2:
Table B2
Configuration item | Event list ID | Associated configuration items
Y | Event list 1 | (empty)
X | Event list 2 | Y, Z
(3) Fault phenomenon B occurs. Query the fault record table to determine whether an X-B record exists. The result is that none exists, so a new record is inserted, as shown in Table A3:
Table A3
Configuration item | Fault phenomenon | Time of occurrence
Y | C | T
X | A | T + 1 s
X | B | T + 2 s
Query the unprocessed event lists to determine whether configuration item X, or an associated configuration item of configuration item X, is present. The result is that one is present, so no new event list is generated, as shown in Table B3:
Table B3
Configuration item | Event list ID | Associated configuration items
Y | Event list 1 | (empty)
X | Event list 2 | Y, Z
(4) Fault phenomenon D occurs. Query the fault record table to determine whether a Z-D record exists. The result is that none exists, so a new record is inserted, as shown in Table A4:
Table A4
Configuration item | Fault phenomenon | Time of occurrence
Y | C | T
X | A | T + 1 s
X | B | T + 2 s
Z | D | T + 3 s
Query the unprocessed event lists to determine whether configuration item Z, or an associated configuration item of configuration item Z, is present. The result is that one is present, so no new event list is generated, as shown in Table B4:
Table B4
Configuration item | Event list ID | Associated configuration items
Y | Event list 1 | (empty)
X | Event list 2 | Y, Z
Through the above process, event list 1 and event list 2 are finally obtained.
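The four steps above can be replayed in a few lines. This is a minimal sketch under stated assumptions: the names `EventList`, `covers` and `on_fault` are hypothetical, the association table is the directional one implied by the Fig. 2 model, and the check is interpreted as "the configuration item already appears in either column of an unprocessed event list", which is the reading that reproduces Tables B1 to B4.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed directional association from the Fig. 2 model: X -> Y, Z.
ASSOCIATED = {"X": ("Y", "Z"), "Y": (), "Z": ()}
PHENOMENON_TO_ITEM = {"A": "X", "B": "X", "C": "Y", "D": "Z"}


@dataclass
class EventList:
    list_id: int
    item: str              # configuration item the list was opened for
    associated: tuple      # its associated configuration items
    processed: bool = False

    def covers(self, item: str) -> bool:
        """True if `item` is this list's configuration item or an associated one."""
        return item == self.item or item in self.associated


def on_fault(phenomenon: str, event_lists: list) -> Optional[EventList]:
    """Open a new event list only if no unprocessed one already covers the item."""
    item = PHENOMENON_TO_ITEM[phenomenon]
    if any(not e.processed and e.covers(item) for e in event_lists):
        return None
    new = EventList(len(event_lists) + 1, item, ASSOCIATED[item])
    event_lists.append(new)
    return new


event_lists = []
for phenomenon in ["C", "A", "B", "D"]:   # order in which the faults occur
    on_fault(phenomenon, event_lists)

for e in event_lists:
    print(e.list_id, e.item, ", ".join(e.associated) or "(empty)")
```

Four fault phenomena produce only two event lists, because phenomena B and D fall on configuration items that event list 2 already covers.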
In the above embodiment, the interval between the time of occurrence of fault phenomenon C of configuration item Y in event list 1 and the time of occurrence of fault phenomenon A of configuration item X in event list 2 is small, and configuration item Y is an associated configuration item of configuration item X, so event list 1 and event list 2 are merged for processing. Because configuration item Y in event list 1 is an associated configuration item of configuration item X in event list 2, processing event list 2 takes the emergency scenes of configuration item Y into account, so a corresponding fault resolution strategy can be determined to resolve the fault phenomenon of configuration item Y. By contrast, the associated configuration items of configuration item Y in event list 1 are empty, so processing event list 1 considers only the emergency scenes of configuration item Y and cannot take the emergency scenes of configuration item X in event list 2 into account; processing event list 1 therefore cannot resolve the fault phenomena of configuration item X in event list 2. Consequently, when event list 1 and event list 2 are merged, event list 1 should be merged into event list 2, and only event list 2 then needs to be processed.
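The merge rule in this paragraph has two conditions (the faults are close in time, and one list's configuration item is an associated item of the other's) plus a direction (the list with the narrower view is absorbed). A minimal sketch follows; `EventList`, `plan_merge` and the 5-second window are hypothetical, since the text only says the interval must fall within a second set time range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventList:
    list_id: int
    item: str            # configuration item the list was opened for
    associated: tuple    # its associated configuration items
    fault_time: float    # occurrence time of the triggering fault, in seconds


def plan_merge(a: EventList, b: EventList, window_s: float) -> Optional[tuple]:
    """Return (absorbed, surviving) if the two lists should be merged, else None."""
    if abs(a.fault_time - b.fault_time) > window_s:
        return None      # faults too far apart in time
    if a.item in b.associated:
        return a, b      # b already covers a's configuration item
    if b.item in a.associated:
        return b, a
    return None          # unrelated configuration items


list_1 = EventList(1, "Y", (), fault_time=0.0)          # fault C at T
list_2 = EventList(2, "X", ("Y", "Z"), fault_time=1.0)  # fault A at T + 1 s

plan = plan_merge(list_1, list_2, window_s=5.0)         # window value is assumed
if plan is not None:
    absorbed, surviving = plan
    print(f"merge event list {absorbed.list_id} into event list {surviving.list_id}")
```

With the values from the walkthrough this prints `merge event list 1 into event list 2`, matching the direction argued for in the text.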
Obtain the emergency scenes of configuration item X in event list 2 and of its associated configuration items Y and Z. Assume that emergency scene O of configuration item X is defined as: if fault phenomena A and B both occur, execute the first fault resolution strategy; emergency scene P of configuration item X is defined as: if only fault phenomenon A occurs, execute the second fault resolution strategy; emergency scene Q of configuration item Y is defined as: if fault phenomenon C occurs, execute the third fault resolution strategy; and emergency scene R of configuration item Z is defined as: if fault phenomenon D occurs, execute the fourth fault resolution strategy.
From the fault record table above, obtain the fault phenomena that occurred within the first set time range for configuration item X and its associated configuration items Y and Z, and use them as the fault matching parameters. According to the fault phenomena corresponding to each emergency scene of configuration item X and of its associated configuration items Y and Z, calculate the matching degree between the fault matching parameters and each emergency scene. The results are shown in Table P1.
Table P1
Configuration item | Emergency scene | Matching degree
X | Emergency scene O | 100%
X | Emergency scene P | 50%
Y | Emergency scene Q | 100%
Z | Emergency scene R | 100%
According to the above matching degrees, event list 2 is processed by executing the fault resolution strategies of emergency scene O, emergency scene Q and emergency scene R. After processing is complete, event list 1 and event list 2 can be marked as processed; preferably, event list 1 and event list 2 can be deleted to save storage space.
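This passage does not state how the matching degree is computed. One formula that reproduces every row of Table P1 is a Jaccard-style ratio taken per configuration item: the phenomena shared by the scene and the observations, divided by all phenomena either expected or observed. The sketch below uses that formula as an assumption, not as the patent's definition; `matching_degree`, the data layout and the 100% threshold are likewise hypothetical.

```python
from collections import defaultdict

# Fault records inside the first set time range: (configuration item, phenomenon).
records = [("Y", "C"), ("X", "A"), ("X", "B"), ("Z", "D")]

# Emergency scenes: name -> (configuration item, phenomena the scene expects).
scenes = {
    "O": ("X", {"A", "B"}),
    "P": ("X", {"A"}),      # "only fault phenomenon A occurs"
    "Q": ("Y", {"C"}),
    "R": ("Z", {"D"}),
}


def matching_degree(expected: set, observed: set) -> float:
    """Jaccard ratio: shared phenomena over all phenomena expected or observed."""
    union = expected | observed
    return len(expected & observed) / len(union) if union else 0.0


# Group the observed phenomena (the fault matching parameters) by configuration item.
observed = defaultdict(set)
for item, phenomenon in records:
    observed[item].add(phenomenon)

degrees = {
    name: matching_degree(expected, observed[item])
    for name, (item, expected) in scenes.items()
}

THRESHOLD = 1.0  # assumed: only exact matches become target emergency scenes
targets = [name for name, degree in degrees.items() if degree >= THRESHOLD]

for name, degree in degrees.items():
    print(f"emergency scene {name}: {degree:.0%}")
print("target emergency scenes:", targets)
```

Under these assumptions scene P scores 50% because fault phenomenon B was also observed on configuration item X, so scenes O, Q and R are the ones selected, as in the text.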
Corresponding to the above method flow, the embodiment of the present invention also provides a fault processing device for the operation and maintenance process. For the specific content of the device, refer to the implementation of the above method; details are not repeated here.
Fig. 3 is a schematic diagram of a fault processing device in an operation and maintenance process provided by the embodiment of the present invention. The device includes:
A monitoring module 301, configured to determine, after a first fault phenomenon is detected, the first configuration item corresponding to the first fault phenomenon;
An event list generation module 302, configured to calculate the associated configuration items of the first configuration item according to the preset relationships between configuration items; and to determine whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item and, if not, generate a first event list for the first configuration item, where the first event list includes the first configuration item and the associated configuration items of the first configuration item;
A processing module 303, configured to process the first event list.
Preferably, the event list generation module 302 is further configured to:
Query the existing fault record table, where each fault record in the fault record table includes at least a configuration item, the fault phenomenon corresponding to the configuration item, and the time of occurrence of the fault phenomenon;
Determine whether a record of the first configuration item and the first fault phenomenon exists in the fault record table; if not, store the first configuration item, the first fault phenomenon and the current time of occurrence of the first fault phenomenon in the fault record table as a new record; if so, update the time of occurrence of the first fault phenomenon already in the fault record table to the current time of occurrence of the first fault phenomenon.
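The insert-or-update behaviour described here amounts to an upsert keyed on the pair (configuration item, fault phenomenon). A minimal sketch, assuming an in-memory dictionary stands in for the fault record table; `record_fault` and the sample times are hypothetical.

```python
def record_fault(table: dict, item: str, phenomenon: str, occurred_at: float) -> bool:
    """Insert or refresh a fault record; return True if a new record was inserted."""
    key = (item, phenomenon)
    is_new = key not in table
    # Either way the stored time becomes the current time of occurrence.
    table[key] = occurred_at
    return is_new


fault_table = {}
inserted_first = record_fault(fault_table, "Y", "C", 0.0)   # new Y-C record
inserted_second = record_fault(fault_table, "X", "A", 1.0)  # new X-A record
inserted_repeat = record_fault(fault_table, "Y", "C", 7.0)  # Y-C recurs: time updated

print(inserted_first, inserted_second, inserted_repeat, fault_table)
```

Because a recurring fault only refreshes its timestamp, the table holds at most one row per pair, which keeps the later time-range query over the table small.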
Preferably, the processing module 303 is specifically configured to:
Obtain, according to the first event list, the emergency scenes of the first configuration item and of the associated configuration items of the first configuration item, where each emergency scene includes a configuration item and one or more fault phenomena corresponding to the configuration item;
Obtain from the fault record table the fault phenomena that occurred within the first set time range for the first configuration item and the associated configuration items of the first configuration item, and use them as the fault matching parameters;
Calculate the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and of the associated configuration items of the first configuration item;
Take the emergency scenes whose matching degree meets the set threshold as target emergency scenes, and execute the fault resolution strategies corresponding to the target emergency scenes to process the first event list.
Preferably, the processing module 303 is further configured to:
Obtain information on the responsible person corresponding to the first event list;
Synthesize the first event list into voice information and report it to the responsible person.
Preferably, the processing module 303 is further configured to:
After executing the task corresponding to the target emergency scene to resolve the event list, feed the execution result back to the responsible person;
If none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, feed the matching degrees between the fault matching parameters and each emergency scene back to the responsible person, so that the responsible person processes the first event list according to the matching degrees.
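The two branches above, automatic execution when some emergency scene meets the threshold and hand-off to the responsible person when none does, can be sketched as a single decision function. The function name `dispatch`, the action labels and the sample values are hypothetical; the matching degrees are taken as already computed.

```python
def dispatch(degrees: dict, threshold: float) -> tuple:
    """Decide between automatic handling and hand-off to the responsible person.

    Returns ("execute", target scenes) when at least one emergency scene meets
    the threshold, otherwise ("escalate", all matching degrees).
    """
    targets = {scene: d for scene, d in degrees.items() if d >= threshold}
    if targets:
        return "execute", targets
    return "escalate", dict(degrees)


automatic = dispatch({"O": 1.0, "P": 0.5, "Q": 1.0, "R": 1.0}, threshold=1.0)
manual = dispatch({"O": 0.5, "P": 0.5}, threshold=1.0)

print(automatic)
print(manual)
```

In the escalation branch the full set of matching degrees is returned unchanged, since the text has the responsible person decide how to process the event list from those values.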
Preferably, the processing module 303 is further configured to:
If the interval between the time of occurrence of the second fault phenomenon of the second configuration item in a second event list and the time of occurrence of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, merge the second event list and the first event list for processing.
It can be seen from the above that, in the embodiment of the present invention, after a first fault phenomenon is detected, the first configuration item corresponding to the first fault phenomenon is determined, so that the configuration item in which the fault phenomenon occurred is located quickly and accurately. The associated configuration items of the first configuration item are then calculated according to the preset relationships between configuration items. Next, it is determined whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item; if not, a first event list is generated for the first configuration item, which greatly reduces the number of event lists generated and saves resource overhead. The first event list includes the first configuration item and the associated configuration items of the first configuration item. Finally, the generated first event list is processed in a targeted manner to resolve the fault. Because fewer event lists are generated, faults can be handled through the event lists in a more timely manner, which improves the efficiency of fault resolution.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (12)

1. A fault processing method in an operation and maintenance process, characterized by comprising:
after a first fault phenomenon is detected, determining the first configuration item corresponding to the first fault phenomenon;
calculating the associated configuration items of the first configuration item according to the preset relationships between configuration items; determining whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item and, if not, generating a first event list for the first configuration item, the first event list including the first configuration item and the associated configuration items of the first configuration item;
processing the first event list.
2. The method of claim 1, characterized in that, after the determining, after a first fault phenomenon is detected, of the first configuration item corresponding to the first fault phenomenon, the method further comprises:
querying the existing fault record table, each fault record in the fault record table including at least a configuration item, the fault phenomenon corresponding to the configuration item, and the time of occurrence of the fault phenomenon;
determining whether a record of the first configuration item and the first fault phenomenon exists in the fault record table; if not, storing the first configuration item, the first fault phenomenon and the time of occurrence of the current first fault phenomenon in the fault record table as a new record; if so, updating the time of occurrence of the first fault phenomenon already in the fault record table to the time of occurrence of the current first fault phenomenon.
3. The method of claim 2, characterized in that the processing of the first event list comprises:
obtaining, according to the first event list, the emergency scenes of the first configuration item and of the associated configuration items of the first configuration item, each emergency scene including a configuration item and one or more fault phenomena corresponding to the configuration item;
obtaining from the fault record table the fault phenomena that occurred within the first set time range for the first configuration item and the associated configuration items of the first configuration item, as the fault matching parameters;
calculating the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and of the associated configuration items of the first configuration item;
taking the emergency scenes whose matching degree meets the set threshold as target emergency scenes, and executing the fault resolution strategies corresponding to the target emergency scenes to process the first event list.
4. The method of claim 3, characterized in that, after the generating of the first event list for the first configuration item, the method further comprises:
obtaining information on the responsible person corresponding to the first event list;
synthesizing the first event list into voice information and reporting it to the responsible person.
5. The method of claim 4, characterized in that, after the executing of the fault resolution strategies corresponding to the target emergency scenes to process the first event list, the method further comprises:
feeding the execution result back to the responsible person;
and, after the calculating of the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and of the associated configuration items of the first configuration item, the method further comprises:
if none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, feeding the matching degrees between the fault matching parameters and each emergency scene back to the responsible person, so that the responsible person processes the first event list according to the matching degrees.
6. The method of any one of claims 1-5, characterized in that the processing of the first event list further comprises:
if the interval between the time of occurrence of the second fault phenomenon of the second configuration item in a second event list and the time of occurrence of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, merging the second event list and the first event list for processing.
7. A fault processing device in an operation and maintenance process, characterized by comprising:
a monitoring module, configured to determine, after a first fault phenomenon is detected, the first configuration item corresponding to the first fault phenomenon;
an event list generation module, configured to calculate the associated configuration items of the first configuration item according to the preset relationships between configuration items; and to determine whether the unprocessed event lists contain the first configuration item and the associated configuration items of the first configuration item and, if not, generate a first event list for the first configuration item, the first event list including the first configuration item and the associated configuration items of the first configuration item;
a processing module, configured to process the first event list.
8. The device of claim 7, characterized in that the event list generation module is further configured to:
query the existing fault record table, each fault record in the fault record table including at least a configuration item, the fault phenomenon corresponding to the configuration item, and the time of occurrence of the fault phenomenon;
determine whether a record of the first configuration item and the first fault phenomenon exists in the fault record table; if not, store the first configuration item, the first fault phenomenon and the current time of occurrence of the first fault phenomenon in the fault record table as a new record; if so, update the time of occurrence of the first fault phenomenon already in the fault record table to the current time of occurrence of the first fault phenomenon.
9. The device of claim 8, characterized in that the processing module is specifically configured to:
obtain, according to the first event list, the emergency scenes of the first configuration item and of the associated configuration items of the first configuration item, each emergency scene including a configuration item and one or more fault phenomena corresponding to the configuration item;
obtain from the fault record table the fault phenomena that occurred within the first set time range for the first configuration item and the associated configuration items of the first configuration item, as the fault matching parameters;
calculate the matching degree between the fault matching parameters and each emergency scene according to the fault phenomena corresponding to each emergency scene of the first configuration item and of the associated configuration items of the first configuration item;
take the emergency scenes whose matching degree meets the set threshold as target emergency scenes, and execute the fault resolution strategies corresponding to the target emergency scenes to process the first event list.
10. The device of claim 9, characterized in that the processing module is further configured to:
obtain information on the responsible person corresponding to the first event list;
synthesize the first event list into voice information and report it to the responsible person.
11. The device of claim 10, characterized in that the processing module is further configured to:
after executing the task corresponding to the target emergency scene to resolve the event list, feed the execution result back to the responsible person;
if none of the matching degrees between the fault matching parameters and the emergency scenes meets the set threshold, feed the matching degrees between the fault matching parameters and each emergency scene back to the responsible person, so that the responsible person processes the first event list according to the matching degrees.
12. The device of any one of claims 7-11, characterized in that the processing module is further configured to:
if the interval between the time of occurrence of the second fault phenomenon of the second configuration item in a second event list and the time of occurrence of the first fault phenomenon of the first configuration item in the first event list is within a second set time range, and the second configuration item in the second event list is an associated configuration item of the first configuration item in the first event list, merge the second event list and the first event list for processing.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201510192122.XA (CN105988886B) | 2015-04-21 | 2015-04-21 | Fault processing method and device in operation and maintenance process

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510192122.XA (CN105988886B) | 2015-04-21 | 2015-04-21 | Fault processing method and device in operation and maintenance process

Publications (2)

Publication Number | Publication Date
CN105988886A | 2016-10-05
CN105988886B | 2018-03-16

Family

ID=57039582

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510192122.XA (CN105988886B, Active) | Fault processing method and device in operation and maintenance process | 2015-04-21 | 2015-04-21

Country Status (1)

Country | Link
CN (1) | CN105988886B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant