CN102045186B - Event analysis method and system - Google Patents


Info

Publication number
CN102045186B
Authority
CN
China
Prior art keywords
event
entity
failure
root
influence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200910235532
Other languages
Chinese (zh)
Other versions
CN102045186A (en)
Inventor
高翔
侯春森
叶剑飞
张春
段森
石正贵
丁子哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN 200910235532 priority Critical patent/CN102045186B/en
Publication of CN102045186A publication Critical patent/CN102045186A/en
Application granted granted Critical
Publication of CN102045186B publication Critical patent/CN102045186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an event analysis method comprising: A. collecting all fault events in an IT (information technology) system to form a first event set; B. for each fault event in the first event set, finding the fault events it triggers, according to the preset relationships among the IT entities in the IT system, to form a second event set; and C. judging whether each fault event in the first event set appears in the second event set, and extracting the fault events that do not appear in the second event set to form a root-cause event set. With the event analysis method and system provided by the invention, root-cause analysis of faults in the IT system is performed by searching for root-cause fault events in the CAD (Connecting, Attaching, Depending) model of the IT system, so that network management personnel can quickly find the fault cause and clear network faults, which shortens fault resolution time and improves working efficiency.

Description

Event analysis method and system
Technical field
The present invention relates to network management technology, and in particular to an event analysis method and system.
Background art
At present, the most representative results in modeling entities and inter-entity relationships are the Common Information Model (CIM) of the Desktop Management Task Force (DMTF) and the Shared Information/Data model (SID) of the TeleManagement Forum (TMF). CIM applies object-oriented concepts to unify and extend existing monitoring and management standards (SNMP, DMI, CMIP, etc.), and provides a general conceptual framework for defining, classifying, and integrating the components of a network environment. It is used to organize and manage objects in an IT environment (including systems, networks, applications, and software) in a logically consistent, uniform way, and it defines servers, desktops, peripherals, operating systems, applications, network components, users, and other entities. The domain-based modeling approach proposed by SID focuses more on modeling managed objects from a high level. SID associates entities tightly within a domain and loosely across domains, achieving high cohesion and low coupling, so that a complete business problem can be partitioned effectively.
However, these models focus mainly on describing the attributes of individual entities. Their ability to describe how different entities are related is weak, and they do not make clear how even these weak descriptions are to be used for system management. For example, the widely used CIM model defines essentially all IT entities that an information technology (IT) service environment might involve, together with the associations between them, but it remains weak on how to organize and use these entities and associations. It also pays too little attention to entity layering, which makes entities hard to reuse. Moreover, because CIM emphasizes generality and flexibility, it has no unified standard for layering entities or for abstracting inter-entity associations, so different administrators rarely arrive at the same abstraction of the same system.
Therefore, these models are applicable only to IT business environments whose abstract descriptions and inter-entity references are relatively fixed, such as a single business system made up of a small number of devices or a fixed network system (for example, the switching network of a communication network, or a pure IP network that involves no IT systems), and they are poorly suited to IT business environments with complex associations. It is also difficult to define directly from them a standard, normalized configuration management model for abstract modeling of an IT business environment.
From the above analysis, the configuration management models in current network management systems have the following deficiencies:
1. The layered abstraction of the IT entities involved in an IT business environment is unsatisfactory, so a high degree of reuse of model entities cannot be guaranteed.
2. The IT entities involved in an IT business environment and the relations between them are not defined clearly enough, so different administrators abstract the same IT entities and relations inconsistently.
3. Event correlation analysis cannot be carried out from the perspective of the business system as a whole, so administrators lack the ability to monitor the IT system as a whole.
With existing configuration management models, a single fault in the day-to-day operation of an IT system may produce a large amount of event information. For example, when a network stops working because of a power failure or a similar cause, down events are raised for every host connected to that network, and down alarms are raised for the monitored processes running on those hosts. Asking the administrator to analyze and resolve these alarms one by one is a heavy task.
In addition, when system maintenance staff use a network management system to monitor multiple business systems, once an event has occurred and its root event has been identified, they need to determine as quickly as possible which business systems are affected and to what degree, so that event handling can be prioritized sensibly.
Summary of the invention
The object of the present invention is to provide an event analysis method and system that enable network management personnel to find the root cause of a fault quickly and clear the network fault, saving fault resolution time and improving working efficiency.
To achieve the above object, according to one aspect of the present invention, an event analysis method is provided, comprising: A. collecting all fault events in the IT system to form a first event set; B. according to the preset relationships among the IT entities in the IT system, finding, for each fault event in the first event set, the fault events it triggers, to form a second event set; C. judging whether each fault event in the first event set appears in the second event set, and extracting the fault events that do not appear in the second event set to form a root fault event set.
Preferably, after step C the method further comprises analyzing the influence of the root fault events on the health status of the IT entities, comprising the following steps: D. for each root fault event in the root event set, finding the IT entities affected by it; E. calculating the influence value of each root fault event on the health status of the IT entities; F. weighting the multiple health influence values for the same IT entity to obtain the health status of each IT entity.
To achieve the above object, according to another aspect of the present invention, an event analysis system is provided, comprising: an event collection device, configured to collect all fault events in the IT system to form a first event set; an association device, configured to find, according to the preset relationships among the IT entities in the IT system, the fault events triggered by each fault event in the first event set, to form a second event set; and a comparison device, configured to compare the first event set with the second event set and obtain the fault events that appear in the first event set but not in the second event set, forming a root event set.
Preferably, the event analysis system further comprises a health status analysis device, configured to analyze the influence of the root fault events on the health status of the IT entities, comprising: a search module, configured to find the IT entities affected by each root fault event; a calculation module, configured to calculate the influence value of each root fault event on the health status of the IT entities; and a weighting module, configured to weight the multiple health influence values for the same IT entity to obtain the health status of each IT entity.
The event analysis method and system of the present invention perform root-cause analysis of faults in the IT system by searching for root fault events in the CAD model of the IT system, so that network management personnel can find the root cause of a fault quickly and clear the network fault, saving fault resolution time and improving working efficiency.
In addition, by analyzing the influence of fault events on IT entities, network management personnel can quickly judge how each IT entity in the IT system is affected by a fault event and, based on that impact analysis, prioritize fault handling in advance. This simplifies fault analysis, improves working efficiency, and allows faults to be handled and resolved sensibly.
Brief description of the drawings
Fig. 1 is a structural diagram of an embodiment of the CAD model of the present invention;
Fig. 2 is a flow chart of an embodiment of the event analysis method of the present invention;
Fig. 3 is the first diagram of fault event propagation in the IT system of the present invention;
Fig. 4 is a flow chart of another embodiment of the IT event analysis method of the present invention;
Fig. 5 is a diagram of the relations among the IT entities of the CAD model;
Fig. 6 is the second diagram of fault event propagation in the IT system of the present invention;
Fig. 7 is a structural diagram of an embodiment of the event analysis method of the present invention;
Fig. 8 is a structural diagram of another embodiment of the event analysis method of the present invention.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings.
The present invention proposes a new configuration management model that gives a standardized definition and description of the IT business entities involved in managing an IT business environment and of the kinds of relations between them. The result is a configuration management model with six entity layers and three kinds of relations, called the CAD model. This model improves the reusability of entities, strengthens event correlation analysis, and reduces the workload of network management personnel.
The entities of the IT business environment are abstracted into IT entities (IT Entity) at six levels, which are, from low to high: network device, host, process, computing service, application service, and business system. Each entity is described below:
1. Network device (Networks Device, abbreviated N)
A physical device connected inline in the IT network environment. Together these devices form the communication environment that carries the information exchange between IT business entities. They include traditional network devices such as Layer 2 and Layer 3 switches, routers, and hardware firewalls, as well as other, non-traditional physical devices connected inline in the IT network, such as inline Layer 4 switches, hardware web application firewalls, and hardware devices that control users' Internet access. A device is connected inline when it is attached to the IT network environment and also physically interconnects other physical devices in that environment, that is, when its being up or down affects the communication of other physical devices in the IT network.
2. Host (Host, abbreviated H)
A physical device attached in parallel to the IT network environment that carries some IT service function. Together these devices form the hardware environment in which IT business entities run. They include traditional physical or virtual computers with an operating system installed, such as UNIX minicomputers, PC servers, and virtual servers running on VMware software, as well as other physical devices attached in parallel to the IT network, such as Layer 4 switches, SSL VPN devices, network intrusion detection devices, and devices that control users' Internet access. A device is attached in parallel when it is connected to an IT network device but does not itself interconnect other physical devices, that is, when its being up or down does not affect the communication of other physical devices in the IT network. Note that this means no effect on network communication between other physical devices, not no effect on application-level communication: if a Layer 4 switch attached in parallel goes down, network communication is unaffected, but applications that use that switch for load balancing are affected.
3. Process (Process, abbreviated P)
A computer service process running on a host.
4. Computing service (Computing Service, abbreviated CS)
A logical unit of IT function made up of one or more processes (usually carried on one host) that delivers one specific IT function service (in an SOA environment, usually an atomic service). The process group must work as a whole to deliver the logically complete IT computing function. For example, the Domino OA computing service consists of two processes, server and http; the OA host HA computing service consists of one process, hacmp; and the CMPAK Domino Mail computing service consists of four processes, server, http, smtp, and pop3.
5. Application service (Application Service, abbreviated AS)
A logically complete application function made up of one or more CS. The main criterion for composing an AS is functional completeness at the application level. For example, the Domino OA application service consists of two computing services, Domino OA and OA host HA; the CMPAK Domino Mail application service consists of one computing service, CMPAK Domino Mail.
6. Business system (Business System, abbreviated BS)
A logically complete business function delivered by one or more AS, possibly together with functions of other business systems. The criterion for composing a business system is functional completeness at the business level. For example, the official document system is composed of the Domino OA application service, supported by the official document gateway system, the SMS service system, the user management system, the domain name management system, and others.
The present invention focuses on describing three aspects of the relationships between IT entities, namely topology, carrier, and functionality, and on that basis defines three kinds of relations among the six kinds of IT entities above: the connection relation (Connecting, abbreviated C), the bearing relation (Attaching, abbreviated A), and the dependency relation (Depending, abbreviated D), together abbreviated CAD. The three relations are described below:
1. Connection relation
Describes the interconnection topology between the two kinds of physical device entities, hosts (H) and network devices (N), that is, the connections between hosts and network devices and between network devices. For example, the relation in which a host $h$ is connected to a network device $n$ is written $h \xrightarrow{C} n$, read "host $h$ is connected to network device $n$"; the relation in which a network device $n_i$ is connected to another network device $n_j$ is written $n_i \xrightarrow{C} n_j$, read "network device $n_i$ is connected to network device $n_j$". The connection relation is directional, in the direction of the arrow, and transitive along that direction.
2. Bearing relation
Describes the carrier relation between a host (H) and the processes (P) running on it. For example, the relation between a process $p$ and the host $h$ that carries it is written $p \xrightarrow{A} h$, read "process $p$ is carried on host $h$". The bearing relation is directional, in the direction of the arrow, but not transitive.
3. Dependency relation
Describes the functional dependencies among processes (P), computing services (CS), application services (AS), and business systems (BS). These are: a computing service depends on the processes that compose it; an application service depends on the computing services that compose it; and a business system depends on application services and on other business systems. For example, the relation between a computing service $cs$ and a process $p$ that composes it is written $cs \xrightarrow{D} p$, read "computing service $cs$ depends on process $p$"; the relation between an application service $as$ and a computing service $cs$ that composes it is written $as \xrightarrow{D} cs$, read "application service $as$ depends on computing service $cs$"; the relation between a business system $bs$ and an application service $as$ that composes it is written $bs \xrightarrow{D} as$, read "business system $bs$ depends on application service $as$"; and the relation between a business system $bs_i$ and another business system $bs_j$ that composes it is written $bs_i \xrightarrow{D} bs_j$, read "business system $bs_i$ depends on business system $bs_j$". The dependency relation is directional, in the direction of the arrow, and transitive along that direction.
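The six entity layers and three relations described above can be sketched as a small data structure. This is only an illustration under assumptions: the class names `Layer`, `Rel`, `Entity`, `Relation`, the `ALLOWED` table, and the example entity name `oa-server` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The six entity layers, ordered from lowest to highest."""
    N = 1   # network device
    H = 2   # host
    P = 3   # process
    CS = 4  # computing service
    AS = 5  # application service
    BS = 6  # business system


class Rel(Enum):
    C = "connecting"  # H -> N, N -> N
    A = "attaching"   # P -> H
    D = "depending"   # CS -> P, AS -> CS, BS -> AS, BS -> BS


# (source layer, target layer) pairs that each relation may join,
# taken from the relation descriptions in the text.
ALLOWED = {
    Rel.C: {(Layer.H, Layer.N), (Layer.N, Layer.N)},
    Rel.A: {(Layer.P, Layer.H)},
    Rel.D: {(Layer.CS, Layer.P), (Layer.AS, Layer.CS),
            (Layer.BS, Layer.AS), (Layer.BS, Layer.BS)},
}


@dataclass(frozen=True)
class Entity:
    name: str
    layer: Layer


@dataclass(frozen=True)
class Relation:
    source: Entity
    kind: Rel
    target: Entity

    def __post_init__(self):
        # Reject relations between layers the model does not allow.
        pair = (self.source.layer, self.target.layer)
        if pair not in ALLOWED[self.kind]:
            raise ValueError(f"{self.kind.name} cannot join "
                             f"{pair[0].name} -> {pair[1].name}")


h = Entity("oa-server", Layer.H)
p = Entity("http", Layer.P)
cs = Entity("Domino OA", Layer.CS)
edges = [Relation(p, Rel.A, h), Relation(cs, Rel.D, p)]
```

Validating the layer pair at construction time keeps a model instance from containing, for example, a bearing relation that points from a host to a process.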
In a real environment the connections between network devices form a mesh. Describing network connections as a mesh would make the relation description between model entities very complex and would increase the difficulty of building an IT network management system that follows this model. This patent therefore further abstracts and simplifies the connection relation in the CAD model, which makes it easier for the IT network management system to analyze the root cause and impact of faults in the IT business environment.
The present invention also abstracts the device nodes in the network into four kinds, root node, leaf node, father node, and child node, and simplifies the connections between network devices into a tree-like relation.
A root node (Root Network, abbreviated $N_R$) is a core network node device in the network being modeled. Usually the network administrator of the IT business environment designates, according to the actual composition of the network, a subnet made up of one or more network devices as the core network; the network devices that make up this core network are the root nodes.
A leaf node (Leaf Network, abbreviated $N_L$) is a network device in the network being modeled to which hosts (H) are directly connected, typically the gateway switch of the system hosts or the access switch of user terminals.
The shortest path from any network device $N$ to the root node $N_R$ is written $d\langle N, N_R\rangle$. For two directly interconnected network devices $N_i$ and $N_j$: if $d\langle N_i, N_R\rangle = d\langle N_j, N_R\rangle$, then $N_i$ and $N_j$ are sibling nodes (so all root nodes $N_R$ are siblings of one another); if $d\langle N_i, N_R\rangle - d\langle N_j, N_R\rangle = 1$, then $N_j$ is called the father node (Father) of $N_i$, and $N_i$ is called the child node (Son) of $N_j$.
Building on the above abstraction of the connections between network device nodes, and in order to further standardize the connection relation of the CAD model, the following definitions are made:
1) No connection relation of the CAD model is assumed to exist between sibling nodes. The sibling relation between two nodes is written $N \cong N'$; this relation does not belong to the relation set of the CAD model.
2) All root node network devices ($N_R$) in the core network are siblings of one another, and in a connection relation a root node can only be the end point of the relation, that is, a root node can only appear on the side to which the arrow of the connection relation symbol points: $N_i \xrightarrow{C} N_R$.
3) The connection between a network device father node ($N_F$) and its child node ($N_S$), in either direction, can only be expressed as $N_S \xrightarrow{C} N_F$.
With the above definitions, all connection relations in the CAD model reduce to the following two kinds:
1) A host is connected to its network leaf node, written $H \xrightarrow{C} N_L$.
2) There is only one kind of connection relation between network devices: a network child node is connected to its father node, written $N_S \xrightarrow{C} N_F$, that is, $N_S \xrightarrow{C} N_F \equiv N_S \xrightarrow{S} N_F \equiv N_F \xrightarrow{F} N_S$.
Fig. 1 is a structural diagram of an embodiment of the CAD model of the present invention. As shown in Fig. 1, the CAD model makes it possible to see at a glance the six-layer IT entity service topology of the official document system and its relations with other business systems. The six entity layers comprise:
Network device layer: two Layer 3 switches;
Host layer: one OA server and one MAIL server;
Process layer: one server process, one http process, and two hacmp processes;
Computing service layer: Domino OA, OA host, and MAIL host;
Application service layer: the official document application service;
Business system layer: the official document system.
Method embodiment one
Based on the above CAD model, the present invention provides an event analysis method. Fig. 2 is a flow chart of an embodiment of the event analysis method of the present invention. As shown in Fig. 2, this embodiment comprises the following steps:
Step 201: collect all fault events in the IT system to form a first event set;
Step 202: according to the preset relationships among the IT entities in the IT system, find, for each fault event in the first event set, the fault events it triggers, to form a second event set;
Fig. 3 is the first diagram of fault event propagation in the IT system of the present invention. As shown in Fig. 3, events propagate in the direction Network → Host → Process → CS → AS → BS, so root event analysis proceeds in this order. This yields all fault events caused by each fault event $m_1$, which form the second event set; the set does not include $m_1$ itself;
In addition, because IT network management systems can usually collect or monitor only the alarms/events occurring at the network device, host, and process layers and at part of the computing service layer (for example, an Oracle monitoring agent can detect alarms/events of the Oracle computing service), root alarm/event analysis based on the CAD model need not consider the AS and BS layers of the model;
Step 203: judge whether each fault event in the first event set appears in the second event set; if not, execute step 204;
Step 204: if a fault event $m_1$ does not appear in the second event set, it is a root fault event; extract the fault events that do not appear in the second event set to form the root fault event set. That fault event $m_1$ is the root of fault event $m_2$ means that if $m_1$ occurs, $m_2$ necessarily occurs.
In this embodiment, root-cause analysis of faults in the IT system is performed by searching for root fault events in the CAD model of the IT system, so that network management personnel can find the root cause of a fault quickly and clear the network fault, saving fault resolution time and improving working efficiency.
Preferably, in this embodiment, step 202 further comprises sorting all fault events by the distance between the IT entity on which the event occurred and the root node, from near to far. Because root fault events generally lie closer to the root node of the network devices, sorting the fault events makes the search for root fault events faster.
As shown in Fig. 3, if host H1 goes down, processes p10, p11, p12 and computing service CS1 all raise down events, so the event set is {H1 down, p10 down, p11 down, p12 down, CS1 down}. Applying the model's root event algorithm shows that the down events of p10, p11, p12 and CS1 are caused by the down event of host H1, so the root event set is {H1 down}.
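The H1 example can be reproduced with a minimal sketch of the set-difference step, assuming one down event per entity and a hypothetical `impacts` map that records which entities each entity directly affects along the propagation direction. The function names and the exact wiring of CS1 to the three processes are assumptions, since Fig. 3 is not reproduced here.

```python
def downstream(entity, impacts):
    """Entities reachable from `entity` along the propagation direction."""
    seen, stack = set(), [entity]
    while stack:
        for nxt in impacts.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(entity)  # the second set never contains the event itself
    return seen


def root_events(first_set, impacts):
    """Return the events in first_set that no other collected event triggers."""
    first_set = set(first_set)
    second_set = set()
    for entity in first_set:
        # Only triggered events that were actually collected count.
        second_set |= downstream(entity, impacts) & first_set
    return first_set - second_set


# Assumed propagation: H1 carries p10..p12, and CS1 depends on all three.
impacts = {
    "H1": ["p10", "p11", "p12"],
    "p10": ["CS1"], "p11": ["CS1"], "p12": ["CS1"],
}
first = {"H1", "p10", "p11", "p12", "CS1"}
print(root_events(first, impacts))  # {'H1'}
```

Two unrelated faults stay separate under this scheme: if an event on a second host with no recorded link to H1 were added to the first set, it would also be returned as a root event.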
Method embodiment two
Fig. 4 is a flow chart of another embodiment of the IT event analysis method of the present invention. As shown in Fig. 4, this embodiment further comprises, after the above step 208, analyzing the influence of the root fault events on the health status of the IT entities, comprising the following steps:
Step 402: for each root fault event in the root event set, find the IT entities affected by it;
Step 404: calculate the influence value of each root fault event on the health status of the IT entities;
Step 406: weight the multiple health influence values for the same IT entity to obtain the health status of each IT entity.
Preferably, step 402 further comprises sorting the root fault events in the root fault event set by the distance between the IT entity on which the event occurred and the root node, from near to far.
Fig. 5 is a diagram of the relations among the IT entities of the CAD model. As shown in Fig. 5, entity $e$ is related to entities $e_1, e_2, \ldots, e_n$, and events $m_1, m_2, \ldots, m_k$ have occurred on entity $e$ itself. Assuming that $m_1, m_2, \ldots, m_k$ are root events, the health status $H_e$ of $e$ can be expressed as:
$H_e = 1 - f_H\big(f_E(f_T(H_{e_1}), \ldots, f_T(H_{e_n})),\ f_M(f_I(m_1), \ldots, f_I(m_k))\big)$.
Here $f_H$ is the function that computes how much the health of an entity is affected; it has two arguments, $f_E$ and $f_M$. $f_E$ computes the influence transmitted to the entity by the health status of all entities directly related to it, and $f_M$ computes the influence on the entity of all root events that occur on the entity itself. $f_T$ computes the influence transmitted to the entity by the health status of one related entity, and $f_I$ is the influence on the entity of one root event that occurs on it. $T$ denotes the health transmission relation between entities, of which there are three possible kinds: the connection relation $\xrightarrow{C}$, the bearing relation $\xrightarrow{A}$, and the dependency relation $\xrightarrow{D}$.
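The outer step of the health formula can be sketched as below. The text does not define $f_H$, so the `combine` parameter stands in for it, and its default of `max`, as well as the clamping of the result to the range 0 to 1, are assumptions made only for illustration.

```python
def entity_health(related_impact, own_impact, combine=max):
    """H_e = 1 - f_H(f_E, f_M), with the result kept inside [0, 1].

    related_impact plays the role of f_E and own_impact the role of f_M.
    combine stands in for f_H, which the text does not define; using
    max here is an assumption.
    """
    impact = combine(related_impact, own_impact)
    return 1.0 - min(1.0, max(0.0, impact))


print(entity_health(0.32, 0.5))  # 0.5
```

Passing a different `combine` function, such as a capped sum, changes how strongly inherited and local impacts reinforce each other without touching the rest of the calculation.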
In this embodiment, analyzing the influence of the root fault events on the health status of the IT entities includes analyzing the influence of events that occur on the entity itself:
In step 404, the health influence value is $f_I(m) = \lambda \times I_m + \delta$, where $m$ is the root fault event, $I_m$ is the influence factor of the fault event, $\lambda$ is the correction coefficient of the influence factor, and $\delta$ is a correction parameter.
In step 406, the weighting of the multiple health influence values for the same IT entity takes one of the following forms:
(1) For an IT entity that comprises multiple parts which jointly complete one task, for example the CPU, memory, and hard disk of a host:
$f_M(f_I(m_1), \ldots, f_I(m_k)) = \mathrm{Max}(f_I(m_1), \ldots, f_I(m_k)) + \delta$, where $\delta$ is a correction parameter;
(2) For an IT entity that comprises multiple parts each of which can complete a task independently, for example two network cards on one host:
$f_M(f_I(m_1), \ldots, f_I(m_k)) = \mathrm{Min}(f_I(m_1), \ldots, f_I(m_k)) + \delta$, where $\delta$ is a correction parameter;
(3) For an IT entity that comprises multiple parts which can complete a task either jointly or separately:
$f_M(f_I(m_1), \ldots, f_I(m_k)) = \sum_{i=1}^{k} \mathrm{Weight}(m_i) \times f_I(m_i) + \delta$, where $\mathrm{Weight}(m_i)$ is the weight of each event's influence on the entity's health and $\delta$ is a correction parameter. For example, on a host with two network cards, a fault on the standby card should still be reflected, so that the administrator is reminded to replace the card in time;
(4) For root fault events of different types on the IT entity, the three calculation methods above are applied per type. The three forms above carry an implicit assumption: each class of event occurs only once on an entity. In practice several events of the same class may occur on one entity at the same time, for example when a host has several CPUs that fail simultaneously. Events can therefore be classified: for a host, by hardware component; for an application service implemented with load balancing, by CS. Maximum, minimum, or weighted sum is then applied between the event classes, and, as the situation requires, maximum, minimum, or weighted sum is applied again within each class. Weighting both between classes and within each class reflects the influence of events on the entity most faithfully. This two-level weighting uses two kinds of weights: the event class weight (the between-class weight) and the weight inside an event class (the within-class weight). The between-class weights of an entity may sum to more than 1, and so may the within-class weights of each event class.
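The four aggregation forms above can be sketched as follows. The function names, the string mode labels, and the example influence values are hypothetical, and the correction parameters default to zero for readability.

```python
def event_impact(factor, lam=1.0, delta=0.0):
    """f_I(m) = lam * I_m + delta for one root fault event."""
    return lam * factor + delta


def aggregate(impacts, mode, weights=None, delta=0.0):
    """f_M over the events of one entity, or of one event class."""
    if mode == "max":       # form (1): parts complete one task jointly
        return max(impacts) + delta
    if mode == "min":       # form (2): each part can do the task alone
        return min(impacts) + delta
    if mode == "weighted":  # form (3): weighted sum of event impacts
        return sum(w * x for w, x in zip(weights, impacts)) + delta
    raise ValueError(f"unknown mode: {mode}")


def aggregate_by_class(classes, between_mode, between_weights=None):
    """Form (4): aggregate inside each class, then across classes.

    classes: list of (impacts, mode, weights) tuples, one per event class.
    """
    per_class = [aggregate(i, m, w) for i, m, w in classes]
    return aggregate(per_class, between_mode, between_weights)


# Hypothetical host: CPU/memory/disk events combined by maximum,
# two network cards combined by a weighted sum.
components = ([event_impact(0.2), event_impact(0.7), event_impact(0.1)],
              "max", None)
nics = ([event_impact(0.6), event_impact(0.0)], "weighted", [0.5, 0.5])
host_impact = aggregate_by_class([components, nics], between_mode="max")
print(host_impact)  # 0.7
```

Here the network card class contributes 0.3 and the component class 0.7, so taking the maximum between classes gives 0.7. Because weights are not required to sum to 1, a caller that needs a bounded result would have to clamp it separately.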
Preferably, after step 208, the method further comprises analyzing the influence of the connection relationship on the health status of the IT entity, comprising the following steps:
The transfer factor $T_{C \to e_i}$ of the connection relationship is calculated from the bandwidth occupied by the connection relationship as: $T_{C \to e_i} = \dfrac{BW_{e,e_i}}{\sum_{j=1}^{n} BW_{e,e_j}}$, where $e$ denotes the IT entity, $e_i$ denotes a network device, $BW_{e,e_i}$ denotes the bandwidth from IT entity $e$ to network device $e_i$, and $\sum_{j=1}^{n} BW_{e,e_j}$ denotes the total bandwidth from this IT entity to all network devices with which it has a connection relationship;
The influence value of the connection relationship on the health status of the IT entity is:
[formula image not reproduced]
where $H_e$ denotes the health degree of the entity and $\delta$ is a correction parameter;
The sum of the influence values of all connection relationships of the IT entity on its health status is:
$f_E(f_T(H_{e_1}), \ldots, f_T(H_{e_n})) = \lambda \sum_{i=1}^{n} \left( T_{C \to e_i} \, f_T(H_{e_i}) \right) + \delta$, where $\lambda$ is the correction coefficient of the connection relationship and $\delta$ is the connection-relationship correction parameter.
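The bandwidth-based transfer factor and the summed influence of all connection relationships can be sketched as below. The per-connection influence function is an image in the source and is not recoverable, so this sketch assumes the form $f_T(H) = 1 - H + \delta$ that the text gives for the bearing relationship; the function names, device names, and bandwidth figures are hypothetical.

```python
from typing import Callable, Dict


def transfer_factors(bandwidths: Dict[str, float]) -> Dict[str, float]:
    """T_{C->e_i}: share of the entity's total bandwidth going to each device."""
    total = sum(bandwidths.values())
    if total <= 0:
        raise ValueError("total bandwidth must be positive")
    return {dev: bw / total for dev, bw in bandwidths.items()}


def connection_impact(
    bandwidths: Dict[str, float],
    device_health: Dict[str, float],
    lam: float = 1.0,
    delta: float = 0.0,
    f_t: Callable[[float], float] = lambda h: 1.0 - h,  # assumed form of f_T
) -> float:
    """f_E = lam * sum_i(T_{C->e_i} * f_T(H_{e_i})) + delta."""
    factors = transfer_factors(bandwidths)
    weighted = sum(factors[d] * f_t(device_health[d]) for d in bandwidths)
    return lam * weighted + delta


# Assumed sample: an entity uses 750 units of bandwidth via sw1 and 250 via sw2;
# sw2 is down (health 0), sw1 is healthy (health 1).
bw = {"sw1": 750.0, "sw2": 250.0}
health = {"sw1": 1.0, "sw2": 0.0}
print(transfer_factors(bw))           # {'sw1': 0.75, 'sw2': 0.25}
print(connection_impact(bw, health))  # 0.25
```

Under these assumptions, losing the device that carries a quarter of the bandwidth yields an influence value of 0.25 on the entity.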
Preferably, after step 208, the method further comprises analyzing the influence of the bearing relationship on the health status of the IT entity, comprising the following steps:
The influence value of the bearing relationship on the health status of the host is:
$f_T(H_e) = 1 - H_e + \delta$, where $\delta$ is a correction parameter;
The influence value of the process on the health status of the host is:
$f_E(f_T(H_e)) = \lambda \times f_T(H_e) + \delta$, where $H_e$ denotes the health degree of the entity, $\lambda$ is the correction coefficient of the bearing relationship, and $\delta$ is the connection-relationship correction parameter.
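As a worked illustration of the two bearing-relationship formulas above, take the purely hypothetical values $H_e = 0.75$, $\lambda = 0.8$ and $\delta = 0$ (none of these numbers come from the patent):

```latex
\begin{align*}
f_T(H_e) &= 1 - H_e + \delta = 1 - 0.75 + 0 = 0.25 \\
f_E(f_T(H_e)) &= \lambda \times f_T(H_e) + \delta = 0.8 \times 0.25 + 0 = 0.2
\end{align*}
```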
More preferably, after step 208, the method further comprises analyzing the influence of the bearing relationship on the health status of the IT entity, comprising the following steps:
The influence value of the dependency relationship on the health status of the IT entity is:
[formula image not reproduced]
where $H_e$ denotes the health degree of the entity,
[symbol image not reproduced]
is either a static value set from experience or a dynamic value set by statistical analysis of historical data, and $\delta$ is a correction parameter;
The sum of the influence values of all dependency relationships of the IT entity on its health status is:
$f_E(f_T(H_{e_1}), \ldots, f_T(H_{e_n})) = \lambda \sum_{i=1}^{n} \left( T_{C \to e_i} \, f_T(H_{e_i}) \right) + \delta$, where $\lambda$ is the correction coefficient of the dependency relationship and $\delta$ is the connection-relationship correction parameter.
In this embodiment, analyzing the impact of fault events on IT entities lets network management personnel quickly determine how each IT entity in the IT system is affected by fault events. Based on this impact analysis, they can set the priority for handling fault events in advance, which simplifies fault analysis, improves working efficiency, and allows faults to be handled and resolved in a reasonable order.
Fig. 6 is the second schematic diagram of fault event propagation in the IT system of the present invention, comprising two layers of entities, AS and BS. As shown in Fig. 6, suppose for example that the transfer influence of the dependency of application service AS1 on computing services CS1 and CS2 is 0.5, the transfer influence of the dependency of computing service CS1 on processes p10, p11 and p12 is 0.8, the transfer influence of the dependency of business system BS1 on application service AS1 is 0.8, and the degree of dependency of the other business systems BS2, BS3 and BS4 is 0.1:
Suppose there is only one root event, namely that host H1 goes down. According to the business impact formulas and the business impact algorithm given above, the health status of H1, p10, p11, p12 and CS1 is 0, the health degree of AS1 is 0.5, the health degree of BS1 is 0.6, and all others are 1.
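The numbers in this example can be reproduced with a small propagation sketch. Several points are assumptions made only for illustration and are not stated in the text: the per-relationship influence is taken as $f_T(H) = 1 - H$ with $\lambda = 1$ and $\delta = 0$, an entity's health is taken as $1 - f_E$ clamped to $[0, 1]$, the bearing relationship from H1 to its processes is given a transfer influence of 1.0, and BS2 to BS4 are left out because the text does not say what they depend on. The names `UPSTREAM` and `health_of` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# entity -> list of (upstream entity, transfer influence), following Fig. 6
UPSTREAM: Dict[str, List[Tuple[str, float]]] = {
    "p10": [("H1", 1.0)],  # bearing relationship, transfer 1.0 is assumed
    "p11": [("H1", 1.0)],
    "p12": [("H1", 1.0)],
    "CS1": [("p10", 0.8), ("p11", 0.8), ("p12", 0.8)],
    "AS1": [("CS1", 0.5), ("CS2", 0.5)],
    "BS1": [("AS1", 0.8)],
}


def clamp(x: float) -> float:
    """Keep a health degree inside [0, 1]."""
    return max(0.0, min(1.0, x))


def health_of(entity: str, root_health: Dict[str, float],
              cache: Optional[Dict[str, float]] = None) -> float:
    """Health degree of an entity, given the health of the root-event entities."""
    cache = {} if cache is None else cache
    if entity in root_health:
        return root_health[entity]
    if entity not in cache:
        # f_E = sum_i T_i * f_T(H_i), with f_T(H) = 1 - H (assumed), lam=1, delta=0
        impact = sum(t * (1.0 - health_of(up, root_health, cache))
                     for up, t in UPSTREAM.get(entity, []))
        cache[entity] = clamp(1.0 - impact)
    return cache[entity]


root = {"H1": 0.0}  # the single root event: host H1 is down
for name in ["H1", "p10", "p11", "p12", "CS1", "CS2", "AS1", "BS1"]:
    print(name, round(health_of(name, root), 3))
```

Under these assumptions the sketch gives 0 for H1, the three processes and CS1, 0.5 for AS1, 0.6 for BS1, and 1 for the unaffected CS2, matching the values stated in the example. The clamp matters for CS1, whose three failed processes sum to an influence above 1.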
System embodiment
Fig. 7 is a structural diagram of an embodiment of the event analysis method of the present invention. As shown in Fig. 7, this embodiment comprises:
An event collection device 702, configured to collect all fault events in the IT system to form a first event set;
An association device 706, configured to find, according to the preset relationships among the IT entities in the IT system, the fault events triggered by each fault event in the first event set, forming a second event set;
A comparison device 708, configured to compare the first event set with the second event set and obtain the fault events that appear in the first event set but not in the second event set, forming a root event set.
Preferably, this embodiment further comprises a first sorting device 704, configured to sort all fault events from near to far according to the distance between the root node and the IT entity on which each event occurred.
In this embodiment, root fault events are searched for in the CAD model of the IT system and root-cause analysis is performed on the faults of the IT system, so that network management personnel can quickly find the root of a fault and resolve the network fault, which saves fault resolution time and improves working efficiency.
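The collection, association and comparison devices amount to a set difference over the entity relationship graph. The sketch below is a simplified illustration under stated assumptions: each fault event is identified only by the entity it occurred on, the relationship graph is acyclic, and the names `triggered_by`, `root_events`, `affects` and `depth` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def triggered_by(entity: str, affects: Dict[str, Iterable[str]]) -> Set[str]:
    """All entities whose faults can be triggered, directly or not, by `entity`."""
    seen: Set[str] = set()
    stack = list(affects.get(entity, ()))
    while stack:
        nxt = stack.pop()
        if nxt not in seen:
            seen.add(nxt)
            stack.extend(affects.get(nxt, ()))
    return seen


def root_events(first_set: Set[str], affects: Dict[str, Iterable[str]],
                depth: Dict[str, int]) -> List[str]:
    """Root fault events: in the first set but not in the second (triggered) set."""
    second_set: Set[str] = set()
    for event in first_set:                 # association device
        second_set |= triggered_by(event, affects)
    roots = first_set - second_set          # comparison device
    # sorting device: nearest to the root node first, name as a tie-breaker
    return sorted(roots, key=lambda e: (depth.get(e, 0), e))


# Assumed relationships: H1 bears p10 and p11, which CS1 depends on, and so on.
affects = {"H1": ["p10", "p11"], "p10": ["CS1"], "p11": ["CS1"], "CS1": ["AS1"]}
depth = {"SW2": 1, "H1": 2, "p10": 3, "p11": 3, "CS1": 4, "AS1": 5}
collected = {"H1", "p10", "CS1", "SW2"}     # first event set
print(root_events(collected, affects, depth))  # ['SW2', 'H1']
```

Here the faults on p10 and CS1 are explained by the fault on H1, so only SW2 and H1 remain as root events.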
System embodiment two
Fig. 8 is a structural diagram of another embodiment of the event analysis method of the present invention. As shown in Fig. 8, this embodiment further comprises:
A health status analysis device 804, configured to analyze the influence of the root fault events on the health status of the IT entities, comprising: a lookup module, configured to find the IT entities affected by each root fault event; a calculation module, configured to calculate the influence value of each root fault event on the health status of the IT entities; and a weighting module, configured to weight the multiple health influence values for the same IT entity to obtain the health status of each IT entity.
Preferably, this embodiment further comprises a second sorting device 802, configured to sort the root fault events from near to far according to the distance between the root node and the IT entity on which each event occurred.
More preferably, this embodiment further comprises: a connection relationship analysis device 806, configured to analyze the influence of the connection relationship on the health status of the IT entities;
and/or a bearing relationship analysis device 808, configured to analyze the influence of the bearing relationship on the health status of the IT entities;
and/or a dependency relationship analysis device 810, configured to analyze the influence of the dependency relationship on the health status of the IT entities.
The analysis methods of the above analysis devices are described in detail in the method embodiments above and are not repeated here.
In the event analysis system of this embodiment, analyzing the impact of fault events on IT entities lets network management personnel quickly determine how each IT entity in the IT system is affected by fault events. Based on this impact analysis, they can set the priority for handling fault events in advance, which simplifies fault analysis, improves working efficiency, and allows faults to be handled and resolved in a reasonable order.
It should be noted that the above embodiments are intended only to illustrate the present invention, not to limit it, and the present invention is not limited to the above examples. All technical solutions and improvements that do not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.
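The three modules of the health status analysis device can be sketched as one small pipeline that uses the influence formula $f_I(m) = \lambda \times I_m + \delta$ from the method. Everything beyond that formula is an assumption for illustration: the event records, the `affected` lookup table, the choice of the maximum rule in the weighting step, and the conversion of the combined influence into a health degree as $1 - f_M$ clamped to $[0, 1]$.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def influence_value(factor: float, lam: float = 1.0, delta: float = 0.0) -> float:
    """f_I(m) = lam * I_m + delta for one root fault event."""
    return lam * factor + delta


def analyze_health(
    events: List[dict],
    affected: Dict[str, List[str]],
    combine: Callable[[List[float]], float] = max,  # assumed weighting rule
    lam: float = 1.0,
    delta: float = 0.0,
) -> Dict[str, float]:
    per_entity: Dict[str, List[float]] = defaultdict(list)
    for ev in events:
        for entity in affected.get(ev["id"], []):               # lookup module
            value = influence_value(ev["factor"], lam, delta)   # calculation module
            per_entity[entity].append(value)
    # weighting module; health = 1 - combined influence is an assumption
    return {e: max(0.0, min(1.0, 1.0 - combine(v))) for e, v in per_entity.items()}


# Assumed sample data: three root fault events with influence factors I_m.
events = [
    {"id": "disk_full", "factor": 0.5},
    {"id": "fan_slow", "factor": 0.25},
    {"id": "link_flap", "factor": 0.25},
]
affected = {"disk_full": ["H1"], "fan_slow": ["H1"], "link_flap": ["H2"]}
print(analyze_health(events, affected))  # {'H1': 0.5, 'H2': 0.75}
```

Swapping `combine` for `min` or a weighted sum selects a different weighting rule without changing the lookup and calculation steps.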

Claims (13)

1. An event analysis method, characterized by comprising:
A. collecting all fault events in an IT system to form a first event set;
B. according to the preset relationships among the IT entities in the IT system, finding, for each fault event in the first event set, the fault events it triggers, to form a second event set;
C. determining whether each fault event in the first event set appears in the second event set, and extracting the fault events that do not appear in the second event set to form a root fault event set;
D. for each root fault event in the root event set, finding the IT entities affected by it;
E. calculating the influence value of each root fault event on the health status of the IT entities;
F. weighting the multiple health influence values for the same IT entity to obtain the health status of each IT entity;
wherein, in step E, the health influence value is $f_I(m) = \lambda \times I_m + \delta$, where $m$ is the root fault event, $I_m$ is the influence factor of the fault event, $\lambda$ is the correction coefficient of the influence factor, and $\delta$ is a correction parameter.
2. The event analysis method according to claim 1, characterized in that the IT entities in the IT system comprise: network devices, hosts, processes, computing services, application services and business services; and the relationships among the IT entities comprise: connection relationships, bearing relationships and dependency relationships between entities.
3. The event analysis method according to claim 2, characterized in that
the connection relationships comprise: the connection relationship between a child node and a parent node among the network devices, and the connection relationship between a host and a network device;
the bearing relationship is the bearing relationship of a host to a process;
the dependency relationships comprise: the dependency of a computing service on a process, of an application service on a computing service, of a business system on an application service, and of a business system on another business system.
4. The event analysis method according to claim 1, characterized in that step A further comprises: sorting all the fault events from near to far according to the distance between the root node and the IT entity on which each event occurred.
5. The event analysis method according to claim 4, characterized in that step D further comprises: sorting the root fault events in the root fault event set from near to far according to the distance between the root node and the IT entity on which each event occurred.
6. The event analysis method according to claim 1, characterized in that the operation in step F of weighting the multiple health influence values for the same IT entity specifically comprises:
for an IT entity comprising multiple components that jointly complete one task,
$f_M(f_I(m_1), \ldots, f_I(m_k)) = \max(f_I(m_1), \ldots, f_I(m_k)) + \delta$, where $\delta$ is a correction parameter;
or
for an IT entity comprising multiple components, each of which can complete a task independently,
$f_M(f_I(m_1), \ldots, f_I(m_k)) = \min(f_I(m_1), \ldots, f_I(m_k)) + \delta$, where $\delta$ is a correction parameter;
or
for an IT entity comprising multiple components that can complete a task either jointly or separately, $f_M(f_I(m_1), \ldots, f_I(m_k)) = \sum_{i=1}^{k} \mathrm{Weight}(m_i) \times f_I(m_i) + \delta$, where $\mathrm{Weight}(m_i)$ denotes the weight of each event's influence on the entity's health and $\delta$ is a correction parameter;
or
when multiple events of the same type occur on the same IT entity at the same time, the root fault events of different types on the IT entity are each weighted using the above three calculation methods.
7. The event analysis method according to claim 2, characterized in that, after step C, the method further comprises analyzing the influence of the connection relationship on the health status of the IT entity, comprising the following steps:
calculating the transfer factor $T_{C \to e_i}$ of the connection relationship from the bandwidth occupied by the connection relationship as:
$T_{C \to e_i} = \dfrac{BW_{e,e_i}}{\sum_{j=1}^{n} BW_{e,e_j}}$
where $e$ denotes the IT entity, $e_i$ denotes a network device, $BW_{e,e_i}$ denotes the bandwidth from IT entity $e$ to network device $e_i$, and $\sum_{j=1}^{n} BW_{e,e_j}$ denotes the total bandwidth from this IT entity $e$ to all network devices with which it has a connection relationship;
the influence value of the connection relationship on the health status of the IT entity is:
[formula not reproduced]
where $H_e$ denotes the health degree of the entity and $\delta$ is a correction parameter;
the sum of the influence values of all connection relationships of the IT entity on its health status is:
$f_E(f_T(H_{e_1}), \ldots, f_T(H_{e_n})) = \lambda \sum_{i=1}^{n} \left( T_{C \to e_i} \, f_T(H_{e_i}) \right) + \delta$, where $\lambda$ is the correction coefficient of the connection relationship and $\delta$ is the connection-relationship correction parameter.
8. The event analysis method according to claim 3, characterized in that, after step C, the method further comprises analyzing the influence of the bearing relationship on the health status of the IT entity, comprising the following steps:
the influence value of the bearing relationship on the health status of the host is:
$f_T(H_e) = 1 - H_e + \delta$, where $\delta$ is a correction parameter;
the influence value of the process on the health status of the host is:
$f_E(f_T(H_e)) = \lambda \times f_T(H_e) + \delta$, where $H_e$ denotes the health degree of the entity, $\lambda$ is the correction coefficient of the bearing relationship, and $\delta$ is the connection-relationship correction parameter.
9. The event analysis method according to claim 2, characterized in that, after step C, the method further comprises analyzing the influence of the bearing relationship on the health status of the IT entity, comprising the following steps:
the influence value of the dependency relationship on the health status of the IT entity is:
[formula image not reproduced]
where $H_e$ denotes the health degree of the entity, [symbol not reproduced] is either a static value set from experience or a dynamic value set by statistical analysis of historical data, and $\delta$ is a correction parameter;
the sum of the influence values of all dependency relationships of the IT entity on its health status is:
$f_E(f_T(H_{e_1}), \ldots, f_T(H_{e_n})) = \lambda \sum_{i=1}^{n} \left( T_{C \to e_i} \, f_T(H_{e_i}) \right) + \delta$, where $\lambda$ is the correction coefficient of the dependency relationship and $\delta$ is the connection-relationship correction parameter.
10. An event analysis system, characterized by comprising:
an event collection device, configured to collect all fault events in the IT system to form a first event set;
an association device, configured to find, according to the preset relationships among the IT entities in the IT system, the fault events triggered by each fault event in the first event set, forming a second event set;
a comparison device, configured to compare the first event set with the second event set and obtain the fault events that appear in the first event set but not in the second event set, forming a root event set;
a health status analysis device, configured to analyze the influence of the root fault events on the health status of the IT entities, comprising: a lookup module, configured to find the IT entities affected by each root fault event; a calculation module, configured to calculate the influence value of each root fault event on the health status of the IT entities; and a weighting module, configured to weight the multiple health influence values for the same IT entity to obtain the health status of each IT entity;
wherein, in the calculation module, the formula for calculating the health influence value is $f_I(m) = \lambda \times I_m + \delta$, where $m$ is the root fault event, $I_m$ is the influence factor of the fault event, $\lambda$ is the correction coefficient of the influence factor, and $\delta$ is a correction parameter.
11. The event analysis system according to claim 10, characterized by further comprising a first sorting device, configured to sort all the fault events from near to far according to the distance between the root node and the IT entity on which each event occurred.
12. The event analysis system according to claim 10, characterized by further comprising a second sorting device, configured to sort the root fault events from near to far according to the distance between the root node and the IT entity on which each event occurred.
13. The event analysis system according to claim 10, characterized by further comprising:
a connection relationship analysis device, configured to analyze the influence of the connection relationship on the health status of the IT entities;
and/or
a bearing relationship analysis device, configured to analyze the influence of the bearing relationship on the health status of the IT entities;
and/or
a dependency relationship analysis device, configured to analyze the influence of the dependency relationship on the health status of the IT entities.
CN 200910235532 2009-10-19 2009-10-19 Event analysis method and system Active CN102045186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910235532 CN102045186B (en) 2009-10-19 2009-10-19 Event analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910235532 CN102045186B (en) 2009-10-19 2009-10-19 Event analysis method and system

Publications (2)

Publication Number Publication Date
CN102045186A CN102045186A (en) 2011-05-04
CN102045186B true CN102045186B (en) 2013-07-17

Family

ID=43911003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910235532 Active CN102045186B (en) 2009-10-19 2009-10-19 Event analysis method and system

Country Status (1)

Country Link
CN (1) CN102045186B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023028B (en) * 2012-12-17 2015-09-02 江苏省电力公司 A kind of electric network fault method for rapidly positioning based on inter-entity dependence graph
CN103368782B (en) * 2013-07-30 2016-08-10 浙江中烟工业有限责任公司 A kind of network status analysis method
CN106843111B (en) * 2017-03-10 2019-04-05 中国石油大学(北京) The accurate source tracing method of hydrocarbon production system alarm signal root primordium and device
CN109150635B (en) * 2018-10-26 2021-09-07 中国农业银行股份有限公司 Fault influence analysis method and device
CN112116262A (en) * 2020-09-24 2020-12-22 华能盐城大丰新能源发电有限责任公司 Evaluation method for health degree of wind generating set equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713591A (en) * 2004-06-22 2005-12-28 中兴通讯股份有限公司 Alarm correlation analysis of light synchronous transmitting net
EP1657952A1 (en) * 2004-11-12 2006-05-17 Siemens Aktiengesellschaft A ring network for a burst switching network with distributed management
CN101345661A (en) * 2007-07-09 2009-01-14 大唐移动通信设备有限公司 Fault diagnosis method and device for communication equipment

Also Published As

Publication number Publication date
CN102045186A (en) 2011-05-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant