CN101674196B - Multi-domain collaborative distributed type fault diagnosis method and system - Google Patents

Publication number
CN101674196B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009101483713A
Other languages
Chinese (zh)
Other versions
CN101674196A (en
Inventor
邹仕洪
褚灵伟
程时端
王文东
田春歧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications

Abstract

The invention relates to a multi-domain collaborative distributed fault diagnosis method and system. Each domain of the system contains a domain manager that can communicate with other domain managers, assigns each cross-domain alarm/symptom to the domain most likely to have caused it, and is responsible for fault diagnosis within its own domain. When symptoms are exchanged, the current domain communicates only with the subset of other domains that may have caused the symptom, which reduces communication overhead. The method and system first perform a preliminary diagnosis using intra-domain symptoms and then estimate, from the preliminary fault hypothesis, the likelihood that the domain caused each cross-domain alarm, which improves the accuracy of the estimate. They also account for the possibility of misallocated symptoms by computing a false symptom probability for each allocated symptom, which reduces the influence of misallocation on the diagnosis.

Description

A multi-domain collaborative distributed fault diagnosis method and system
Technical field
The present invention relates to the field of fault diagnosis in computer networks, and in particular to a multi-domain collaborative distributed fault diagnosis method and system.
Background art
In large-scale, complex environments, a service management system must process a large amount of information. Researchers generally agree that management information and functions should be partitioned and managed in a distributed manner (see the paper: A survey of fault localization techniques in computer networks). Partitioning data among managers forces each manager to perform fault management on incomplete information. Because faults propagate between domains, a manager may receive problems that have propagated from domains administered by other managers, and the symptoms related to a specific fault in one domain may not be visible within that domain. Locating faults in such a complex environment requires an effective distributed diagnosis algorithm.
Katzela et al. (see the paper: Centralized vs Distributed Fault Localization) proposed a theoretical basis for designing distributed diagnosis systems, compared three fault diagnosis mechanisms (centralized, decentralized, and distributed), and evaluated their accuracy and feasibility. Distributed diagnosis has no central manager; the domain managers diagnose faults cooperatively. When analyzing alarms, each domain must process all cross-domain alarms related to it. For each cross-domain alarm, the domain associates a proxy node, indicating that the alarm may be explained by a fault in another domain, and assigns the proxy node a probability value representing the probability that the alarm was caused by a fault in another domain. A single-domain fault diagnosis algorithm is then used to compute the best fault explanation.
Steinder and Sethi (see the paper: Multi-domain diagnosis of end-to-end service failures in hierarchically routed networks) proposed a distributed diagnosis algorithm spanning multiple autonomous domains. A network comprising multiple autonomous domains is managed by multiple domain managers coordinated by a network manager. The global fault propagation model is partitioned among the autonomous domains, and proxy nodes are added to each domain's fault propagation model to represent faults propagated from other autonomous domains. When the fault management platform receives alarm information, every autonomous domain must perform fault localization to find the most probable fault hypothesis. In practice this is time-consuming and inefficient. In addition, the algorithm depends on the availability of partial global structure information.
Fischer et al. (see the paper: Cross-Domain Fault Localization: A Case for a Graph Digest Approach) proposed a distributed diagnosis scheme for a two-domain environment. The domain that discovers a symptom prunes and merges its dependency graph and sends the resulting graph, which hides internal structure information, to the other domain; that domain performs the diagnosis and derives a hypothesis. The scenario assumed in that paper is too simple, and the method is difficult to generalize to more domains.
Summary of the invention
To solve the above technical problems, a multi-domain collaborative distributed fault diagnosis method and system are proposed. The aim is to assess the likelihood that each domain caused a cross-domain symptom and to allocate each such symptom to the domain most likely to have caused it. Once the cross-domain symptoms have been processed, an existing single-domain fault diagnosis algorithm can be used to infer the fault hypothesis.
The invention provides a multi-domain collaborative distributed fault diagnosis system in which each domain contains a domain manager. The domain manager is responsible for fault diagnosis in its own domain. The domain manager of the current domain communicates with the domain managers of the domains that may have caused a cross-domain symptom and allocates the cross-domain alarm/symptom to the domain most likely to have caused it.
The domain manager comprises an interface module, a management/information presentation module, a symptom exchange module, an impact assessment module, a symptom allocation module, an alarm/fault information module, a fault diagnosis module, and a dependency model;
The interface module sends management information or data from the domain manager of the current domain to the domain managers of the domains that may have caused a cross-domain symptom, and forwards received management information or data to the symptom exchange module, impact assessment module, symptom allocation module, or fault diagnosis module of the current domain manager;
The management/information presentation module writes symptom or dependency data to the corresponding database;
The symptom exchange module, when the domain manager of the current domain discovers a cross-domain symptom, maps the symptom to the related sub-services of the selected domains and reports those sub-services, as cross-domain symptoms, to the domain managers of the domains that may have caused the symptom;
The impact assessment module, after the domain manager of the current domain receives a cross-domain symptom, assesses the likelihood that a fault within the current domain caused the symptom and returns the assessment value to the domain that reported it;
The symptom allocation module, after the domain manager of the current domain receives the assessment values, compares them using a corresponding comparison mechanism and allocates the cross-domain symptom to the domain most likely to have caused it, for diagnosis;
The fault diagnosis module diagnoses the cross-domain symptoms allocated to the current domain together with the symptoms within the current domain, thereby deriving a fault hypothesis;
The dependency model stores the symptom-fault dependency model.
The management/information presentation module is also used to present a management interface to the administrator and to manage the system according to the administrator's instructions.
The domain manager also comprises an alarm/fault information module for storing alarm and fault records.
The dependency model is a bipartite Bayesian network dependency model whose node link model is the noisy-OR model.
The symptom exchange module uses the symptoms observed in the current domain and the bipartite Bayesian network dependency model to infer the abnormality probability of the sub-services in other domains that the current domain uses, and selects, according to a predefined probability value, the domains corresponding to the qualifying sub-services; these are the domains that may have caused the cross-domain symptom.
The impact assessment module performs single-domain fault diagnosis using the intra-domain symptom set observed by the current domain and the bipartite Bayesian network dependency model, yielding a preliminary fault hypothesis H'. If a symptom received by the current domain can be explained by H', the assessment value is set to the maximum probability that a fault in H' causes the symptom; if it cannot be explained by the faults in H', the assessment value is the maximum product of prior fault probability and conditional probability.
The symptom allocation module adds an associated proxy fault node for each allocated symptom. The proxy node indicates that the symptom was actually caused by a fault in another domain, and its prior probability is the degree to which the symptom may have been caused by other domains, i.e., the false symptom probability. The false symptom probability is assessed by the domain that discovered the cross-domain symptom and is sent, together with the symptom, to the domain most likely to have caused it. The false symptom probability is the maximum of the assessment values received by the current domain, excluding the value sent by the domain to which the cross-domain symptom is allocated.
Before diagnosing the allocated symptoms, the fault diagnosis module adds an associated proxy node for each cross-domain symptom allocated to it. The proxy node indicates that the corresponding symptom was misallocated, and its prior probability is the false symptom probability received with the symptom. The out-of-domain component nodes are removed from the bipartite Bayesian network dependency model, and a single-domain diagnosis method is then used to infer a fault hypothesis. If the diagnosis hypothesis contains proxy nodes, they are deleted from the final result.
The invention also provides a fault diagnosis method for the multi-domain collaborative distributed fault diagnosis system, characterized by comprising:
Step 1: a domain manager is provided in each domain; the domain manager is responsible for fault diagnosis in its own domain;
Step 2: the domain manager of the current domain communicates with the domain managers of the domains that may have caused a cross-domain symptom and allocates the cross-domain alarm/symptom to the domain most likely to have caused it.
The invention achieves the following beneficial effects:
A distributed diagnosis system based on multi-domain collaboration. The system deploys one manager in each management domain; the managers exchange maintenance information and cooperate to complete the fault diagnosis task.
In the symptom exchange phase: a negotiation-domain ratio, SDR, is set for cross-domain symptoms, so that symptom allocation is negotiated with only some of the related domains, which reduces management communication overhead.
In the impact assessment phase: a preliminary diagnosis is performed using intra-domain symptoms, and the likelihood of having caused each cross-domain symptom is then assessed from the preliminary fault hypothesis.
In the symptom allocation phase: a false symptom probability is computed for each allocated symptom, representing the likelihood that the symptom was misallocated.
Compared with diagnosis methods that do not handle symptoms cooperatively between domains, the invention improves fault diagnosis accuracy by diagnosing cross-domain symptoms.
In the symptom exchange module, the invention adds, for each cross-domain symptom, a ratio SDR that determines what fraction of the related domains is selected for negotiation. If the current domain observes a cross-domain symptom that depends on m domains, the invention selects only SDR*m of them to negotiate symptom allocation. Communication overhead can be controlled by adjusting the SDR value. Compared with negotiating with all related domains, the invention therefore effectively reduces management communication overhead.
The invention performs a preliminary diagnosis using intra-domain symptoms (i.e., symptoms of services that have intra-domain dependencies) and then assesses, from the preliminary fault hypothesis, the likelihood of having caused each cross-domain symptom. Compared with methods that do not perform a preliminary diagnosis, the invention improves the accuracy of impact assessment.
The invention accounts for the possibility of misallocated symptoms by computing a false symptom probability for each allocated symptom. Compared with methods that ignore the false symptom probability, the invention reduces the effect of misallocation on the diagnosis.
Description of drawings
Fig. 1 shows the distributed diagnosis system provided by the invention;
Fig. 2 is a schematic flow chart of the distributed diagnosis process provided by the invention.
Embodiment
The invention includes a distributed fault diagnosis system, shown in Fig. 1. The system contains a domain manager in each management domain. Each manager is responsible for management communication with some of the other domain managers (those related to each cross-domain symptom, i.e., the subset of domains that may have caused it), for allocating each cross-domain symptom (a symptom that may have been caused by a fault in another domain) to the domain most likely to have caused it, and for fault diagnosis in its own domain. Each domain manager comprises an interface module, a management/information presentation module, a symptom exchange module, an impact assessment module, a symptom allocation module, a fault diagnosis module, a dependency model, and an alarm/fault information module.
The management/information presentation module can read and modify the dependency model and the alarm/fault information module, to accommodate management policies and changes in system state. The symptom exchange module, impact assessment module, and fault diagnosis module can only read the dependency model and the alarm/fault information module. The symptom allocation module can read and modify the alarm/fault information module.
The interface module sends the domain manager's internal management information or data to other domain managers and forwards management information and data from outside to the corresponding internal module. The management/information presentation module presents a management interface to the administrator, manages the system according to the administrator's instructions, and writes symptom or dependency data to the corresponding database. When a domain manager discovers cross-domain symptoms, the symptom exchange module maps them to the related sub-services of the selected domains and reports each sub-service, as a symptom, to the domain manager responsible for it. After a domain manager receives symptoms reported by other domains, it uses the impact assessment module to assess the likelihood that faults within its own domain caused them and returns these values to the reporting domain. After a domain manager receives the returned assessment values, its symptom allocation module compares them using a comparison mechanism and allocates each cross-domain symptom to the domain most likely to have caused it, for diagnosis. The fault diagnosis module diagnoses all allocated symptoms together with the intra-domain symptoms, yielding a more accurate fault hypothesis. The dependency model stores the current symptom-fault dependency model. The alarm/fault information module stores all alarm and fault records.
Dependency model:
To represent dependencies in a multi-domain environment while protecting the structure and information security of each domain, the invention uses a distributed dependency model that represents both intra-domain and out-of-domain dependencies; these dependencies can be obtained while services are running or when they are created. The invention builds a bipartite Bayesian network dependency model for each domain. The upper-layer nodes are the domain's service set S = {s_i}, where s_i = 1 means the service is abnormal and s_i = 0 means it is normal. The lower-layer nodes comprise the intra-domain component set F = {f_i} and the set of related out-of-domain sub-services S' = {s'_i}. f_i = 1 means the component has failed and f_i = 0 means it is normal; s'_i = 1 means the sub-service is abnormal and the fault has propagated to this domain, and s'_i = 0 means the sub-service has not propagated a fault to this domain. Each component is associated with a prior fault probability P(f); each directed edge is associated with a conditional probability P(s|f) expressing the dependency strength. The set of lower-layer nodes on which an upper-layer node a depends is denoted Par(a), and the set of upper-layer nodes that depend on a lower-layer node b is denoted Child(b). The invention assumes that the possible causes of a service abnormality are mutually independent and combined by logical OR, i.e., the noisy-OR model.
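As an illustration only, one domain's bipartite dependency model and its noisy-OR combination might be sketched in Python as follows. The node names, probability values, and the helper functions `par`, `child`, and `p_abnormal` are hypothetical and do not come from the patent text.

```python
from math import prod

# Hypothetical model for one domain. Lower-layer nodes carry a prior P(f);
# each (service, cause) edge carries a dependency strength P(s|f).
# "ext1" stands for a related out-of-domain sub-service.
prior = {"f1": 0.01, "f2": 0.02, "ext1": 0.05}
strength = {("s1", "f1"): 0.8, ("s1", "ext1"): 0.5, ("s2", "f2"): 0.9}

def par(s):
    """Par(s): lower-layer nodes that service s depends on."""
    return {f for (svc, f) in strength if svc == s}

def child(f):
    """Child(f): services that depend on lower-layer node f."""
    return {svc for (svc, g) in strength if g == f}

def p_abnormal(s, active):
    """Noisy-OR: P(s = 1) given the set of lower-layer nodes that are active."""
    return 1.0 - prod(1.0 - strength[(s, f)] for f in par(s) if f in active)

p = p_abnormal("s1", {"f1", "ext1"})  # 1 - (1 - 0.8) * (1 - 0.5)
```

Under noisy-OR, a service with no active cause has abnormality probability 0, because the empty product is 1.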
Symptom exchange:
When the current domain discovers a cross-domain symptom, the symptom exchange module uses the dependency model to select some of the domains that may have caused the symptom and negotiates symptom allocation with them. Before exchanging symptoms, the current domain uses the observed symptoms and the dependency model to infer the abnormality probability of the sub-services in other domains that it uses; it can then select the domains that own high-probability sub-services, and these are the domains that may have caused the cross-domain symptom. Suppose the set of cross-domain symptoms currently observed by domain D_i is S_N, and the out-of-domain sub-services related to S_N in its dependency model are F_SN. A bipartite Bayesian network model can then be built with S_N as the upper-layer nodes and F_SN as the lower-layer nodes. The abnormality probability of each sub-service f is the probability that f propagated a fault to this domain, given the observation S_N. Because these sub-services actually reside in other domains, this domain cannot determine their prior fault probabilities; their prior fault probabilities are therefore assumed to be identical.
Let S_f denote the set of observed cross-domain symptoms among the child nodes of f, i.e., S_f = Child(f) ∩ S_N. The posterior probability of an out-of-domain sub-service is inferred as follows:
$$
P(f \mid S_N) = P(f \mid S_f) = \frac{P(f)\,P(S_f \mid f)}{P(S_f)} = P(f) \prod_{s \in S_f} \frac{P(s \mid f)}{P(s)} = P(f) \prod_{s \in S_f} \frac{P(s \mid f)}{\sum_{f' \in Par(s)} P(f')\,P(s \mid f')} \tag{1}
$$
After the calculation, the current domain selects, for each cross-domain symptom, a subset of negotiation domains with which to exchange the symptom; these should be the domains that own the sub-services with the larger posterior probabilities.
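The following Python sketch illustrates how equation (1) and the SDR-based selection could fit together. It is not taken from the patent: the `owner` mapping from sub-service to domain, the example values, and the rounding rule (round SDR*m up, keep at least one domain) are assumptions, since the text does not say how SDR*m is rounded.

```python
import math

def posterior(f, observed, prior, strength):
    """Score of equation (1) for out-of-domain sub-service f."""
    score = prior[f]
    for s in observed:
        if (s, f) not in strength:
            continue  # s is not in S_f = Child(f) ∩ S_N
        # P(s) is approximated by the sum over Par(s) of P(f') * P(s|f')
        p_s = sum(prior[g] * p for (svc, g), p in strength.items() if svc == s)
        score *= strength[(s, f)] / p_s
    return score

def select_domains(symptom, observed, prior, strength, owner, sdr):
    """Rank the domains related to `symptom` and keep a fraction SDR of them."""
    best = {}
    for (svc, f) in strength:
        if svc == symptom:
            d = owner[f]
            best[d] = max(best.get(d, 0.0), posterior(f, observed, prior, strength))
    k = max(1, math.ceil(sdr * len(best)))  # assumed rounding rule
    return sorted(best, key=best.get, reverse=True)[:k]

# Hypothetical example: symptom "s" depends on sub-services in domains A, B, C.
# Priors are identical, as the text assumes for out-of-domain sub-services.
prior = {"a1": 0.1, "b1": 0.1, "c1": 0.1}
strength = {("s", "a1"): 0.9, ("s", "b1"): 0.3, ("s", "c1"): 0.6}
owner = {"a1": "A", "b1": "B", "c1": "C"}
chosen = select_domains("s", ["s"], prior, strength, owner, sdr=0.5)
```

With m = 3 related domains and SDR = 0.5, two domains are contacted, so one negotiation message is saved for this symptom.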
Impact assessment:
Suppose the current domain D_k observes an intra-domain symptom set S_N, i.e., symptoms of services that have intra-domain dependencies. The invention can use these symptoms and the dependency model to perform single-domain fault diagnosis, yielding a preliminary fault hypothesis H'. Because H' is inferred from intra-domain symptoms, its confidence is relatively high, and the faults in H' can be assumed to have actually occurred. If a symptom reported by another domain can be explained by H', the assessment value is set to the maximum probability that a fault in H' causes the symptom; if the symptom cannot be explained by the faults in H', it is assessed using the prior fault probabilities and conditional probabilities, taken from the dependency model, of the faults on which the symptom depends. The assessment function is as follows:
$$
P_{D_k}(s) =
\begin{cases}
\max_{f \in H'} P(s \mid f) & \text{if } s \in Child(H') \\
\max_{f \in Par(s)} P(f)\,P(s \mid f) & \text{if } s \notin Child(H')
\end{cases}
\tag{2}
$$
where $H' = \arg\max_{H' \subseteq F} P(H' \mid S_N)$ and $Child(H') = \bigcup_{f \in H'} Child(f)$.
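A minimal Python sketch of assessment function (2) follows, assuming the preliminary hypothesis H' has already been produced by some single-domain diagnosis. The function name `assess`, the dictionary layout, and the example values are hypothetical.

```python
def assess(s, hypothesis, prior, strength):
    """Assessment value P_Dk(s) of equation (2) for a reported symptom s."""
    # Faults in H' that have an edge to s, i.e. the case s in Child(H')
    explained = [strength[(s, f)] for f in hypothesis if (s, f) in strength]
    if explained:
        return max(explained)
    # Otherwise fall back to max over Par(s) of P(f) * P(s|f)
    products = [prior[f] * p for (svc, f), p in strength.items() if svc == s]
    return max(products, default=0.0)  # 0.0 for an unknown symptom is an assumption

# Hypothetical model of the assessing domain
prior = {"f1": 0.01, "f2": 0.02}
strength = {("s", "f1"): 0.8, ("s", "f2"): 0.5}

value_explained = assess("s", {"f1"}, prior, strength)    # f1 is in H'
value_unexplained = assess("s", set(), prior, strength)   # H' does not explain s
```

The first case is far larger than the second because the faults in H' are treated as having actually occurred, so their prior probability no longer discounts the value.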
Symptom allocation:
A symptom may be misallocated, that is, the domain to which it is allocated may not actually have caused it, and diagnosing such a symptom directly would reduce diagnosis accuracy. To reduce this effect, the invention adds to the dependency model an associated proxy fault node for each allocated symptom. The proxy node indicates that the symptom was actually caused by a fault in another domain, and its prior probability is the degree to which the symptom may have been caused by other domains, i.e., the false symptom probability. The false symptom probability is assessed by the domain that discovered the cross-domain symptom and is sent, together with the symptom, to the domain most likely to have caused it.
Suppose domain D_i observes a cross-domain symptom s and has received the assessment values {P_Dk(s)} from the related domains. From these values it determines that the symptom should be allocated to the domain D_k with the maximum assessment value, and it must then compute the false symptom probability P_S(s) of a misallocation. If the symptom was not caused by a fault in D_k, it must have been caused by a fault in one of the other related domains; the invention therefore takes the maximum assessment value among the other domains as the false symptom probability.
$$
P_S(s) = \max_{j \neq k} P_{D_j}(s) \tag{3}
$$
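Allocation and equation (3) can be sketched in a few lines of Python. The function name `allocate` and the example values are hypothetical, and returning 0.0 when only one domain replied is an assumption that the text does not address.

```python
def allocate(assessments):
    """Pick the target domain D_k and the false symptom probability P_S(s).

    `assessments` maps each consulted domain to its returned value P_Dj(s).
    """
    target = max(assessments, key=assessments.get)
    others = [v for d, v in assessments.items() if d != target]
    # Equation (3): the largest assessment value among the other domains
    return target, max(others, default=0.0)

# Hypothetical replies for one cross-domain symptom
target, p_false = allocate({"A": 0.8, "B": 0.3, "C": 0.1})
```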
Fault diagnosis:
Before diagnosing the allocated symptoms, the domain adds to the dependency model an associated proxy node for each cross-domain symptom allocated to it; the proxy node indicates that the corresponding symptom was misallocated, and its prior probability is the false symptom probability received with the symptom. The out-of-domain component nodes are also removed from the original model. A single-domain diagnosis algorithm can then infer a fault hypothesis from the symptoms and the dependency model, for example IBU (see the paper: Probabilistic fault diagnosis in communication systems using belief networks) or IHU (see the paper: Probabilistic Fault Diagnosis in Communication Systems Through Incremental Hypothesis Updating). If the diagnosis hypothesis contains proxy nodes, they can simply be deleted from the final result.
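The sketch below illustrates the model preparation described above. It does not implement IBU or IHU; an exhaustive search for the most probable hypothesis under noisy-OR stands in for the single-domain algorithm and is practical only for very small models. The proxy edge strength of 1.0, the `proxy:` naming, and the example values are assumptions.

```python
from itertools import combinations
from math import prod

def diagnose(symptoms, prior, strength, assigned, external):
    """Return the most probable fault set, with proxy nodes removed.

    assigned: {allocated cross-domain symptom: false symptom probability}
    external: out-of-domain nodes to drop from the model
    """
    prior = {f: p for f, p in prior.items() if f not in external}
    strength = {(s, f): p for (s, f), p in strength.items() if f not in external}
    for s, p_false in assigned.items():
        proxy = "proxy:" + s
        prior[proxy] = p_false
        strength[(s, proxy)] = 1.0  # assumption: a misallocation fully explains s
    faults = list(prior)
    observed = set(symptoms) | set(assigned)
    services = {s for s, _ in strength}

    def joint(h):
        # P(h) times the noisy-OR likelihood of the observed service states
        score = prod(prior[f] if f in h else 1.0 - prior[f] for f in faults)
        for s in services:
            p = 1.0 - prod(1.0 - strength[(s, f)] for f in h if (s, f) in strength)
            score *= p if s in observed else 1.0 - p
        return score

    subsets = (frozenset(c) for r in range(len(faults) + 1)
               for c in combinations(faults, r))
    best = max(subsets, key=joint)
    return {f for f in best if not f.startswith("proxy:")}

# Hypothetical domain: s1 is an intra-domain symptom, s2 was allocated to us.
prior = {"f1": 0.1, "f2": 0.1, "ext": 0.2}
strength = {("s1", "f1"): 0.9, ("s2", "f2"): 0.9, ("s2", "ext"): 0.7}
likely_ours = diagnose(["s1"], prior, strength, {"s2": 0.05}, {"ext"})
likely_misallocated = diagnose(["s1"], prior, strength, {"s2": 0.5}, {"ext"})
```

With a low false symptom probability the allocated symptom is attributed to the local component f2; with a high one the proxy node absorbs it, and only f1 remains in the final hypothesis.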
Fig. 2 shows the processing flow of the invention, which comprises four phases: symptom exchange, impact assessment, symptom allocation, and fault diagnosis. Every domain manager implements all of these phases.
Step 101: When a domain manager discovers cross-domain symptoms, it first infers from them the probability that each related out-of-domain sub-service has propagated a fault to the current domain.
Step 102: A high fault propagation probability indicates that the sub-service has very likely propagated a fault to this domain. To limit communication overhead, the invention selects a subset of negotiation domains according to these propagation probabilities.
Step 103: Because a symptom may correspond to several out-of-domain sub-services, the symptom is mapped to the related sub-services in the domains selected in step 102 and reported to those domains.
Step 104: The symptoms observed within the current domain are used to infer the preliminary fault hypothesis H'.
Step 105: From H', the fault probabilities, and the dependency strengths, the assessment value is computed, i.e., the likelihood that faults within this domain caused the symptom reported by another domain.
Step 106: The computed assessment value is returned to the domain that reported the symptom.
Step 107: After receiving the assessment values returned by other domains, the current domain uses them to select the domain most likely to have caused each cross-domain symptom.
Step 108: Because a symptom may be misallocated, the false symptom probability of a misallocation is computed for the selected domain.
Step 109: The cross-domain symptom and its false symptom probability are sent together to the selected domain.
Step 110: For each allocated symptom node, an associated proxy fault node is added whose prior probability is the allocated false symptom probability.
Step 111: A single-domain fault diagnosis algorithm diagnoses the intra-domain symptoms together with the allocated symptoms, yielding the fault hypothesis H.
Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims (9)

1. A multi-domain collaborative distributed fault diagnosis system, characterized in that each domain contains a domain manager; the domain manager is responsible for fault diagnosis in its own domain; the domain manager of the current domain communicates with the domain managers of the domains that may have caused a cross-domain symptom and allocates the cross-domain alarm/symptom to the domain most likely to have caused it;
the domain manager comprises an interface module, a management/information presentation module, a symptom exchange module, an impact assessment module, a symptom allocation module, a fault diagnosis module, and a dependency model;
the interface module is used to send management information or data from the domain manager of the current domain to the domain managers of the domains that may have caused a cross-domain symptom, and to forward received management information or data to the symptom exchange module, impact assessment module, symptom allocation module, or fault diagnosis module of the current domain manager;
the management/information presentation module is used to write symptom or dependency data to the corresponding database;
the symptom exchange module is used, when the domain manager of the current domain discovers a cross-domain symptom, to map the cross-domain symptom to the related sub-services of the selected domains and to report those sub-services, as cross-domain symptoms, to the domain managers of the selected domains;
the impact assessment module is used, after the domain manager of the current domain receives a cross-domain symptom, to assess the likelihood that a fault within the current domain caused the cross-domain symptom and to return the assessment value to the domain that reported it;
the symptom allocation module is used, after the domain manager of the current domain receives the assessment values, to compare them using a corresponding comparison mechanism and to allocate the cross-domain symptom to the domain most likely to have caused it, for diagnosis;
the fault diagnosis module is used to diagnose the cross-domain symptoms allocated to the current domain together with the symptoms within the current domain, thereby deriving a fault hypothesis;
the dependency model is used to store the symptom-fault dependency model.
2. the distributed diagnostics system of multi-domain collaborative as claimed in claim 1 is characterized in that management/information presents module, also be used for presenting administration interface to the keeper, and according to the instruction management system from the keeper.
3. the distributed diagnostics system of multi-domain collaborative as claimed in claim 1 is characterized in that, domain manager also comprises alarm/fault message module, is used for storage alarm and failure logging.
4. the distributed diagnostics system of multi-domain collaborative as claimed in claim 1 is characterized in that, described dependence model is that two fens Bayesian networks rely on model, and it is the Noisy-OR model that two fens Bayesian networks rely on the node link model in the model.
5. The multi-domain collaborative distributed fault diagnosis system as claimed in claim 4, characterized in that the symptom exchange module infers, from the symptoms observed in the current domain and the bipartite Bayesian network dependency model, the probability that each sub-service in other domains used by the current domain is abnormal, and selects, according to a predefined probability value, the domains corresponding to the qualifying sub-services; these are the domains that may have caused the cross-domain symptom.
6. The multi-domain collaborative distributed fault diagnosis system as claimed in claim 5, characterized in that the impact evaluation module performs single-domain fault diagnosis from the symptom set observed within the current domain and the bipartite Bayesian network dependency model, yielding a preliminary fault hypothesis H'; if the symptom received by the current domain can be explained by H', the assessed value is set to the maximum probability with which a fault in H' causes this symptom; if the symptom received by the current domain cannot be explained by the faults in H', the assessed value is the maximum product of prior fault probability and conditional probability.
7. The multi-domain collaborative distributed fault diagnosis system as claimed in claim 6, characterized in that the symptom distribution module adds an associated proxy fault node for each distributed symptom; this proxy fault node represents that the symptom was actually caused by a fault in another domain, and its prior fault probability is the likelihood that the symptom was caused by other domains, namely the false symptom probability; the false symptom probability is assessed by the domain that discovered the cross-domain symptom, and the symptom is then assigned to the domain most likely to have caused it; the false symptom probability is the maximum among the assessed values received by the current domain, excluding the assessed value sent by the domain to which the cross-domain symptom is assigned.
8. The multi-domain collaborative distributed fault diagnosis system as claimed in claim 7, characterized in that, before performing fault diagnosis on the assigned symptoms, the fault diagnosis module adds an associated proxy fault node for each assigned cross-domain symptom; this proxy fault node represents that the corresponding symptom was wrongly assigned, and its prior fault probability is the false symptom probability received with the assignment; the associated out-of-domain component nodes are removed from the bipartite Bayesian network dependency model, after which a single-domain diagnosis method is used to infer a fault hypothesis; if the fault hypothesis contains a proxy fault node, that proxy fault node is deleted from the final result.
9. A fault diagnosis method for the multi-domain collaborative distributed fault diagnosis system as claimed in any one of claims 1-8, characterized by comprising:
Step 1, a domain manager is provided in each domain; the domain manager is responsible for fault diagnosis of the domain in which it is located;
Step 2, the domain manager of the current domain communicates with the domain managers of the domains that may have caused a cross-domain symptom, and assigns the cross-domain alarm/symptom to the domain most likely to have caused it.
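The quantities named in claims 4, 6 and 7 can be illustrated with a minimal Python sketch. It is not taken from the patent text: the function names, the dictionary layout, and the reading of "explained by H'" as "some fault in H' has a nonzero conditional probability of causing the symptom" are all assumptions made for illustration.

```python
from math import prod


def noisy_or(cond_probs):
    """Noisy-OR link model (claim 4): probability that a symptom appears
    given the per-fault causal probabilities p(s|f) of the active faults."""
    return 1.0 - prod(1.0 - p for p in cond_probs)


def assess(symptom, hypothesis, prior, cond):
    """Assessed value returned by a domain for one cross-domain symptom (claim 6).

    hypothesis: preliminary fault hypothesis H' (set of fault names)
    prior:      {fault: prior fault probability}
    cond:       {(fault, symptom): conditional probability p(symptom|fault)}
    All three structures are hypothetical encodings of the dependency model.
    """
    # Assumption: H' "explains" the symptom if a fault in H' can cause it.
    explaining = [cond[(f, symptom)] for f in hypothesis
                  if cond.get((f, symptom), 0.0) > 0.0]
    if explaining:
        # Largest probability with which a fault in H' causes the symptom.
        return max(explaining)
    # Otherwise: largest product of prior probability and conditional probability.
    products = [prior[f] * p for (f, s), p in cond.items() if s == symptom]
    return max(products, default=0.0)


def distribute(assessments):
    """Symptom distribution (claim 7): choose the domain with the highest
    assessed value; the false symptom probability is the largest assessed
    value among the remaining domains."""
    target = max(assessments, key=assessments.get)
    others = [v for d, v in assessments.items() if d != target]
    return target, max(others, default=0.0)
```

For example, with assessed values `{"A": 0.3, "B": 0.8, "C": 0.5}`, `distribute` selects domain `"B"` and reports a false symptom probability of `0.5`, which the receiving domain would then use as the prior of the proxy fault node described in claim 8.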
CN2009101483713A 2009-06-16 2009-06-16 Multi-domain collaborative distributed type fault diagnosis method and system Expired - Fee Related CN101674196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101483713A CN101674196B (en) 2009-06-16 2009-06-16 Multi-domain collaborative distributed type fault diagnosis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009101483713A CN101674196B (en) 2009-06-16 2009-06-16 Multi-domain collaborative distributed type fault diagnosis method and system

Publications (2)

Publication Number Publication Date
CN101674196A CN101674196A (en) 2010-03-17
CN101674196B CN101674196B (en) 2011-12-07

Family

ID=42021200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101483713A Expired - Fee Related CN101674196B (en) 2009-06-16 2009-06-16 Multi-domain collaborative distributed type fault diagnosis method and system

Country Status (1)

Country Link
CN (1) CN101674196B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107925585B (en) * 2015-08-31 2020-07-21 华为技术有限公司 Network service fault processing method and device
CN107301131A (en) * 2017-06-30 2017-10-27 郑州云海信息技术有限公司 A kind of distributed storage management software fault diagnosis method and system
CN107896168B (en) * 2017-12-08 2020-11-10 国网安徽省电力有限公司信息通信分公司 Multi-domain fault diagnosis method for power communication network in network virtualization environment
CN109218406B (en) * 2018-08-13 2020-12-15 广西大学 Cross-domain cooperative service method for smart city
CN113746679B (en) * 2019-04-22 2023-05-02 腾讯科技(深圳)有限公司 Cross-subdomain communication operation and maintenance method, total operation and maintenance server and medium
CN110391947A (en) * 2019-08-21 2019-10-29 广东电网有限责任公司 A kind of power telecom network equipment fault diagnosis method of multi-controller cooperation
CN113541988B (en) * 2020-04-17 2022-10-11 华为技术有限公司 Network fault processing method and device
US20220027331A1 (en) * 2020-07-23 2022-01-27 International Business Machines Corporation Cross-Environment Event Correlation Using Domain-Space Exploration and Machine Learning Techniques
CN112383934B (en) * 2020-10-22 2024-01-16 深圳供电局有限公司 Service fault diagnosis method for multi-domain cooperation under 5G network slice

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6430712B2 (en) * 1996-05-28 2002-08-06 Aprisma Management Technologies, Inc. Method and apparatus for inter-domain alarm correlation
EP1460801A1 (en) * 2003-03-17 2004-09-22 Tyco Telecommunications (US) Inc. System and method for fault diagnosis using distributed alarm correlation
CN101170447A (en) * 2007-11-22 2008-04-30 北京邮电大学 Service failure diagnosis system based on active probe and its method
CN101192997A (en) * 2006-11-24 2008-06-04 中国科学院沈阳自动化研究所 Remote status monitoring and fault diagnosis system for distributed devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
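Chu Lingwei et al. Distributed fault diagnosis algorithm in multi-domain service environments. Journal of Electronics & Information Technology (电子与信息学报). 2010, Vol. 32 (No. 4), 836-840. *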

Also Published As

Publication number Publication date
CN101674196A (en) 2010-03-17

Similar Documents

Publication Publication Date Title
CN101674196B (en) Multi-domain collaborative distributed type fault diagnosis method and system
CN101507185B (en) Fault location in telecommunications networks using bayesian networks
CN107896168B (en) Multi-domain fault diagnosis method for power communication network in network virtualization environment
Cauffriez et al. Design of intelligent distributed control systems: a dependability point of view
WO2013189110A1 (en) Power communication fault early warning analysis method and system
CN103412861A (en) Knowledge based parsing
CN101170447A (en) Service failure diagnosis system based on active probe and its method
CN102684902B (en) Based on the network failure locating method of probe prediction
CN110351391A (en) A kind of vehicle diagnosis cloud platform system, service implementation method
CN101715203B (en) Method and device for automatically positioning fault points
CN104660448B (en) Distributed-tier multiple domain system Multi-Agent collaborative fault diagnosis methods
JPH10502222A (en) Hardware distributed management method and system
CN114745286A (en) Intelligent network situation perception system facing dynamic network based on knowledge graph technology
CN105471926A (en) Diagnostic data cloud model constructing system based on autonomous decentralized system
CN108156061A (en) Esb monitoring service platforms
Odintsova et al. Multi-fault diagnosis in dynamic systems
CN112101422B (en) Typical case self-learning method for power system fault case
CN108173711A (en) Enterprises system data exchange monitoring method
He et al. A distributed network alarm correlation analysis mechanism for heterogeneous networks
Meskina et al. An efficient simulator for fault detection and recovery in smart grids FDIRSY
CN110391947A (en) A kind of power telecom network equipment fault diagnosis method of multi-controller cooperation
CN116150257B (en) Visual analysis method, system and storage medium for electric power communication optical cable resources
Liu et al. Anomaly detection based on dual-threaded blockchain in large-scale intelligent networks
Jahan et al. Detecting emergent behavior in scenario-based specifications using a probabilistic model
Ding Probabilistic Fault Management in Distributed Systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111207

Termination date: 20130616