CN114341877A - Root cause analysis method, root cause analysis device, electronic apparatus, root cause analysis medium, and program product - Google Patents

Info

Publication number
CN114341877A
Authority
CN
China
Prior art keywords
node
propagation
nodes
abnormal
graph
Prior art date
Legal status
Pending
Application number
CN201980100087.0A
Other languages
Chinese (zh)
Inventor
王冬
Current Assignee
Siemens Ltd China
Original Assignee
Siemens Ltd China
Priority date
Filing date
Publication date
Application filed by Siemens Ltd China

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models

Abstract

Root cause analysis method, apparatus, electronic device, medium, and program product. The root cause analysis method comprises: extracting a propagation graph from an annotated knowledge graph, wherein the propagation graph comprises an abnormal node at which an abnormal condition occurs and nodes having a propagation relationship with the abnormal node (S102); and analyzing the root cause of the abnormal condition at the abnormal node based on attributes of the nodes in the propagation graph (S104).

Description

Root cause analysis method, root cause analysis device, electronic apparatus, root cause analysis medium, and program product
Technical Field
The present disclosure relates generally to the field of industrial technology, and more particularly, to root cause analysis methods, apparatuses, electronic devices, media, and program products.
Background
In the manufacturing industry, Key Performance Indicators (KPIs) can be used as important tools for evaluating and controlling manufacturing processes in a number of related areas, such as machine equipment, production scheduling and execution, products, inventory, and the like.
However, KPIs provide only very limited insight if these areas are not analyzed effectively. The value of the analysis depends greatly on the quality of the data, the context of the data and the scenario, the tools used, and the analytical skills available.
Root cause analysis of abnormal KPIs or performance degradation is a common scenario for KPI analysis. Especially for manufacturing processes, abnormal KPIs often represent a potential risk that may lead to economic losses in case of failure. Root cause analysis is a very challenging task for the following reasons:
1. Fault propagation and cumulative effects
2. Complex rule combinations
3. Huge volume and variety of data
Domain knowledge is the basis for performing such analyses. In practice, however, the bottleneck is the reliance on individual human ability for knowledge fusion, processing, and reasoning. For example, it is easy to determine whether a single OR/AND gate is working properly, but if two gates are connected in series or in parallel, more work is required to find the fault. When hundreds of such "simple" elements are combined, it is almost impossible for a human to perform the task.
In particular, a cloud platform may collect data from different information silos and visualize it. The cloud itself cannot link data semantically, however, and a number of extract-transform-load (ETL) tasks must be performed before analysis can begin. It is desirable to have a solution that can embed domain knowledge on the propagation chain and semantically integrate data.
Root cause analysis can be divided into two main types: data-based and derivation-based. Root cause analysis methods are applied in a wide range of fields, including but not limited to factory process control, computer system performance, program debugging, and wireless networks.
Reference 1 (WO2017118380A1, "fingerprinting root cause analysis in cellular systems") relates to learning rules from historical performance data to characterize the association between metrics, monitoring anomalies, and matching rules.
Reference 2 (US20150074035A1, "detecting root causes of transaction degradation using a causal Bayesian network") associates states with application transactions and components, uses the determined states as inputs to build a Bayesian network, and derives a set of root causes by traversing the Bayesian network.
Reference 3 (US10210189B2, "root cause analysis of performance problems") calculates database performance values based on monitored KPIs and database performance outputs. When a database performance value is determined to be below the threshold, KPI correlation coefficients and correlation matrices are generated and used to determine the objective function.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to determine the key or critical elements of the present invention, nor is it intended to limit the scope of the present invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the present disclosure, there is provided a root cause analysis method, including: extracting a propagation graph from an annotated knowledge graph, wherein the propagation graph comprises an abnormal node at which an abnormal condition occurs and nodes having a propagation relationship with the abnormal node; and analyzing the root cause of the abnormal condition at the abnormal node based on the attributes of the nodes in the propagation graph.
Optionally, in an example of the above aspect, before extracting the propagation graph from the annotated knowledge graph, the method further comprises: constructing a knowledge graph based on an ontology model, and adding a propagation attribute to the relationship between at least one pair of nodes in the constructed knowledge graph, wherein the propagation attribute comprises two sub-attributes, a direction and a context, the direction represents the propagation direction between the two nodes, and the context represents the scenario in which root cause analysis is to be performed.
Optionally, in one example of the above aspect, extracting the propagation graph from the annotated knowledge graph comprises: extracting a propagation graph related to the abnormal node from the annotated knowledge graph by using a query statement.
Optionally, in an example of the above aspect, the propagation direction between nodes is also included in the propagation graph.
Optionally, in an example of the above aspect, analyzing the root cause of the abnormal node based on the attributes of the nodes in the propagation graph includes:
arranging all nodes in the propagation graph according to the weight descending order of the nodes;
starting from the first node of the sequence, the following operations are performed:
calculating an anomaly probability for a node based on at least one factor affecting a state of the node;
if the abnormal probability is larger than a preset threshold value, determining that the node is the root cause causing the abnormal condition, and stopping operation;
otherwise, the operation is performed for the next node until the operation is performed for all nodes.
Optionally, in one example of the above aspect, the weight of the node is determined based on at least one of: the number of paths from the node to the anomalous node, the distance from the node to the anomalous node, and whether the node is an intermediate node.
Optionally, in one example of the above aspect, calculating the anomaly probability for the node based on at least one factor affecting a state of the node comprises: calculating the anomaly probability of the node based on the degree to which the dominant (explicit) and recessive (implicit) factors among the at least one factor influence the abnormal condition, and on the moving average difference of each factor.
According to another aspect of the disclosure, a root cause analysis device is provided, which includes: a propagation graph extraction unit configured to extract a propagation graph from an annotated knowledge graph, wherein the propagation graph includes an abnormal node at which an abnormal condition occurs and nodes having a propagation relationship with the abnormal node; and an analysis unit configured to analyze the root cause of the abnormal condition at the abnormal node based on the attributes of the nodes in the propagation graph.
Optionally, in an example of the above aspect, the apparatus further comprises: a knowledge graph annotation unit configured to construct a knowledge graph based on the ontology model, and to add an annotation attribute to the relationship between at least one pair of nodes in the constructed knowledge graph, wherein the annotation attribute comprises two sub-attributes, a direction and a context, the direction represents the propagation direction between the two nodes, and the context represents the scenario in which root cause analysis is to be performed.
Optionally, in an example of the above aspect, the propagation graph extraction unit is further configured to: extract a propagation graph related to the abnormal node from the annotated knowledge graph by using a query statement.
Optionally, in an example of the above aspect, the propagation direction between nodes is also included in the propagation graph.
Optionally, in an example of the above aspect, the analysis unit is further configured to:
arranging all nodes in the propagation graph according to the weight descending order of the nodes;
starting from the first node of the sequence, the following operations are performed:
calculating an anomaly probability for a node based on at least one factor affecting a state of the node;
if the abnormal probability is larger than a preset threshold value, the node is considered to be the root cause of the abnormal condition, and the operation is stopped;
otherwise, the operation is performed for the next node until the operation is performed for all nodes.
Optionally, in one example of the above aspect, the weight of the node is determined based on at least one of: the number of paths from the node to the abnormal node, the distance from the node to the abnormal node, and whether the node is an intermediate node.
Optionally, in an example of the above aspect, the analysis unit is further configured to: calculate the anomaly probability of the node based on the degree to which the dominant (explicit) and recessive (implicit) factors among the at least one factor affecting the state of the node influence the abnormal condition, and on the moving average difference of each factor.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory coupled to the at least one processor, the memory for storing instructions that, when executed by the at least one processor, cause the processor to perform the method as described above.
According to another aspect of the disclosure, there is provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a computer program product tangibly stored on a computer-readable medium and comprising computer-executable instructions that, when executed, cause at least one processor to perform the method as described above.
According to the method and apparatus of the present disclosure, knowledge of fault propagation can be embedded into the ontology, the advantages of the knowledge graph for large-scale data integration are fully exploited, and only a small amount of work is needed to construct an annotated propagation graph.
According to the method and apparatus of the present disclosure, data can be semantically integrated and linked across systems, formats, and locations through a unified architecture. This overcomes the limits on data volume and type and enables more complex data applications and analysis tasks.
According to the method and apparatus of the present disclosure, using a unified integration template based on the ontology model ensures data quality and solves the problem of data silos.
Drawings
The above and other objects, features and advantages of the present invention will be more readily understood by reference to the following description of the embodiments of the present invention taken in conjunction with the accompanying drawings. The components in the figures are meant to illustrate the principles of the present invention. In the drawings, the same or similar technical features or components will be denoted by the same or similar reference numerals.
FIG. 1 is a flow diagram illustrating an exemplary process of a root cause analysis method according to one embodiment of the present disclosure;
FIGS. 2A-2C are schematic diagrams illustrating different propagation relationships, respectively;
FIG. 3 is a schematic diagram of a specific example of a propagation map;
FIG. 4 is a schematic diagram of a specific example of another propagation map;
FIG. 5 is a schematic diagram of a specific example of yet another propagation map;
FIG. 6 is a flowchart illustrating an exemplary process of the operation of block S104 in FIG. 1;
FIG. 7 is a block diagram illustrating an exemplary configuration of a root cause analysis apparatus 700 according to one embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of an electronic device 800 for root cause analysis in accordance with an embodiment of the disclosure.
Reference numerals
100: root cause analysis method
S101, S102, S104, S1042, S1044, S1046, S1048, S1049: steps
201: parent object
202: child object
203: object 1
204: object 2
205: object 3
206: object 4
300, 400, 500: propagation map
PL: production line
U1 … Un: unit 1 … unit n
M1 … Mk: machine tool 1 … machine tool k
M: machine tool
LA: production line A
U1: unit 1
M1: machine tool 1
S1: sensor 1
U2: unit 2
M2: machine tool 2
S2: sensor 2
CT: cycle time
S3: sensor 3
W: worker
WO: work order
WP: work plan
P: product
M: material
PO: purchase order
SL: supplier
WH: warehouse
U3: unit 3
U4: unit 4
LB: production line B
700: root cause analysis device
701: knowledge-graph annotation unit
702: propagation map extraction unit
704: analysis unit
800: electronic device
802: processor
804: memory
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some of the examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
The present disclosure provides a solution to embed domain knowledge in a propagation chain and to semantically integrate data. The method according to the disclosed embodiments may reduce dependency on human power with the aid of a knowledge base.
A root cause analysis method according to an embodiment of the present disclosure will now be described with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating an exemplary process of a root cause analysis method 100 according to one embodiment of the present disclosure.
First, in block S102, a propagation graph is extracted from an annotated knowledge graph, wherein the propagation graph includes an abnormal node at which an abnormal condition occurs and nodes having a propagation relationship with the abnormal node.
The concept of a propagation graph is introduced in this disclosure. A propagation graph is a subgraph extracted from an annotated knowledge graph.
In a method according to an embodiment of the present disclosure, generating the propagation graph is aided by adding annotations to the knowledge graph. Specifically, a propagation attribute is defined and added to the relationship between at least one pair of nodes in the knowledge graph. The propagation attribute may include two sub-attributes, a direction and a context, where the direction represents the propagation direction between the two nodes and the context defines the specific scenario to which the knowledge graph relates.
Propagation represents a relationship between two events: a second event occurs after a first event, and the first event is, with a certain probability, the cause of the second event.
In general, the propagation relationship between two events may include the following three.
The first is containment (inclusion) propagation, which flows from child to parent. As shown in FIG. 2A, the flow runs from child object 202 to parent object 201; that is, a fault in the child can cause a fault in the parent. For example, a bad tool may have a 90% probability of making a CNC (Computer Numerical Control) machine unusable. In general, the containment propagation pattern is suitable for most hierarchical system architectures, such as the ISA-95 enterprise model.
The second is upstream-downstream propagation, which flows from upstream to downstream. As shown in FIG. 2B, the flow runs from object 1 (203) to object 2 (204); that is, an upstream fault causes a downstream fault. For example, a longer processing time on machine 1 causes a longer waiting time on machine 2, or a supplier delay causes a delay in product planning. The upstream-downstream propagation pattern may be used to describe the relationship between independent events.
The third is the equivalence relationship. As shown in FIG. 2C, object 3 (205) is equivalent to object 4 (206). Equivalence means that there is no propagation flow between the two objects: if object 3 fails, object 4 must also fail. This pattern can be used to describe bound events; for example, product quality and the quality test are in an equivalence relationship (disregarding errors of the test itself).
A knowledge graph constructed based on an ontology model is a set of triples comprising objects and the relationships between them, where the relationships are also directional. The propagation direction, however, may differ from the general relationship direction defined in the ontology. In other words, propagation is a specific type of relationship that is determined by objective facts rather than subjective descriptions. For example, the relationship from A to B, "A has component B", is equivalent to the relationship from B to A, "B is a component of A". For both "has component" and "is a component of …", however, the propagation direction is always from B to A (a propagation relationship of the containment pattern).
Therefore, in the embodiment according to the present disclosure, the added propagation attribute includes a direction sub-attribute to indicate whether the propagation direction of the relationship between the two nodes coincides with the direction defined in the ontology.
The following code shows one example of an added annotation.
[Code listing: Figure PCTCN2019107571-APPB-000001]
From this code, it can be seen that the attribute "propagation" has two sub-attributes: "context" and "direction". Taking "hasPart" (has component) as an example, the annotation means that, in the context of tracking an abnormal "CycleTime", the propagation direction is opposite to the direction of the attribute "hasPart".
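Because the original listing survives only as an image reference, the following is a minimal Python sketch of what such an annotation could look like as a data structure. It is not the listing from the disclosure. The class names `Propagation` and `Relation`, the direction values "same" and "inverse", and the node names `Machine1` and `Tool1` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Propagation:
    """Hypothetical propagation annotation with its two sub-attributes."""
    context: str    # scenario under analysis, e.g. "CycleTime"
    direction: str  # "same" or "inverse", relative to the ontology relation


@dataclass(frozen=True)
class Relation:
    """One subject-predicate-object triple, optionally annotated."""
    subject: str
    predicate: str
    obj: str
    propagation: Optional[Propagation] = None

    def propagation_edge(self, context: str) -> Optional[Tuple[str, str]]:
        """Return (cause, effect) for the given context, or None if the
        relation carries no propagation annotation for that context."""
        p = self.propagation
        if p is None or p.context != context:
            return None
        if p.direction == "inverse":
            # Fault flows against the ontology direction (part -> whole).
            return (self.obj, self.subject)
        return (self.subject, self.obj)


# "Machine1 has part Tool1": a tool fault propagates up to the machine.
rel = Relation("Machine1", "hasPart", "Tool1",
               Propagation(context="CycleTime", direction="inverse"))
print(rel.propagation_edge("CycleTime"))  # ('Tool1', 'Machine1')
```

Keeping the context on the annotation rather than on the node means the same relation can yield a propagation edge in one scenario and none in another.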
Furthermore, all propagation relationships should be defined with respect to a specific context or scenario, since the relationship between two events may differ from one context to another. For example, when test errors are taken into account, the relationship between product quality and quality testing described above should be an upstream-downstream relationship, since even a perfect product may produce a failed test result.
Thus, in embodiments according to the present disclosure, the added propagation attributes also include context, which may define the specific scenario to which the knowledge-graph relates. In addition, context may also define the boundaries of the propagation map.
In summary, by adding a propagation attribute to the relationship between nodes in the knowledge graph constructed based on the ontology model, the knowledge graph with the added annotation is obtained, and then the propagation graph can be extracted from the knowledge graph.
In one example, a propagation graph can be extracted from the knowledge graph using a query statement (for example, one written in SPARQL). Those skilled in the art will understand how to write query statements in a query language to extract, from the knowledge graph, the propagation graph related to the node at which the abnormal condition occurs, so the specific operations are not described in detail herein.
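The extraction step can be illustrated without a triple store. The sketch below is a plain-Python stand-in for the query, not a SPARQL statement and not the method of the disclosure: it assumes the annotated relations have already been resolved into (cause, effect) pairs for one context, and it collects every node from which the abnormal node can be reached. The function name `extract_propagation_graph` and the node names are hypothetical.

```python
from collections import defaultdict, deque


def extract_propagation_graph(edges, abnormal_node):
    """Return (nodes, edges) of the subgraph that can propagate a fault
    to abnormal_node. `edges` holds (cause, effect) pairs for one context."""
    edges = list(edges)
    causes_of = defaultdict(set)
    for cause, effect in edges:
        causes_of[effect].add(cause)

    # Breadth-first search against the propagation direction.
    nodes = {abnormal_node}
    queue = deque([abnormal_node])
    while queue:
        current = queue.popleft()
        for cause in causes_of.get(current, ()):
            if cause not in nodes:
                nodes.add(cause)
                queue.append(cause)

    # Keep only edges whose endpoints both lie inside the subgraph.
    sub_edges = {(c, e) for c, e in edges if c in nodes and e in nodes}
    return nodes, sub_edges


example_edges = [
    ("Tool1", "Machine1"),
    ("Machine1", "Unit1"),
    ("Unit1", "CycleTime"),
    ("Sensor9", "OtherKPI"),  # unrelated to the abnormal node
]
nodes, sub_edges = extract_propagation_graph(example_edges, "CycleTime")
```

In this example `Sensor9` and `OtherKPI` are dropped because no propagation path leads from them to `CycleTime`, which mirrors how the context bounds the propagation graph.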
Fig. 3 shows a schematic diagram of a specific example 300 of a propagation graph of a production line. Fig. 3 includes a production line PL, units 1 to n (U1 … Un), machine tools 1 to k (M1 … Mk), and a machine tool M.
The propagation map shown in FIG. 3 is modeled as a tree of vertical hierarchies and horizontal connections, which is a combination of containment and upstream and downstream relationships.
Fig. 4 is a schematic illustration of a propagation diagram 400 for production management.
This is a more specific example, showing a propagation graph in the context of the KPI "cycle time of line A". It includes line A (LA), unit 1 (U1), machine tool 1 (M1), sensor 1 (S1), unit 2 (U2), machine tool 2 (M2), sensor 2 (S2), cycle time (CT), sensor 3 (S3), worker (W), work order (WO), work plan (WP), product (P), material (M), purchase order (PO), supplier (SL), warehouse (WH), unit 3 (U3), unit 4 (U4), and line B (LB). The left part of fig. 4 shows the relevant physical objects/facilities, while the right part shows the relevant information objects. In fig. 4, the dashed frame indicates that an abnormality has occurred in the cycle time CT of production line A (LA). Although the KPI is dominated by the physical object "line A", the factors that may affect the KPI are not limited to physical objects. In practice, the entire propagation graph is built by enumerating the propagation relationships for each pair of object nodes.
In one example, before the propagation graph is extracted in block S102, the operation in block S101 may be performed: constructing a knowledge graph based on the ontology model, and adding a propagation attribute to the relationship between at least one pair of nodes in the constructed knowledge graph. The propagation attribute may comprise two sub-attributes, a direction and a context, where the direction represents the direction of propagation between the two nodes and the context represents the scenario in which root cause analysis is to be performed.
As will be appreciated by those skilled in the art, in another example, a knowledge-graph may be pre-constructed and a propagation attribute added to the relationship between at least one pair of nodes in the constructed knowledge-graph, and then this annotated knowledge-graph may be stored in a medium for use by a query without having to perform the operations in block S101 each time.
After generating the propagation map, the operations in block S104 may be performed: analyzing the root cause of the abnormal node in the abnormal condition based on the attributes of the nodes in the propagation graph.
For root cause analysis, it is critical to assess the likelihood of a candidate cause being the root cause. Two indicators are proposed in a method according to an embodiment of the present disclosure to evaluate a node: the weight of the node and the probability of anomaly of the node.
1. Weights of nodes
The weight of a node is used to represent the importance of a node in the overall propagation graph. The importance of a node may be determined based on, for example, the number of paths from the node to the originating node where the abnormal condition occurred, the distance from the node to the originating node, and whether the node is an intermediate node.
In one example, the weight of one node may be calculated by the following equation (1).
[Equation (1): Figure PCTCN2019107571-APPB-000002]
where path_num represents the number of paths from node n to node m, and dist(n, m) represents the shortest distance from node n to node m.
Here, node m represents the starting node (also referred to as the abnormal node) at which the abnormal condition first occurs, for example the "cycle time CT" shown in the dashed box in fig. 4. Node n represents any node along the propagation flow. The number of paths indicates how many paths lead from the node to the abnormal node; the shortest distance represents the number of nodes traversed on the way from the node to the abnormal node.
Those skilled in the art will understand that the calculation of the node weight is not limited to equation (1) above. The weight may be calculated using any function of the number of paths from the node to the abnormal node, the distance from the node to the abnormal node, and whether the node is an intermediate node. Nor are the factors considered limited to these; they may include any other factor related to the importance of the node, such as the conditional probability of the node mentioned below, which is not described in detail here.
Fig. 5 shows a propagation map relating to a node "cycle time CT", wherein the node "cycle time CT" in the dotted line frame indicates an abnormal node where an abnormal situation occurs. Table 1 below shows the weight of each node in fig. 5.
Node                  Number of paths   Shortest distance from n to m   Weight
Production line (LA)  -                 0                               -
Unit 1 (U1)           1                 1                               1
Unit 2 (U2)           1                 1                               1
Machine tool 1 (M1)   3                 2                               1.5
Machine tool 2 (M2)   2                 2                               1
Sensor 1 (S1)         0                 -                               -
Sensor 2 (S2)         1                 1                               1
Sensor 3 (S3)         1                 1                               1
TABLE 1
In fig. 5, "line A (LA)" is defined as equivalent to "cycle time CT", so both are starting points. "Sensor 1 (S1)" has no path to either "line A (LA)" or "cycle time CT" and therefore falls outside the range of possible causes of the abnormal condition, which is why sensor 1 (S1) has no weight in Table 1.
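Equation (1) is not reproduced in this text, but every weighted row of Table 1 is consistent with the ratio of the number of paths to the shortest distance (1/1 = 1, 3/2 = 1.5, 2/2 = 1). The Python sketch below assumes that ratio; it is an inference from the table, not a confirmed transcription of equation (1). The helper names and the small example graph are hypothetical, the graph is assumed to be acyclic, and distance is counted in edges, which matches the value 1 for nodes adjacent to the abnormal node.

```python
from collections import deque
from functools import lru_cache


def path_count_and_distance(edges, node, target):
    """Count distinct paths and the shortest distance from node to target.
    Assumes an acyclic graph of (cause, effect) edges."""
    successors = {}
    for cause, effect in edges:
        successors.setdefault(cause, []).append(effect)

    @lru_cache(maxsize=None)
    def count(n):
        if n == target:
            return 1
        return sum(count(s) for s in successors.get(n, []))

    # Breadth-first search gives the shortest distance in edges.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for s in successors.get(current, []):
            if s not in dist:
                dist[s] = dist[current] + 1
                queue.append(s)
    return count(node), dist.get(target)


def node_weight(path_num, dist):
    """Assumed form of the weight: path_num / dist. Returns None for the
    starting node (dist 0) and for nodes with no path to it."""
    if not path_num or not dist:
        return None
    return path_num / dist


# Hypothetical graph: M1 feeds two units, both feed the cycle time CT.
toy_edges = [("M1", "U1"), ("M1", "U2"), ("U1", "CT"), ("U2", "CT")]
paths, distance = path_count_and_distance(toy_edges, "M1", "CT")
weight = node_weight(paths, distance)
```

Returning `None` rather than a number for unreachable nodes keeps them out of the ranking, in the same way that sensor 1 carries no weight in Table 1.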
2. Abnormal probability of node
The anomaly probability of a node indicates how likely it is that the node is in an abnormal state. In a method according to an embodiment of the present disclosure, the factors that determine the likelihood that a node is in an abnormal state include explicit (dominant) factors and implicit (recessive) factors. Explicit factors are factors explicitly specified in the attributes of a node, such as the inputs of a KPI formula; the remaining factors, such as the temperature and humidity of the environment, may be referred to as implicit factors.
In one example, to evaluate whether a node is abnormal, a moving average difference may be calculated separately for each of its factors. The moving average difference is a measure of how far the short-term behavior of a factor deviates from its long-term behavior; it may be, for example, the degree to which the value of the factor has changed relative to its normal state. The length of the moving window may be specified according to the situation. Calculating a moving average difference for each factor quantifies the factors, and the anomaly probability of the node is then calculated from the quantified factors.
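The disclosure does not fix a formula for the moving average difference. One possible reading, sketched below in Python, compares the mean over a short recent window with the mean over a longer window and normalizes by the long-term mean. The function name, the two window parameters, and the normalization are assumptions made for illustration.

```python
def moving_average_difference(values, short_window, long_window):
    """Relative deviation of the short-term mean from the long-term mean.
    Both windows end at the most recent sample in `values`."""
    if not 0 < short_window <= long_window <= len(values):
        raise ValueError("need 0 < short_window <= long_window <= len(values)")
    short_mean = sum(values[-short_window:]) / short_window
    long_mean = sum(values[-long_window:]) / long_window
    # Fall back to an absolute difference when the long-term mean is zero.
    scale = abs(long_mean) or 1.0
    return abs(short_mean - long_mean) / scale


# Hypothetical processing times: stable at 10, then jumping to 20.
processing_time = [10, 10, 10, 10, 10, 10, 10, 10, 20, 20]
drift = moving_average_difference(processing_time, short_window=2, long_window=10)
```

Here the short-term mean is 20 and the long-term mean is 12, so `drift` is 8/12, roughly 0.67, while a constant series yields 0.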
In one example, the anomaly probability of a node may be calculated, for example, by equation (2) below.
[Equation (2): Figure PCTCN2019107571-APPB-000003]
where p_ab is the anomaly probability of a node, w_i is the weight of the i-th factor, and diff(f_i) is the moving average difference of factor f_i.
The weight of a factor represents the contribution of that factor to the estimate of the likelihood of an abnormal state. For example, explicit (dominant) factors are generally more important than implicit (recessive) factors. If there is no a priori knowledge about a factor, a default value may be set.
The moving average difference of all the factors of a node can be weighted and summed by the above equation (2) to calculate the abnormal probability of the node. If the calculated anomaly probability is greater than a predetermined threshold, the node may be considered anomalous, i.e., the node may be the root cause node for the occurrence of the anomaly.
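The weighted sum described above can be sketched in Python as follows. Equation (2) itself is not reproduced in this text, so the plain sum of w_i · diff(f_i) is taken from the prose description only. The clamping to the range [0, 1], the function names, and the example weights and threshold are assumptions.

```python
def anomaly_probability(factors):
    """Weighted sum of moving average differences.
    `factors` is a list of (weight, moving_average_difference) pairs.
    The result is clamped to [0, 1]; the clamp is an assumption."""
    total = sum(weight * diff for weight, diff in factors)
    return min(1.0, max(0.0, total))


def is_abnormal(factors, threshold):
    """A node is treated as abnormal when its probability exceeds the threshold."""
    return anomaly_probability(factors) > threshold


# Hypothetical node with one explicit factor (weight 0.6) and one
# implicit factor (weight 0.4).
node_factors = [(0.6, 0.5), (0.4, 0.25)]
p_ab = anomaly_probability(node_factors)
```

With these illustrative numbers `p_ab` is 0.6 · 0.5 + 0.4 · 0.25 = 0.4, so the node would be flagged at a threshold of 0.3 but not at 0.5.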
Those skilled in the art will understand that the manner of calculating the abnormal probability of the node is not limited to the manner of the above equation (2), and may also be calculated in the following manner, for example: and performing machine learning by using different data of each factor of each node as sample data, and calculating the abnormal probability of one node by using a model obtained by the machine learning. Therefore, in the method according to the embodiment of the present disclosure, no limitation is made on a specific manner of calculating the abnormal probability of the node based on the factors affecting the state of the node.
In one example, the annotation added to a node may also include probability information for the node, which may indicate the uncertainty of the causal relationship between the node and its parent node. The propagation graph thus constructed resembles a Bayesian network. Each node in the propagation graph has a probability information table that includes the conditional probabilities between the node and its immediate parent nodes. It is understood that a node may have different probability information tables in different contexts.
In the case where the added annotation includes probability information of a node, the probability information of the node may be taken as a factor in calculating the weight of the node. If the probability information of a node is not included in the annotation, this is equivalent to a probability of 1 or 0, i.e., there either is or is not a propagation relationship between two nodes.
From the above, the weight and the anomaly probability can be determined for each node, and root cause analysis can then begin. During root cause analysis, the node weight is mainly used to determine which node to start from and in which order to analyze the nodes, so as to improve the speed and efficiency of the analysis; the anomaly probability of a node is used to determine whether the node is the root cause of the abnormal situation. An exemplary process for analyzing the root cause of the abnormal situation at the abnormal node based on the attributes of the nodes in the propagation graph is described below with reference to fig. 6.
Fig. 6 is a flowchart illustrating an exemplary process of the operation in block S104.
First, in block S1042, all nodes in the propagation graph are arranged in descending order according to the weight of the nodes.
Then, starting from the first node of the sequence, the following operations are performed:
in block S1044: calculating an anomaly probability for a node based on at least one factor affecting a state of the node;
in decision block S1046: determining whether the calculated anomaly probability is greater than a predetermined threshold; if so (denoted Y in the figure), proceeding to the operation in block S1048; otherwise (denoted N in the figure), performing the operation in block S1044 on the next node.
In block S1048: it is determined that the node is the root cause of the abnormal situation.
Finally, the operation ends in S1049.
The predetermined threshold value may be set empirically by those skilled in the art.
By the exemplary process of analyzing the root cause of an abnormal condition at an abnormal node as shown in fig. 6, the root cause of the abnormal condition can be determined.
Those skilled in the art will appreciate that the analysis method according to the embodiments of the present disclosure may also fail to find, in the propagation graph, any root cause of the abnormal situation, in which case the operation ends after the operations have been performed on all nodes.
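The flow of fig. 6 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Node` class, its field names, the sample values, and the weighted-sum form of `anomaly_probability` are hypothetical and are not taken from the disclosure, and node weights are assumed to have been computed beforehand.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    weight: float  # node weight, assumed to be precomputed
    factor_weights: list[float] = field(default_factory=list)  # w_i
    factor_diffs: list[float] = field(default_factory=list)    # diff(f_i)


def anomaly_probability(node: Node) -> float:
    """Weighted sum of moving average differences (assumed form of equation (2))."""
    return sum(w * d for w, d in zip(node.factor_weights, node.factor_diffs))


def find_root_cause(nodes: list[Node], threshold: float) -> Optional[Node]:
    # S1042: arrange the nodes in descending order of weight.
    for node in sorted(nodes, key=lambda n: n.weight, reverse=True):
        # S1044: calculate the anomaly probability of the current node.
        p = anomaly_probability(node)
        # S1046: compare against the predetermined threshold.
        if p > threshold:
            # S1048: this node is taken to be the root cause; stop.
            return node
    # S1049: all nodes examined without finding a root cause.
    return None


# Hypothetical sample data for illustration only.
nodes = [
    Node("pump", 0.9, [0.7, 0.3], [0.2, 0.1]),
    Node("valve", 0.6, [0.5, 0.5], [0.9, 0.8]),
    Node("sensor", 0.3, [1.0], [0.95]),
]
root = find_root_cause(nodes, threshold=0.5)
print(root.name if root else "no root cause found")
```

With this sample data the highest-weight node falls below the threshold, so the loop moves on to the next node in the sequence, which is the behavior described for the N branch of decision block S1046.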
The method according to the present disclosure improves on the prior art in the following respects.
Propagation between two directly linked "simple" elements is more deterministic and easier to specify than fault derivation. Through annotation, such domain knowledge can be embedded in the knowledge base, thus converting fault propagation in a large network into a subgraph extraction problem.
The annotation includes two aspects: direction and context. To define annotations, the domain knowledge should first be decomposed into minimal elements, each of which is a very specific and deterministic description of a propagation. The context can define the boundary of the propagation graph to reduce complexity. With the help of graph algorithms, quantitative indicators can support the evaluation. For example, the level of influence of device A on device B may be evaluated by the distance from device A to device B.
With ontologies and knowledge graphs, data can be semantically integrated and linked across systems, formats, and locations through a unified architecture. This overcomes limits on data volume and type and enables more complex data application and analysis tasks.
According to the method of the present disclosure, knowledge of fault propagation can be embedded into the ontology, making full use of the advantages of the knowledge graph in large-scale data integration, so that only a small amount of work is needed to construct the annotated propagation graph.
For the data silo problem, a number of mature solutions based on the knowledge graph exist. The ontology model serves as a unified template for integration, so that the quality of the data can be ensured.
By adding annotations to the knowledge graph, the construction of the propagation graph is converted into a query task, and the query results can be dynamically adjusted according to the context input included in the annotation.
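As a rough illustration of how annotated edges turn propagation graph construction into a query, the sketch below filters edges by context and then walks upstream from the abnormal node. The disclosure does not specify a query language, so a plain Python traversal stands in for the query statement; the `Edge` type, the function name, and the sample node and context names are all hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Edge(NamedTuple):
    source: str   # node the fault propagates from (direction annotation)
    target: str   # node the fault propagates to
    context: str  # scenario in which this propagation applies


def extract_propagation_graph(edges, abnormal_node, context):
    """Return the edges through which a fault could reach abnormal_node in the given context."""
    # Keep only edges annotated with the requested context, indexed by target.
    incoming = {}
    for e in edges:
        if e.context == context:
            incoming.setdefault(e.target, []).append(e)

    # Breadth-first walk against the propagation direction.
    result, seen, queue = [], {abnormal_node}, deque([abnormal_node])
    while queue:
        current = queue.popleft()
        for e in incoming.get(current, []):
            result.append(e)
            if e.source not in seen:
                seen.add(e.source)
                queue.append(e.source)
    return result


# Hypothetical annotated knowledge graph.
edges = [
    Edge("power_supply", "motor", "production"),
    Edge("motor", "conveyor", "production"),
    Edge("hvac", "motor", "maintenance"),
    Edge("conveyor", "packer", "production"),
]
subgraph = extract_propagation_graph(edges, "conveyor", "production")
for e in subgraph:
    print(f"{e.source} -> {e.target}")
```

Changing the `context` argument changes which edges are eligible, which mirrors the statement that the query results can be adjusted dynamically according to the context.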
Digitization is based on accessible data and knowledge of the system. Thanks to Internet of Things technology, the bottleneck in data acquisition has been overcome and more things can be interconnected. The purpose of digitization is not merely "digitization" itself, but rather the use of data and knowledge to gain added value. The method according to the present disclosure provides a good technical solution for obtaining business insights from data. In particular, the disclosure is not limited to the problem of tracking KPIs, but proposes a method for evaluating the impact of an individual element on the whole network. In the context of the manufacturing industry, this may involve: the importance of equipment, materials, or personnel; quality tracking; order delivery analysis; and the like. Most of these problems are very challenging because of their complexity and their dependence on analytical skills. Through graph-assisted inference, the method can save labor and help people make more accurate judgments.
Fig. 7 is a block diagram illustrating an exemplary configuration of a root cause analysis apparatus 700 according to one embodiment of the present disclosure.
As shown in fig. 7, the root cause analysis apparatus 700 includes: a propagation map extraction unit 702 and an analysis unit 704.
The propagation map extraction unit 702 is configured to extract a propagation graph from the annotated knowledge graph, wherein the propagation graph comprises abnormal nodes at which an abnormal situation occurs and nodes having propagation relationships with the abnormal nodes.
The analysis unit 704 is configured to analyze the root cause of the abnormal situation at the abnormal node based on the attributes of the nodes in the propagation graph.
In one example, the root cause analysis apparatus 700 may further include a knowledge graph annotation unit 701 configured to construct a knowledge graph based on the ontology model, and add an annotation attribute to a relationship between at least one pair of nodes in the constructed knowledge graph, wherein the annotation attribute includes two sub-attributes of a direction and a context, the direction represents a propagation direction between the two nodes, and the context defines a specific scenario involved in the knowledge graph.
The propagation map extraction unit 702 is further configured to extract a propagation graph related to the abnormal node from the annotated knowledge graph by using a query statement.
The propagation graph also includes the propagation directions between nodes.
The analysis unit 704 is further configured to:
arrange all nodes in the propagation graph in descending order of node weight;
starting from the first node of the sequence, perform the following operations:
calculate an anomaly probability for a node based on at least one factor affecting the state of the node;
if the anomaly probability is greater than a predetermined threshold, determine that the node is the root cause of the abnormal situation and stop the operation;
otherwise, perform the operation on the next node, until the operation has been performed on all nodes.
Wherein the weight of the node is determined based on at least one of:
the number of paths from the node to the anomalous node, the distance from the node to the anomalous node, and whether the node is an intermediate node.
The analysis unit 704 is further configured to calculate the anomaly probability of the node based on the degree of influence on the abnormal situation of the dominant and recessive factors among the at least one factor affecting the state of the node, and on the moving average difference of each factor.
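The three inputs to the node weight named above (number of paths to the abnormal node, distance to the abnormal node, and whether the node is an intermediate node) could be computed along the following lines. This sketch only derives the three quantities and deliberately does not combine them into a weight. The adjacency-dictionary representation, the function names, the sample graph, and the reading of "intermediate node" as a node with both incoming and outgoing propagation edges are assumptions made for illustration.

```python
from collections import deque


def count_paths(graph, start, goal, _visiting=frozenset()):
    """Count simple directed paths from start to goal by depth-first search."""
    if start == goal:
        return 1
    visiting = _visiting | {start}
    return sum(
        count_paths(graph, nxt, goal, visiting)
        for nxt in graph.get(start, [])
        if nxt not in visiting
    )


def distance(graph, start, goal):
    """Shortest hop count from start to goal, or None if goal is unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def is_intermediate(graph, node):
    """Assumed definition: the node has both incoming and outgoing edges."""
    has_out = bool(graph.get(node))
    has_in = any(node in targets for targets in graph.values())
    return has_in and has_out


# Hypothetical propagation graph: keys propagate to the listed nodes.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
abnormal = "d"
for n in graph:
    print(n, count_paths(graph, n, abnormal), distance(graph, n, abnormal), is_intermediate(graph, n))
```

Counting simple paths by exhaustive search is exponential in the worst case, so this is suitable only for small propagation graphs such as those bounded by a context.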
The details of the operation and function of the various parts of the root cause analysis apparatus 700 may be, for example, the same as or similar to those associated with the embodiments of the root cause analysis method 100 of the present disclosure described with reference to fig. 1-6, and will not be described in detail herein.
It should be noted that the structure of the root cause analysis apparatus 700 and its constituent units shown in fig. 7 is merely an example, and those skilled in the art may modify the structural block diagram shown in fig. 7 as needed.
Embodiments of a root cause analysis method and apparatus according to embodiments of the present disclosure are described above with reference to fig. 1 through 7. The root cause analysis device described above may be implemented by hardware, or may be implemented by software, or a combination of hardware and software.
FIG. 8 illustrates a block diagram of an electronic device 800 that performs root cause analysis in accordance with an embodiment of the disclosure. According to one embodiment, the electronic device 800 may include at least one processor 802, the processor 802 executing at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in a computer-readable storage medium (i.e., the memory 804).
In one embodiment, computer-executable instructions are stored in the memory 804 that, when executed, cause the at least one processor 802 to perform the following: extracting a propagation graph from the knowledge graph added with the annotation, wherein the propagation graph comprises abnormal nodes with abnormal conditions and nodes with propagation relations with the abnormal nodes; and analyzing the root cause of the abnormal node in the abnormal condition based on the attributes of the nodes in the propagation graph.
It should be appreciated that the computer-executable instructions stored in the memory 804, when executed, cause the at least one processor 802 to perform the various operations and functions described above in connection with fig. 1-7 in the various embodiments of the present disclosure.
According to one embodiment, a non-transitory machine-readable medium is provided. The non-transitory machine-readable medium may have machine-executable instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-7 in various embodiments of the present disclosure.
According to one embodiment, there is provided a computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in connection with fig. 1-7 in the various embodiments of the present disclosure.
According to one embodiment, a computer program product is provided that includes computer-executable instructions that, when executed, cause at least one processor to perform the various operations and functions described above in connection with fig. 1-7 in the various embodiments of the present disclosure.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (18)

  1. A root cause analysis method comprising:
    extracting a propagation graph from the knowledge graph added with the annotation, wherein the propagation graph comprises abnormal nodes with abnormal conditions and nodes with propagation relations with the abnormal nodes; and
    analyzing the root cause of the abnormal situation at the abnormal node based on the attributes of the nodes in the propagation graph.
  2. The method of claim 1, further comprising, prior to extracting the propagation graph from the annotated knowledge graph:
    constructing a knowledge graph based on an ontology model, and adding a propagation attribute to a relationship between at least one pair of nodes in the constructed knowledge graph, wherein the propagation attribute comprises two sub-attributes, a direction and a context, the direction represents a propagation direction between the two nodes, and the context represents a scenario to be subjected to root cause analysis.
  3. The method of claim 1, wherein extracting a propagation graph from the annotated knowledge graph comprises:
    extracting a propagation graph related to the abnormal node from the annotated knowledge graph by using a query statement.
  4. The method of claim 2, wherein propagation directions between nodes are also included in the propagation graph.
  5. The method of any one of claims 1-4, wherein analyzing the root cause of the abnormal situation at the abnormal node based on the attributes of the nodes in the propagation graph comprises:
    arranging all nodes in the propagation graph in descending order of node weight;
    starting from the first node of the sequence, performing the following operations:
    calculating an anomaly probability for a node based on at least one factor affecting a state of the node;
    if the anomaly probability is greater than a predetermined threshold, determining that the node is the root cause of the abnormal situation, and stopping the operation;
    otherwise, performing the operation on the next node, until the operation has been performed on all nodes.
  6. The method of claim 5, wherein the weight of the node is determined based on at least one of:
    the number of paths from the node to the anomalous node, the distance from the node to the anomalous node, and whether the node is an intermediate node.
  7. The method of claim 5, wherein calculating the anomaly probability for a node based on at least one factor affecting the state of the node comprises:
    calculating the anomaly probability of the node based on the degree of influence on the abnormal situation of the dominant and recessive factors among the at least one factor affecting the state of the node, and on the moving average difference of each factor.
  8. A root cause analysis apparatus (700), comprising:
    a propagation graph extracting unit (702) configured to extract a propagation graph from the knowledge graph with the added annotation, wherein the propagation graph comprises abnormal nodes with abnormal conditions and nodes with propagation relations with the abnormal nodes; and
    an analyzing unit (704) configured to analyze a root cause of an abnormal situation occurring at the abnormal node based on the attributes of the nodes in the propagation graph.
  9. The apparatus (700) of claim 8, further comprising:
    a knowledge graph annotation unit (701) configured to construct a knowledge graph based on an ontology model, and to add an annotation attribute to a relationship between at least one pair of nodes in the constructed knowledge graph, wherein the annotation attribute comprises two sub-attributes, a direction and a context, the direction represents a propagation direction between the two nodes, and the context represents a scenario to be subjected to root cause analysis.
  10. The apparatus (700) of claim 8, wherein the propagation map extraction unit (702) is further configured to:
    extract a propagation graph related to the abnormal node from the annotated knowledge graph by using a query statement.
  11. The apparatus (700) of claim 9, wherein propagation directions between nodes are also included in the propagation map.
  12. The apparatus (700) according to any of claims 8-11, wherein the analyzing unit (704) is further configured to:
    arrange all nodes in the propagation graph in descending order of node weight;
    starting from the first node of the sequence, perform the following operations:
    calculate an anomaly probability for a node based on at least one factor affecting a state of the node;
    if the anomaly probability is greater than a predetermined threshold, determine that the node is the root cause of the abnormal situation, and stop the operation;
    otherwise, perform the operation on the next node, until the operation has been performed on all nodes.
  13. The apparatus (700) of claim 12, wherein the weight of the node is determined based on at least one of:
    the number of paths from the node to the anomalous node, the distance from the node to the anomalous node, and whether the node is an intermediate node.
  14. The apparatus (700) of claim 12, wherein the analysis unit (704) is further configured to:
    calculate the anomaly probability of the node based on the degree of influence on the abnormal situation of the dominant and recessive factors among the at least one factor affecting the state of the node, and on the moving average difference of each factor.
  15. An electronic device (800) comprising:
    at least one processor (802); and
    a memory (804) coupled to the at least one processor (802), the memory for storing instructions that, when executed by the at least one processor (802), cause the processor (802) to perform the method of any of claims 1-7.
  16. A non-transitory machine-readable storage medium storing executable instructions that, when executed by a machine, cause the machine to perform the method of any one of claims 1 to 7.
  17. A computer program comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of any one of claims 1 to 7.
  18. A computer program product tangibly stored on a computer-readable medium and comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of any one of claims 1 to 7.
CN201980100087.0A 2019-09-24 2019-09-24 Root cause analysis method, root cause analysis device, electronic apparatus, root cause analysis medium, and program product Pending CN114341877A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/107571 WO2021056197A1 (en) 2019-09-24 2019-09-24 Root cause analysis method and apparatus, electronic device, medium and program product

Publications (1)

Publication Number Publication Date
CN114341877A true CN114341877A (en) 2022-04-12

Family

ID=75165573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980100087.0A Pending CN114341877A (en) 2019-09-24 2019-09-24 Root cause analysis method, root cause analysis device, electronic apparatus, root cause analysis medium, and program product

Country Status (2)

Country Link
CN (1) CN114341877A (en)
WO (1) WO2021056197A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360720B (en) * 2021-06-24 2023-11-21 湖北华中电力科技开发有限责任公司 Data asset visualization method, device and equipment based on data blood relationship
WO2023159574A1 (en) * 2022-02-28 2023-08-31 西门子股份公司 Anomaly detection method and apparatus, computer-readable medium and electronic apparatus
CN114598539B (en) * 2022-03-16 2024-03-01 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
WO2023230788A1 (en) * 2022-05-30 2023-12-07 西门子股份公司 Semi-automatic knowledge graph construction method and apparatus, and computer device
CN116360387B (en) * 2023-01-18 2023-09-15 北京控制工程研究所 Fault positioning method integrating Bayesian network and performance-fault relation map

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190543B2 (en) * 2008-03-08 2012-05-29 Tokyo Electron Limited Autonomous biologically based learning tool
CN102496028B (en) * 2011-11-14 2013-03-20 华中科技大学 Breakdown maintenance and fault analysis method for complicated equipment
CN109522192B (en) * 2018-10-17 2020-08-04 北京航空航天大学 Prediction method based on knowledge graph and complex network combination
CN109992440A (en) * 2019-04-02 2019-07-09 北京睿至大数据有限公司 A kind of IT root accident analysis recognition methods of knowledge based map and machine learning
CN110110870B (en) * 2019-06-05 2022-03-22 厦门邑通软件科技有限公司 Intelligent equipment fault monitoring method based on event map technology

Also Published As

Publication number Publication date
WO2021056197A1 (en) 2021-04-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination