CN114153881A

CN114153881A - High-recall cause and effect discovery method, device and equipment based on time sequence operation and maintenance big data

Info

Publication number: CN114153881A
Application number: CN202111486699.3A
Authority: CN
Inventors: 欧阳梦云
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2022-03-08

Abstract

The invention relates to the technical field of operation and maintenance data, and provides a high-recall causal discovery method, device and equipment based on time-series operation and maintenance big data. The method is based on a causal discovery algorithm and comprises the following steps: predefining a number of relationship rules to be applied to the edges and points of a complete graph; acquiring streaming operation and maintenance data to be analyzed; initializing the streaming operation and maintenance data into a complete graph by means of a graph library and a graph algorithm; processing the relationships in the complete graph of the streaming operation and maintenance data according to the relationship rules; and outputting the processed causal relationship graph, which describes the causal relationships in the streaming operation and maintenance data. The embodiments provided by the invention can improve the accuracy of causal relationship analysis of operation and maintenance data.

Description

High-recall cause and effect discovery method, device and equipment based on time sequence operation and maintenance big data

Technical Field

The invention relates to the technical field of operation and maintenance data, in particular to a high recall cause and effect discovery method based on time sequence operation and maintenance big data, a high recall cause and effect discovery device based on the time sequence operation and maintenance big data, high recall cause and effect discovery equipment based on the time sequence operation and maintenance big data and a corresponding storage medium.

Background

Causal discovery on operation and maintenance data mostly adopts the SVAR-FCI method, which removes edges by using stationarity and additional inference. However, this method does not take assumptions about functional relationships or complex structure into account, so its recall is low, the result cannot accurately reflect the causal relationships in the operation and maintenance data, and the accuracy and reliability of big data analysis are reduced.

Disclosure of Invention

The embodiment of the invention aims to provide a high-recall cause and effect discovery method, device and equipment based on time sequence operation and maintenance big data.

In order to achieve the above object, a first aspect of the present invention provides a high recall causal discovery method based on time series operation and maintenance big data, where the method is based on a causal discovery algorithm, and the method includes: predefining a number of relationship rules to be applied to edges and points in the full graph; acquiring streaming operation and maintenance data to be analyzed; initializing the streaming operation and maintenance data into a complete graph by adopting a graph library and a graph algorithm; processing the relationship in the complete graph of the streaming operation and maintenance data according to the relationship rules; outputting a processed cause and effect relationship graph, wherein the cause and effect relationship graph is used for describing cause and effect relationships in the streaming operation and maintenance data.

In this embodiment of the present application, processing the relationships in the complete graph of the streaming operation and maintenance data according to the relationship rules includes: defining the following algorithms according to the relationship rules: a complete graph processing main algorithm, a first removal algorithm, a second removal algorithm and a deleted edge processing algorithm; and processing the edges in the complete graph of the streaming operation and maintenance data with the first removal algorithm and the second removal algorithm to realize the relationship processing. The first removal algorithm is used for determining, for the ordered variable pairs in the complete graph, whether the edges between non-adjacent variables are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The second removal algorithm is used for determining whether the edges between ordered variable pairs that satisfy a preset separating set are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The complete graph processing main algorithm provides the main function and calls the first removal algorithm and the second removal algorithm. The deleted edge processing algorithm performs secondary processing on the edges marked for deletion by the first removal algorithm and the second removal algorithm.

In an embodiment of the present application, the complete graph processing main algorithm is configured for: calling the first removal algorithm to perform a first traversal of the complete graph and marking the parent-child relationships in the complete graph; calling the first removal algorithm and the second removal algorithm to process the complete graph after the first traversal and determining the partial ancestral graph corresponding to the complete graph; and outputting the determined partial ancestral graph as the processed causal relationship graph.

In an embodiment of the present application, the first removal algorithm is further configured to: determining ordered variable pairs according to the ordered variables in the complete graph and the adjacency relation; determining that a middle label of an edge between the ordered variable pair is a first type label; determining edges between the ordered variable pairs to be reserved or deleted according to a first preset condition; and if the edges between the ordered variable pairs are determined to be deleted, calling the deleted edge processing algorithm to perform secondary processing.

In an embodiment of the present application, the second removal algorithm is further configured to: determining ordered variable pairs according to the ordered variables in the complete graph and the adjacency relation; determining that a middle label of an edge between the ordered variable pair is a second type label; determining edges between the ordered variable pairs to be reserved or deleted according to a second preset condition; and if the edges between the ordered variable pairs are determined to be deleted, calling the deleted edge processing algorithm to perform secondary processing.

In an embodiment of the present application, the deleted edge processing algorithm is configured for: acquiring the edges marked for deletion by the first removal algorithm and the second removal algorithm; if an edge marked for deletion is a directed edge, resolving the deletion conflict; and executing the deletion operation on the edge and making the corresponding separating set weakly minimal.

A second aspect of the present application provides a high recall cause and effect discovery apparatus based on time series operation and maintenance big data, the apparatus comprising: a rule definition module for predefining a number of relationship rules to be applied to edges and points in the full graph; the operation and maintenance data access module is used for acquiring streaming operation and maintenance data to be analyzed; the complete graph generation module is used for initializing the streaming operation and maintenance data into a complete graph by adopting a graph library and a graph algorithm; the edge processing module is used for processing the relationship in the complete graph of the streaming operation and maintenance data according to the relationship rules; and the result output module is used for outputting the processed cause-and-effect relationship diagram, and the cause-and-effect relationship diagram is used for describing cause-and-effect relationships in the streaming operation and maintenance data.

In this embodiment of the present application, the edge processing module includes: a complete graph processing main algorithm submodule, a first removal algorithm submodule, a second removal algorithm submodule and a deleted edge processing algorithm submodule. The first removal algorithm submodule is used for determining, through the first removal algorithm and for the ordered variable pairs in the complete graph, whether the edges between non-adjacent variables are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The second removal algorithm submodule is used for determining, through the second removal algorithm, whether the edges between ordered variable pairs that satisfy the preset separating set are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The complete graph processing main algorithm submodule provides the main function and calls the first removal algorithm and the second removal algorithm. The deleted edge processing algorithm submodule performs secondary processing on the edges marked for deletion by the first removal algorithm and the second removal algorithm.

The third aspect of the present application provides a high recall cause and effect discovery apparatus based on time series operation and maintenance big data, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the aforementioned high recall cause and effect discovery method based on time series operation and maintenance big data.

A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the aforementioned time-series operation and maintenance big data-based high-recall causal discovery method.

In a fifth aspect of the invention, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the aforementioned time-series operation and maintenance big data based high-recall causal discovery method.

The technical scheme has the following beneficial effects: the operation and maintenance data are processed through the optimized cause and effect discovery method, so that cause and effect relationships extracted from the operation and maintenance data are more accurate, and the accuracy and reliability of analysis of operation and maintenance big data are improved.

Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:

FIG. 1 is a schematic flow chart illustrating a high recall causal discovery method based on time series operation and maintenance big data according to an embodiment of the present application;

FIG. 2 schematically shows a block diagram of a high recall cause and effect discovery apparatus based on time series operation and maintenance big data according to an embodiment of the present application.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

To help those skilled in the art understand and implement the technical principles of the present invention and its embodiments, the underlying theory is described first.

The detection power of a true link between $X^i_{t-\tau}$ and $X^j_t$ quantifies the probability that the link is not erroneously deleted because of a wrong partial correlation (ParCorr) CI test. The two variables are written as $A$ and $B$ to emphasize that the method is also applicable to the non-time-series case. The detection power depends on four factors: (1) the sample size (usually fixed); (2) the significance level α of the CI test (generally fixed as the desired false positive level); (3) the estimation dimension of the CI test; (4) the effect size.

The effect size is defined as the minimum of the CI test statistic $I(A;B \mid S)$ taken over all conditioning sets $S$ that are tested. This minimum can be very small, which ultimately results in low detection power. The method increases the effect size in two ways: (1) by restricting the conditioning sets $S$ that need to be tested in order to delete all erroneous links; here it is sufficient to consider only conditioning sets consisting of ancestors of $A$ or $B$; (2) by extending the tested sets $S$ with so-called default conditions $S_{\mathrm{def}}$, which increase the CI test statistic without creating spurious dependencies. No spurious dependencies arise if $S_{\mathrm{def}}$ consists only of ancestors of $A$ or $B$.
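As an illustration of the effect-size definition above, the following sketch computes the minimum absolute partial correlation of two variables over all subsets of a candidate set, optionally adding a fixed default set to every conditioning set. The function names (`pearson`, `partial_corr`, `effect_size`) and the synthetic data are assumptions made for this example and do not come from the original text; partial correlation is computed with the standard recursive formula rather than with any particular library.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, a, b, cond):
    """Partial correlation of columns a and b given the columns in cond.

    Uses the recursion that removes one conditioning variable z at a time.
    """
    cond = tuple(cond)
    if not cond:
        return pearson(data[a], data[b])
    z, rest = cond[0], cond[1:]
    r_ab = partial_corr(data, a, b, rest)
    r_az = partial_corr(data, a, z, rest)
    r_bz = partial_corr(data, b, z, rest)
    return (r_ab - r_az * r_bz) / math.sqrt((1 - r_az ** 2) * (1 - r_bz ** 2))


def effect_size(data, a, b, candidates, s_def=()):
    """Minimum |partial correlation| over all subsets S of candidates, with s_def always added."""
    best = float("inf")
    for p in range(len(candidates) + 1):
        for s in combinations(candidates, p):
            best = min(best, abs(partial_corr(data, a, b, s + tuple(s_def))))
    return best


# Synthetic data (illustrative only): Z drives A and B, A drives B, C is pure noise.
rng = random.Random(0)
n = 500
z_vals = [rng.gauss(0, 1) for _ in range(n)]
a_vals = [z + rng.gauss(0, 1) for z in z_vals]
b_vals = [a + z + rng.gauss(0, 1) for a, z in zip(a_vals, z_vals)]
c_vals = [rng.gauss(0, 1) for _ in range(n)]
data = {"A": a_vals, "B": b_vals, "C": c_vals, "Z": z_vals}

es = effect_size(data, "A", "B", ["C", "Z"])
```

Because the minimum runs over every subset, enlarging the candidate set can only lower (or keep) the value, which is why restricting the sets that are searched can raise the effect size.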

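The theory behind the above effect is as follows:

Let $A \ast\!-\!\ast B$ (where $A = X^i_{t-\tau}$ and $B = X^j_t$) be a link ($\rightarrow$ or $\leftrightarrow$) in $M(G)$. Consider the default conditions $S_{\mathrm{def}} = \mathrm{pa}(\{A,B\}, M(G)) \setminus \{A,B\}$ and write $X^{*} = X \setminus S_{\mathrm{def}}$. Let

$$\mathcal{S} = \underset{S \subseteq X^{*} \setminus \{A,B\}}{\arg\min}\; I(A;B \mid S \cup S_{\mathrm{def}})$$

be the set of sets that define the effect size of the enhanced method. If the following two conditions are satisfied simultaneously: (1) there is $S^{*} \in \mathcal{S}$ with $S^{*} \subseteq \mathrm{adj}(A, M(G)) \setminus S_{\mathrm{def}}$ or $S^{*} \subseteq \mathrm{adj}(B, M(G)) \setminus S_{\mathrm{def}}$; (2) there is a proper subset $Q \subset S_{\mathrm{def}}$ satisfying $I(A;B;S_{\mathrm{def}} \setminus Q \mid S^{*} \cup Q) < 0$, then

$$\min_{S \subseteq X^{*} \setminus \{A,B\}} I(A;B \mid S \cup S_{\mathrm{def}}) \;>\; \min_{\tilde{S} \subseteq X \setminus \{A,B\}} I(A;B \mid \tilde{S}).$$

If the conditions are not satisfied, the ">" in the above formula becomes "≥".

Here $I$ denotes (conditional) mutual information, and $I(A;B;C \mid D) \equiv I(A;B \mid D) - I(A;B \mid C \cup D)$ denotes interaction information. The theory above shows that taking $S_{\mathrm{def}}$ to be the union of the parents of $A$ and $B$ increases the effect size.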
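The role of condition (2) can be seen from a one-line rearrangement of the interaction-information identity just stated. The display below is a sketch in LaTeX notation; it only substitutes $C = S_{\mathrm{def}} \setminus Q$ and $D = S^{*} \cup Q$ into that identity and uses $Q \subset S_{\mathrm{def}}$, so that $C \cup D = S^{*} \cup S_{\mathrm{def}}$.

```latex
\begin{aligned}
I(A;B;S_{\mathrm{def}}\setminus Q \mid S^{*}\cup Q)
  &= I(A;B \mid S^{*}\cup Q) - I(A;B \mid S^{*}\cup S_{\mathrm{def}}) < 0 \\
\Longrightarrow\quad
I(A;B \mid S^{*}\cup S_{\mathrm{def}}) &> I(A;B \mid S^{*}\cup Q).
\end{aligned}
```

In words, conditioning on the remaining parents in addition to $Q$ strictly raises the test statistic for the link between $A$ and $B$.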

From the theoretical derivation above it follows that the method achieves higher detection power and higher recall, unless the gain in effect size is outweighed by the increased estimation dimension (which results from the higher cardinality of the conditioning sets). The principle is only useful if some (non-)ancestors are known before all CI tests are completed. The method achieves this by interleaving edge removal and edge orientation, for example by learning ancestral relationships first and then deleting erroneous edges.

According to the technical scheme, the data acquisition, storage, use, processing and the like meet relevant regulations of national laws and regulations.

FIG. 1 is a schematic flow chart illustrating a high recall causal discovery method based on time series operation and maintenance big data according to an embodiment of the present application. As shown in fig. 1, in an embodiment of the present application, a method for high recall cause and effect discovery based on time series operation and maintenance big data is provided, including:

101. predefining a number of relationship rules to be applied to edges and points in the full graph;

102. acquiring streaming operation and maintenance data to be analyzed;

103. initializing the streaming operation and maintenance data into a complete graph by adopting a graph library and a graph algorithm;

104. processing the relationship in the complete graph of the streaming operation and maintenance data according to the relationship rules;

105. outputting a processed cause and effect relationship graph, wherein the cause and effect relationship graph is used for describing cause and effect relationships in the streaming operation and maintenance data.

The causal relationships between points in the causal relationship graph are represented by edges, including directed edges for one-way causal relationships and bidirectional edges for points that are mutually cause and effect. In the above embodiment, the predefined relationship rules are based on the causal relationship rules of causal discovery, including but not limited to the causal association rules between points, the edge-removal rules and the orientation rules mentioned in the technical principles section. These relationship rules are the basis for processing the relationships in step 104.

The streaming operation and maintenance data to be analyzed can be acquired, for example, through real-time access via Kafka. The acquired data are initialized by mapping them to a number of points in a graph and connecting every pair of points with an edge, which yields a complete graph. The complete graph contains all possible causal relationships between all points and therefore does not yet reflect the actual causal relationships in the streaming operation and maintenance data. Some of the edges in the complete graph are then deleted according to the predefined relationship rules; the points that remain connected by edges represent the causal relationships between them, and the resulting graph is the final causal relationship graph. The deletion of edges according to the predefined relationship rules can be realized by one or more preset edge deletion algorithms.
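The initialization step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: nodes are assumed to be (variable, lag) pairs inside a window of `tau_max` lags, and the names `build_complete_graph`, `variables` and `tau_max` as well as the metric names are hypothetical. The original text only states that a graph library and a graph algorithm are used; plain Python sets stand in for them here.

```python
from itertools import combinations


def build_complete_graph(variables, tau_max):
    """Return (nodes, edges) of a complete graph over all (variable, lag) pairs.

    A node (name, lag) stands for the value of metric `name` at time t - lag.
    Every unordered pair of distinct nodes is connected by exactly one edge.
    """
    nodes = [(v, lag) for lag in range(tau_max + 1) for v in variables]
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}
    return nodes, edges


# Three hypothetical operation and maintenance metrics, window of two lags.
nodes, edges = build_complete_graph(["cpu", "latency", "errors"], tau_max=2)
```

Edge deletion then amounts to discarding elements of `edges`, so the later removal phases only ever shrink this initial set.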

Through the above embodiment, the constraint-based causal discovery algorithm includes causal parents in the conditioning sets and thereby increases the effect size. The embodiment identifies the low effect size of the conditional independence tests as the main reason for low recall and uses this finding, together with the theory above, to make the CI tests more effective. Throughout the process, parents are identified as early as possible while latent confounders of the time series are allowed for, and new orientation rules are used to determine parent-child or ancestral relationships already during the edge removal phase, in an iterative procedure. The maximally informative causal relationship graph obtained by the above embodiment can accurately represent the causal relationships in the streaming operation and maintenance data.

In an embodiment provided by the present invention, processing the relationships in the complete graph according to the relationship rules includes: defining the following algorithms according to the relationship rules: a complete graph processing main algorithm, a first removal algorithm, a second removal algorithm and a deleted edge processing algorithm; and processing the edges in the complete graph with the first removal algorithm and the second removal algorithm to realize the relationship processing. The complete graph processing main algorithm provides the main function and calls the first removal algorithm and the second removal algorithm. The first removal algorithm is used for determining, for the ordered variable pairs in the complete graph, whether the edges between non-adjacent variables are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The second removal algorithm is used for determining whether the edges between ordered variable pairs that satisfy a preset separating set are to be deleted or retained, and for determining the direction of each retained edge according to the causal relationship between the points it connects. The deleted edge processing algorithm performs secondary processing on the edges marked for deletion by the first removal algorithm and the second removal algorithm. Specific implementations of these algorithms are described in detail later.

For a better understanding of the embodiments described below, the terms used in the following explanation of the pseudo code are defined here:

Multivariate time series. Consider multivariate time series $V^j = (V^j_t, V^j_{t-1}, \ldots)$ for $j = 1, \ldots, \tilde{N}$ that follow a stationary discrete-time structural vector autoregressive process described by the structural causal model (SCM)

$$V^j_t = f_j\big(\mathrm{pa}(V^j_t),\, \eta^j_t\big), \qquad j = 1, \ldots, \tilde{N},$$

where the measurable functions $f_j$ depend on their input arguments and the noise variables $\eta^j_t$ are mutually independent. The sets $\mathrm{pa}(V^j_t) \subseteq (V_t, V_{t-1}, \ldots, V_{t-p_{ts}})$ define the causal parents of $V^j_t$, where $V_t = (V^1_t, V^2_t, \ldots)$ and $p_{ts}$ is the order of the time series. Because of stationarity, the causal relationship of a pair of variables $(V^i_{t-\tau}, V^j_t)$ is the same as that of all time-shifted pairs $(V^i_{t'-\tau}, V^j_{t'})$, where $\tau \ge 0$ is referred to as the lag.

It is assumed that there are no cyclic causal relationships; because of time order, this assumption only restricts contemporaneous ($\tau = 0$) interactions. The method allows for unobserved variables, that is, only a subset $X = \{X^1, \ldots, X^N\} \subseteq V$ with $N \le \tilde{N}$ is observed. It is further assumed that there are no selection variables and that faithfulness holds, i.e. conditional independence (CI) in the observed distribution $P(V)$ generated by the SCM implies d-separation in the associated time series graph $G$ over the variables $V$.
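To make the structural model concrete, the sketch below simulates a small linear instance of it using only the standard library. The coefficients, the variable names and the choice of a single unobserved driver `U` are illustrative assumptions, not values from the original text, and the functions f_j are taken to be linear purely for simplicity.

```python
import random


def simulate(n_steps, seed=0):
    """Simulate V_t = (U_t, X1_t, X2_t), where U is latent and drives both X1 and X2.

    U_t  = 0.8 * U_{t-1} + noise                      (autocorrelated latent confounder)
    X1_t = 0.6 * X1_{t-1} + 0.5 * U_{t-1} + noise
    X2_t = 0.6 * X2_{t-1} + 0.5 * U_{t-1} + 0.4 * X1_{t-1} + noise
    """
    rng = random.Random(seed)
    u, x1, x2 = [0.0], [0.0], [0.0]
    for t in range(1, n_steps):
        u.append(0.8 * u[t - 1] + rng.gauss(0, 1))
        x1.append(0.6 * x1[t - 1] + 0.5 * u[t - 1] + rng.gauss(0, 1))
        x2.append(0.6 * x2[t - 1] + 0.5 * u[t - 1] + 0.4 * x1[t - 1] + rng.gauss(0, 1))
    # Only X1 and X2 are returned: U plays the role of an unobserved variable.
    return {"X1": x1, "X2": x2}


observed = simulate(200)
```

In this toy process all causal links have lag 1, so the order of the time series is 1, and the observed subset X = {X1, X2} omits the confounder U.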

Maximal ancestral graphs and partial ancestral graphs. Maximal ancestral graphs (MAGs) may contain directed edges (denoted "→") and bidirected edges (denoted "↔"); edges are also referred to as links. Partial ancestral graphs (PAGs) may additionally contain edge types with circle marks ("◦→" and "◦-◦").

Maximum time lag. In time series causal discovery, the stationarity assumption and the length of the chosen time lag window $t - \tau_{\max} \le t' \le t$ play an important role. In the causally sufficient case ($X = V$), the causal graph is the same for all $\tau_{\max} \ge p_{ts}$. This is different in the latent case: let $M^{\tau_{\max}}(G)$ be the MAG obtained by marginalizing over all unobserved variables and over all generally observed variables at times $t' < t - \tau_{\max}$. Increasing the size of the time lag window by increasing $\tau_{\max}$ may then lead to the deletion of edges that are fully contained in the original window, even in the case of perfect statistical decisions. That is, for $\tau_{\max,1} < \tau_{\max,2}$, $M^{\tau_{\max,1}}(G)$ is in general not a subgraph of $M^{\tau_{\max,2}}(G)$. Therefore $\tau_{\max}$ is an analysis choice rather than a tunable parameter.

Soundness and completeness. For the same reason, stationarity also affects the definition of the MAGs and PAGs that are being estimated, so that a PAG of […] cannot usually be determined. The above logic is formalized as follows: (1) the MAG obtained by enforcing repeating adjacencies on the MAG defined above; (2) the maximally informative PAG of the Markov equivalence class of the MAG in (1), obtained by running the orientation rule algorithm on it; (3) the PAG obtained by additionally enforcing time order and repeating orientations at each step of applying the orientation rules. The PAG in (3) may have fewer circle marks than the PAG in (2) and therefore contains more information. In summary, the aim is to construct the PAG in (3). An algorithm is said to be sound if it returns a PAG of the MAG in (1), and complete if it returns the PAG in (3). Hereinafter, the MAG in (1) and the PAG in (3) are written simply as $M(G)$ and $P(G)$.

Definition of the $\mathrm{apds}_t$ set: $\mathrm{apds}_t(X^j_t, X^i_{t-\tau}, C(G))$ is the set of all non-future adjacencies of $X^j_t$, other than $X^i_{t-\tau}$, that have not been identified as non-ancestors of $X^j_t$.
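One possible way to compute such a set is sketched below, reading the definition as: all non-future adjacencies of the first argument, other than the second argument, that have not been identified as non-ancestors of the first argument. The graph representation is an assumption made for this example: `adj` maps each node to its neighbours, nodes are `(name, lag)` pairs with lag 0 denoting time t, and `non_ancestors[x]` holds the nodes already identified as non-ancestors of `x`. None of these names appear in the original text.

```python
def apds_t(target, other, adj, non_ancestors):
    """Non-future neighbours of `target`, excluding `other` and known non-ancestors of `target`."""
    _, target_lag = target
    excluded = non_ancestors.get(target, set())
    return {
        node
        for node in adj.get(target, set())
        if node != other
        and node[1] >= target_lag  # a larger lag means earlier in time, hence not in the future
        and node not in excluded
    }


# Toy graph: Y at time t is adjacent to X (lag 1), Z (lag 0) and W (lag 2);
# W has already been identified as a non-ancestor of Y.
adj = {("Y", 0): {("X", 1), ("Z", 0), ("W", 2)}}
non_ancestors = {("Y", 0): {("W", 2)}}
result = apds_t(("Y", 0), ("X", 1), adj, non_ancestors)
```

Every non-ancestor that the orientation rules identify shrinks this set, and with it the number of conditioning sets that have to be tested.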

For the $\mathrm{napds}_t$ set, the following subsets are defined. The $\mathrm{napds}_t$ set is the union of two subsets. The first subset is the result of removing from the set […] all variables […] that are connected to […] by an edge with a tail at […]. The second subset is the set of all variables […] that are connected to […] via a path p.

Properties of the path p. The path p has the following properties:

a) no node on path p other than […] has a tail;

b) the middle node of every three consecutive nodes on path p is a collider on path p;

c) path p does not contain […];

d) the node adjacent to […] is not connected to […] by an edge with a head at […], nor by an edge with a tail at […];

e) all nodes on path p other than […] and […] are not connected to […] or […] by an edge with a tail at […] or […], and are not connected to both […] and […] by edges with a tail or a head.

Middle marks. To facilitate the early orientation of edges, the graph is given an unambiguous causal interpretation at every step of the algorithm by adding middle marks to the edges. A middle mark is written on the link symbol and can be "?", "L", "R", "!" or empty. For example, $A \overset{L}{\ast\!-\!\ast} B$ asserts that if $A < B$ ($B < A$), then $A \in \mathrm{adj}(B, M(G))$ or there is no subset of $\mathrm{pa}(A, M(G))$ ($\mathrm{pa}(B, M(G))$) that m-separates $A$ and $B$ in $M(G)$. Here "<" is an arbitrary total order on the set of variables, used to distinguish $A \overset{L}{\ast\!-\!\ast} B$ from $A \overset{R}{\ast\!-\!\ast} B$; the choice of this order is arbitrary and does not affect the causal information content. The "∗" is a wildcard that stands for any of the three edge marks (tail, head, circle) that occur in PAGs. In addition, $A \overset{!}{\ast\!-\!\ast} B$ asserts that both $A \overset{L}{\ast\!-\!\ast} B$ and $A \overset{R}{\ast\!-\!\ast} B$ hold. The empty middle mark, $A \ast\!-\!\ast B$, asserts $A \in \mathrm{adj}(B, M(G))$. The "?" mark does not assert anything. Non-circle edge marks (which may be hidden behind the "∗" symbol here) still carry their standard meaning of ancestorship and non-ancestorship, and the absence of an edge between $A$ and $B$ still asserts that $A \notin \mathrm{adj}(B, M(G))$.

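Ancestor-parent rule. In $C(G)$: 1) $A \overset{!}{\rightarrow} B$ may be replaced by $A \rightarrow B$; 2) if $A > B$, $A \overset{L}{\rightarrow} B$ may be replaced by $A \rightarrow B$; 3) if $A < B$, $A \overset{R}{\rightarrow} B$ may be replaced by $A \rightarrow B$.

The total order is chosen in accordance with time order, i.e. $X^i_{t-\tau} < X^j_t$ if and only if $\tau > 0$, or $\tau = 0$ and $i < j$. Lagged links can then be initialized with edges $\circ\!\overset{L}{\rightarrow}$.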

Weakly minimal separating sets. In the MAG $M(G)$, let $A$ and $B$ be m-separated by the set $S$. The set $S$ is called a weakly minimal separating set of $A$ and $B$ if the following two conditions hold:

1) $S$ can be decomposed as $S = S_1 \,\dot\cup\, S_2$, where $S_1 \subseteq \mathrm{an}(\{A,B\}, M(G))$;

2) if $S' = S_1 \,\dot\cup\, S'_2$ with $S'_2 \subseteq S_2$ m-separates $A$ and $B$, then $S'_2 = S_2$.

The pair $(S_1, S_2)$ is called a weakly minimal decomposition of $S$.

This generalizes the definition of minimal separating sets, which is recovered by requiring $S_1 = \emptyset$.
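The definition can be checked mechanically when a separation oracle is available. The sketch below reads the definition as: S splits into a part S₁ of ancestors of A or B and a remainder S₂, and no proper subset of S₂ may separate A and B together with S₁. It assumes a hypothetical callable `separates(S)` that reports whether the set S m-separates A and B, and a set `ancestors` holding the known ancestors of A or B; both are placeholders for whatever the surrounding algorithm provides. S₁ is taken as large as possible (all ancestors contained in S), which suffices because enlarging S₁ only shrinks the part S₂ that has to be minimal.

```python
from itertools import combinations


def is_weakly_minimal(s, ancestors, separates):
    """Check whether the separating set `s` is weakly minimal.

    s1 = members of s that are known ancestors of A or B (kept fixed),
    s2 = the rest; no proper subset of s2 may separate together with s1.
    """
    s = frozenset(s)
    if not separates(s):
        return False
    s1 = s & frozenset(ancestors)
    s2 = s - s1
    for size in range(len(s2)):  # sizes 0 .. |s2| - 1, i.e. proper subsets only
        for sub in combinations(sorted(s2), size):
            if separates(s1 | frozenset(sub)):
                return False
    return True


# Toy oracle: A and B are separated exactly when "P" is in the conditioning set.
def separates(cond):
    return "P" in cond
```

With this oracle, `{"P"}` is weakly minimal, whereas `{"P", "N"}` is not, because dropping the non-ancestor `"N"` still leaves a separating set.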

Strong unshielded triple rule. Let $A \ast\!-\!\ast B \ast\!-\!\ast C$ be an unshielded triple in the PAG $C(G)$ and let $S_{AC}$ be the separating set of $A$ and $C$. Then:

1) if $B \in S_{AC}$ and $S_{AC}$ is weakly minimal, then $B \in \mathrm{an}(\{A, C\}, G)$;

2) let $\tau_{AB} \subseteq \mathrm{an}(\{A,B\}, M(G))$ and $\tau_{CB} \subseteq \mathrm{an}(\{C,B\}, M(G))$ be arbitrary. If $B \notin S_{AC}$, $A$ and $B$ are not m-separated by $S_{AC} \cup \tau_{AB} \setminus \{A, B\}$, and $C$ and $B$ are not m-separated by $S_{AC} \cup \tau_{CB} \setminus \{C, B\}$, then $B \notin \mathrm{an}(\{A, C\}, G)$.

The conditioning sets in the latter two conditions may be intersected with future or past variables. The rules described above can be applied at every step of the algorithm, at any time and in any order.

In one embodiment of the invention, the full graph processing main algorithm comprises: calling the first removal algorithm to perform first traversal on the full graph, and marking parent-child relations in the full graph; calling the first removal algorithm and the second removal algorithm to identify the complete graph after the first traversal, and determining a partial ancestor graph corresponding to the complete graph; and taking the obtained partial ancestral graph as a causal graph with the largest information amount. In particular, to facilitate understanding and implementation by those skilled in the art, corresponding pseudo code is provided as follows:

Require: time series dataset $X = \{X^1, \ldots, X^N\}$, maximum time lag $\tau_{\max}$, significance level α, CI test CI(X, Y, S), non-negative integer k

1. Initialize C(G) as a complete graph, where […]

2. For 0 ≤ l ≤ k-1, do:

3. Remove edges and apply the orientation rules using the first removal algorithm

4. Repeat line 1 and orient an edge as $X^i_{t-\tau} \rightarrow X^j_t$ if this directed edge was marked in C(G) after line 3

5. Remove edges and apply the orientation rules using the first removal algorithm

6. Remove edges and apply the orientation rules using the second removal algorithm

7. Return the PAG C(G)

The above pseudo code is explained as follows. After C(G) has been initialized as a complete graph, the algorithm enters its preliminary phase and calls the first removal algorithm, which removes many (but typically not all) of the erroneous links and, while deleting them, repeatedly applies the orientation rules described above. These rules identify a subset of the (non-)ancestral relationships in G and set the corresponding head or tail marks on the edges of C(G). The non-ancestral relationships further restrict the conditioning sets S of subsequent CI tests, and the ancestral relationships are used to extend these sets to $S \cup S_{\mathrm{def}}$, where $S_{\mathrm{def}} = \mathrm{pa}(\{X^i_{t-\tau}, X^j_t\}, C(G))$ consists of the parents, known in C(G) at that point, of the variables being tested for independence. All parent-child relationships marked in C(G) after line 3 are then carried over to the re-initialized C(G) before the first removal algorithm is applied again, so that the conditioning sets can be extended with the known parents from the outset. The purpose of this iterative process is to determine an accurate subset of the parent-child relationships in G. These results are passed on to the final phase in lines 5-6, which begins with the last application of the first removal algorithm. Erroneous links may still remain at this point, because the first removal algorithm may fail to delete the edge between variables $X^i_{t-\tau}$ and $X^j_t$ when neither of them is an ancestor of the other. Removing such edges is the purpose of the second removal algorithm, which is invoked in line 6 and constitutes the second deletion phase. Like the first removal algorithm, the second removal algorithm repeatedly applies the orientation rules in order to identify (non-)ancestors. In this way the PAG P(G) is obtained. In addition, the output of the method does not depend on the order of the N time series variables $X^j$. The number k of iterations of the preliminary phase is a hyperparameter, and stationarity is enforced at every step of the algorithm.
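The control flow of the main algorithm can be summarized by the following skeleton. It is a structural sketch only: `init_graph`, `first_removal`, `second_removal` and `directed_edges` are hypothetical callables standing in for the steps described above, and no CI testing or orientation logic is implemented here.

```python
def main_algorithm(init_graph, first_removal, second_removal, directed_edges, k):
    """Skeleton of the main loop: k preliminary passes, then the two final phases.

    init_graph(known_parents) -> graph   builds the complete graph, pre-orienting known parent edges
    first_removal(graph) -> graph        first removal algorithm (edge removal plus orientation rules)
    second_removal(graph) -> graph       second removal algorithm
    directed_edges(graph) -> set         parent-child relations currently marked in the graph
    """
    known_parents = set()
    graph = init_graph(known_parents)        # line 1
    for _ in range(k):                       # lines 2-4: preliminary phase
        graph = first_removal(graph)
        known_parents = directed_edges(graph)
        graph = init_graph(known_parents)    # re-initialize, keeping the parent edges
    graph = first_removal(graph)             # line 5
    graph = second_removal(graph)            # line 6
    return graph                             # line 7


# Stub callables that only record the order in which the phases run.
trace = []
final = main_algorithm(
    init_graph=lambda parents: trace.append("init") or {"parents": set(parents)},
    first_removal=lambda g: trace.append("first") or g,
    second_removal=lambda g: trace.append("second") or g,
    directed_edges=lambda g: set(),
    k=2,
)
```

Passing the phases in as callables keeps the skeleton independent of any particular graph library or CI test, which the original text leaves open.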

In one embodiment of the present invention, the first removal algorithm further comprises: determining ordered variable pairs according to the ordered variables in the complete graph and the adjacency relation; determining that a middle label of an edge between the ordered variable pair is a first type label; determining edges between the ordered variable pairs to be reserved or deleted according to a first preset condition; and if the edges between the ordered variable pairs are to be deleted, calling the deleted edge processing algorithm to perform secondary processing. Specifically, the pseudo code providing the first removal algorithm is as follows:

Require: C(G), memory of minimum test statistic values $I^{\min}(\cdot,\cdot)$, separating sets SepSet(·,·), time series dataset $X = \{X^1, \ldots, X^N\}$, maximum time lag $\tau_{\max}$, significance level α, CI test CI(X, Y, S)

The above pseudo code is explained as follows: the algorithm deletes all edges between pairs of variables that are not adjacent in M(G). To this end, conditional independence is tested given the default conditioning set S_def together with a set S whose cardinality p is increased successively, where S is drawn from the apds_t set defined earlier, i.e. from a set that excludes all variables already identified as non-ancestors. The default conditioning set S_def consists of all variables marked in C(G) as parents of either variable of the pair. The algorithm needs to restart with p = 0; otherwise separation sets computed later may not be weakly minimal. The rules mentioned above may also identify non-ancestor relationships, which further restrict the apds_t sets. Another innovation of the method is that, before the edge tests are carried out, some edges can be judged in advance and deleted. This can also be expressed as follows:

All auto-dependency links are tested first, followed by the cross-links, lag by lag from 0 to τ_max. This ordering does not depend on the order of the N time series variables X^j, so no order dependence is introduced. The algorithm converges once every middle mark in C(G) is "!" or empty; such edges cannot be separated any further, and no further testing is required. The memory updated in line 11 tracks the minimum test statistic over all previous CI tests for the given variables. These values are used to sort the search set S_search, so that the stored values of two candidate variables determine which of them appears first in S_search. Note that line 18 applies only a selected subset of the rules, and only to directed lagged links.
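The test order and the search over conditioning sets of increasing cardinality p can be illustrated with the following sketch. It is a simplified illustration, not the pseudo code of the first removal algorithm: the names `ordered_links`, `first_removal_sketch`, `candidates_of` and `s_def_of` are hypothetical, the CI test is passed in as a callable returning a p-value and a test statistic, and the rule application of line 18 is omitted. Two points are assumptions of the sketch rather than statements of the source: the restart with p = 0 is triggered whenever an edge has been deleted, and candidates with a larger stored statistic are placed first in S_search.

```python
from itertools import combinations


def ordered_links(n_vars, tau_max):
    """Return (i, j, tau) triples: a link between X^i at lag tau and X^j at lag 0."""
    # Auto-dependency links first (a variable with its own past).
    auto = [(i, i, tau) for tau in range(1, tau_max + 1) for i in range(n_vars)]
    # Then cross-links, lag by lag from 0 to tau_max (each lag-0 pair listed once).
    cross = [(i, j, tau)
             for tau in range(tau_max + 1)
             for i in range(n_vars)
             for j in range(n_vars)
             if i != j and (tau > 0 or i < j)]
    return auto + cross


def first_removal_sketch(links, candidates_of, s_def_of, ci_test, alpha):
    """Delete links (x, y) for which some S of size p, joined with S_def, separates x and y.

    candidates_of(link) -> list of candidate conditioning variables (hypothetical)
    s_def_of(link)      -> default conditioning set S_def (hypothetical)
    ci_test(x, y, cond) -> (p_value, statistic)
    """
    remaining = list(links)
    sepset = {}   # removed link -> conditioning set that separated it
    i_min = {}    # link -> minimum |test statistic| over all tests so far
    p = 0
    while remaining:
        if p > max(len(candidates_of(link)) for link in remaining):
            break  # every cardinality has been tried without a new removal
        removed = False
        for link in list(remaining):
            x, y = link
            # ASSUMPTION: candidates with a larger stored statistic come first.
            s_search = sorted(candidates_of(link),
                              key=lambda z: -i_min.get((z, y), 0.0))
            for s in combinations(s_search, p):
                cond = set(s) | set(s_def_of(link))
                p_value, stat = ci_test(x, y, cond)
                i_min[link] = min(i_min.get(link, float("inf")), abs(stat))
                if p_value > alpha:  # independence not rejected: delete the edge
                    remaining.remove(link)
                    sepset[link] = cond
                    removed = True
                    break
        # ASSUMPTION: restart from p = 0 whenever an edge was deleted.
        p = 0 if removed else p + 1
    return remaining, sepset, i_min
```

Within one group of `ordered_links` the index order is only an implementation detail of this sketch; the source states only that auto-dependency links precede cross-links and that cross-links are processed by increasing lag.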

In one embodiment of the present invention, the second removal algorithm further comprises: determining ordered variable pairs according to the ordered variables in the complete graph and the adjacency relation; determining that a middle label of an edge between the ordered variable pair is a second type label; determining edges between the ordered variable pairs to be reserved or deleted according to a second preset condition; and if the edges between the ordered variable pairs are to be deleted, calling the deleted edge processing algorithm to perform secondary processing. Specifically, the second removal algorithm pseudo code is provided as follows:

The above pseudo code is explained as follows: at this stage every middle mark in C(G) is "!" or empty. An edge whose middle mark is empty must be in M(G), whereas an edge marked "!" is not necessarily in M(G). The latter are edges between pairs of variables in which neither variable is an ancestor of the other. For such edges the algorithm searches only a restricted family of separation sets. In addition to the parents in C(G), the algorithm also adds all nodes of the current napds_t sets to the default conditioning set. Once all middle marks are empty, the algorithm converges; a final exhaustive application of the rules is then carried out to ensure completeness.

In an embodiment of the present invention, the deleted edge processing algorithm includes: acquiring the edges to be deleted as determined by the first removal algorithm and the second removal algorithm; if an edge to be deleted is a directed edge, resolving the conflict caused by its deletion; and performing the deletion on the edge and making the corresponding separation set weakly minimal. Specifically, the pseudo code is provided as follows:

Input: C(G), ordered rule list tau, minimum test statistic I^min(·,·), separation sets SepSet(·,·), time series data set X = {X^1, ..., X^N}, maximum time lag τ_max, significance level α, CI test CI(X, Y, S)

The above pseudo code is explained as follows: the algorithm exhaustively applies a set of orientation rules to the edges. Since many rules require the separation sets to be weakly minimal, line 10 makes them weakly minimal as follows: the separation set of two variables is not necessarily minimal, but it can be made weakly minimal by successively removing individual elements, other than those identified as ancestors of the two variables, until the resulting set can no longer separate the two variables. In particular, it is not necessary to search all subsets of the original separation set. The validity of this method follows from the equivalence of weak minimality and weak minimality of the second type.
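The element-by-element reduction of a separation set can be sketched as follows. This is a hedged illustration, not the procedure of line 10 itself: the function name `make_weakly_minimal` and the `separates` callable are hypothetical, and it is assumed that elements already known to be ancestors of either variable are always kept.

```python
def make_weakly_minimal(x, y, sepset, known_ancestors, separates):
    """Shrink a separation set of x and y by dropping one element at a time.

    sepset:          a set that separates x and y (not necessarily minimal)
    known_ancestors: elements identified as ancestors of x or y (ASSUMPTION:
                     these are never removed)
    separates(x, y, s) -> bool: hypothetical check, e.g. a CI test at level alpha
    """
    current = set(sepset)
    changed = True
    while changed:
        changed = False
        # Only single elements are tried, so the subsets of the original
        # separation set are never enumerated exhaustively.
        for z in sorted(current - set(known_ancestors), key=repr):
            trial = current - {z}
            if separates(x, y, trial):
                current = trial   # z was not needed for the separation
                changed = True
                break             # rescan the reduced set
    return current
```

Each pass removes at most one element, so the number of calls to `separates` grows at most quadratically with the size of the original set instead of exponentially.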

Fig. 2 is a block diagram schematically illustrating the structure of a high-recall cause and effect discovery apparatus based on time series operation and maintenance big data according to an embodiment of the present application. As shown in Fig. 2, the apparatus includes: a rule definition module for predefining a number of relationship rules to be applied to the edges and nodes of the complete graph; an operation and maintenance data access module for acquiring the streaming operation and maintenance data to be analyzed; a complete graph generation module for initializing the streaming operation and maintenance data into a complete graph by means of a graph library and a graph algorithm; an edge processing module for processing the relationships in the complete graph of the streaming operation and maintenance data according to the relationship rules; and a result output module for outputting the processed cause-and-effect relationship graph, which describes the cause-and-effect relationships in the streaming operation and maintenance data.
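The cooperation of the five modules can be sketched as a small pipeline. The sketch below is illustrative only: the class name `CausalDiscoveryDevice`, its method names and the record format (one dictionary of metric values per time step) are assumptions, and plain Python sets stand in for the graph library mentioned above. Each node is a pair of a metric name and a time lag, and each relationship rule is modelled as a function that maps a set of edges to a set of edges.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Node = Tuple[str, int]            # (metric name, time lag)
Edge = FrozenSet[Node]            # undirected edge between two nodes
Rule = Callable[[Set[Edge]], Set[Edge]]


@dataclass
class CausalDiscoveryDevice:
    rules: List[Rule] = field(default_factory=list)  # rule definition module
    tau_max: int = 1                                 # maximum time lag

    def access_data(self, stream: Iterable[dict]) -> List[dict]:
        """Operation and maintenance data access module: collect the stream."""
        return [dict(record) for record in stream]

    def generate_complete_graph(self, records: List[dict]) -> Set[Edge]:
        """Complete graph generation module: connect every pair of nodes."""
        names = sorted({name for record in records for name in record})
        nodes = [(name, lag) for name in names for lag in range(self.tau_max + 1)]
        return {frozenset(pair) for pair in combinations(nodes, 2)}

    def process_edges(self, edges: Set[Edge]) -> Set[Edge]:
        """Edge processing module: apply the predefined rules in order."""
        for rule in self.rules:
            edges = rule(edges)
        return edges

    def output(self, edges: Set[Edge]) -> List[Tuple[Node, ...]]:
        """Result output module: return the remaining edges in a stable order."""
        return sorted(tuple(sorted(edge)) for edge in edges)

    def run(self, stream: Iterable[dict]) -> List[Tuple[Node, ...]]:
        records = self.access_data(stream)
        graph = self.generate_complete_graph(records)
        return self.output(self.process_edges(graph))
```

A device would be constructed with its list of rules and then called as `device.run(stream)`; in an actual embodiment the rules would be the removal and deleted edge processing algorithms rather than simple edge filters.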

The high-recall cause and effect discovery device based on the time sequence operation and maintenance big data comprises a processor and a memory. The rule definition module, the operation and maintenance data access module, the complete graph generation module, the edge processing module, the result output module and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the high-recall cause and effect discovery method based on the time sequence operation and maintenance big data is realized by adjusting the kernel parameters.

The memory may include, among computer-readable media, volatile memory such as Random Access Memory (RAM) and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

The embodiment of the application provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor executes the program, and the steps of the high-recall causal discovery method based on the time sequence operation and maintenance big data are realized.

The present application further provides a computer program product which, when executed on a data processing apparatus, is adapted to execute a program initialized with the steps of the high-recall causal discovery method based on time series operation and maintenance big data.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include, among computer-readable media, volatile memory such as Random Access Memory (RAM) and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media, which include both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A high recall causal discovery method based on time series operation and maintenance big data is characterized in that the method is based on a causal discovery algorithm and comprises the following steps:

predefining a number of relationship rules to be applied to edges and points in the full graph;

acquiring streaming operation and maintenance data to be analyzed;

initializing the streaming operation and maintenance data into a complete graph by adopting a graph library and a graph algorithm;

processing the relationship in the complete graph of the streaming operation and maintenance data according to the relationship rules;

outputting a processed cause and effect relationship graph, wherein the cause and effect relationship graph is used for describing cause and effect relationships in the streaming operation and maintenance data.

2. The method of claim 1, wherein processing the relationship in the full graph of the streaming operation and maintenance data according to the relationship rules comprises:

defining the following algorithm according to the plurality of relationship rules: a full graph processing main algorithm, a first removal algorithm, a second removal algorithm and a delete edge processing algorithm;

processing edges in the complete graph of the streaming operation and maintenance data by adopting the first removal algorithm and the second removal algorithm to realize relational processing;

the first removal algorithm is used for determining, for all edges between ordered pairs of non-adjacent variables in the complete graph, whether each edge is to be deleted or reserved; determining the direction of the edge to be reserved according to the causal relationship between the points connected with the edge to be reserved;

the second removal algorithm is used for determining edges between the ordered variable pairs, which meet a preset separation set, to be deleted or reserved; determining the direction of the edge to be reserved according to the causal relationship between the points connected with the edge to be reserved;

the full graph processing main algorithm is used for providing a main function and calling the first removal algorithm and the second removal algorithm;

and the deleted edge processing algorithm is used for carrying out secondary processing on the edges to be deleted determined by the first removing algorithm and the second removing algorithm.

3. The method of claim 2, wherein the full graph processing main algorithm is configured to:

calling the first removal algorithm to perform first traversal on the complete graph, and marking the parent-child relationship in the complete graph;

calling the first removal algorithm and the second removal algorithm to identify the complete graph after the first traversal, and determining a partial ancestor graph corresponding to the complete graph;

and outputting the determined part of ancestor graphs as processed causal relationship graphs.

4. The method of claim 2, wherein the first removal algorithm is further configured to:

determining ordered variable pairs according to the ordered variables in the complete graph and the adjacency relation;

determining that a middle label of an edge between the ordered variable pair is a first type label;

determining edges between the ordered variable pairs to be deleted or reserved according to a first preset condition;

and if the edges between the ordered variable pairs are determined to be deleted, calling the deleted edge processing algorithm to perform secondary processing.

5. The method of claim 2, wherein the second removal algorithm is further configured to:

determining that a middle label of an edge between the ordered variable pair is a second type label;

determining edges between the ordered variable pairs to be deleted or reserved according to a second preset condition;

6. The method of claim 2, wherein the edge deletion processing algorithm is configured to:

acquiring edges to be deleted determined according to the first removal algorithm and the second removal algorithm;

if the determined edge to be deleted is a directed edge, resolving the conflict caused by the deletion;

and executing the deletion operation on the determined edge to be deleted, and making the corresponding separation set weakly minimal.

7. A high recall causal discovery apparatus based on time series operation and maintenance big data, the apparatus comprising:

a rule definition module for predefining a number of relationship rules to be applied to edges and points in the full graph;

the operation and maintenance data access module is used for acquiring streaming operation and maintenance data to be analyzed;

the complete graph generation module is used for initializing the streaming operation and maintenance data into a complete graph by adopting a graph library and a graph algorithm;

the edge processing module is used for processing the relationship in the complete graph of the streaming operation and maintenance data according to the relationship rules;

and the result output module is used for outputting the processed cause and effect relationship diagram, and the cause and effect relationship diagram is used for describing cause and effect relationships in the streaming operation and maintenance data.

8. The apparatus of claim 7, wherein the edge processing module comprises:

the complete graph processing main algorithm sub-module, the first removal algorithm sub-module, the second removal algorithm sub-module and the delete edge processing algorithm sub-module;

the first removal algorithm submodule is used for determining, through a first removal algorithm and for all edges between ordered pairs of non-adjacent variables in the complete graph, whether each edge is to be deleted or reserved; determining the direction of the edge to be reserved according to the causal relationship between the points connected with the edge to be reserved;

the second removal algorithm submodule is used for determining edges between the ordered variable pairs, which meet the preset separation set, as to-be-deleted or to-be-reserved through a second removal algorithm; determining the direction of the edge to be reserved according to the causal relationship between the points connected with the edge to be reserved;

the full graph processing main algorithm sub-module is used for providing a main function and calling the first removal algorithm and the second removal algorithm;

the deleted edge processing algorithm submodule is used for carrying out secondary processing on the edges to be deleted determined by the first removing algorithm and the second removing algorithm.

9. A high recall cause and effect discovery apparatus based on time series operation and maintenance big data, comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor when executing the computer program implements the high recall cause and effect discovery method based on time series operation and maintenance big data according to any one of claims 1 to 6.

10. A computer-readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the time-series operation and maintenance big data-based high-recall causal discovery method of any of claims 1 to 6.

11. A computer program product comprising a computer program which, when executed by a processor, implements a time series operation and maintenance big data based high recall causal discovery method according to any of claims 1 to 6.