Data-driven alarm root cause searching method
Technical Field
The invention belongs to the technical field of safety monitoring, and particularly relates to a data-driven alarm root cause searching method.
Background
As the requirements on the safety and reliability of industrial systems keep rising, online, real-time monitoring of system operation has become an essential link in modern industrial systems. Because accurate mathematical models and prior knowledge of a system are difficult to obtain, while the system itself generates a large amount of historical operating data, data-driven process monitoring has become a mainstream technology for modern industrial safety monitoring. An alarm raised after a fault occurs helps staff judge the operating condition of the system in time, but the alarm alone cannot determine why it was raised. An alarm root cause searching method determines the reason for an alarm when the alarm occurs, and such methods have therefore attracted wide attention.
The alarm root cause searching method accurately locates the fault through a series of measures and assists workers in isolating and eliminating the fault in time. After many years of development, various alarm root cause searching technologies have been proposed, which fall mainly into three categories:
1) the signed directed graph (SDG) method, which relies on the physical model and prior knowledge of the system;
2) the Granger causality analysis method, which is based on predictive causal relationships;
3) the transition entropy (TE) method, more commonly called transfer entropy.
The first two methods are only suitable for linear systems and obtain the relationships between variables by constructing a model, so they are not suitable for large-scale complex systems. The last method obtains the causal relationships among the variables mainly by calculating the probability density functions of the process variables; it can be applied to complex nonlinear systems and is more practical. Its drawback is that it requires a large quantity of modeling data, but the mass data generated by modern industrial systems compensates for this drawback.
Therefore, the inventor uses the transition entropy method to determine the causal relationships among the variables, which lays a solid foundation for searching for the alarm root cause.
Disclosure of Invention
The invention aims to provide a data-driven alarm root cause searching method, which can obtain cause-and-effect relationships only by process measurement variables without depending on a system physical model and prior knowledge, and can search the root cause at the initial stage of alarm sending so as to isolate and eliminate faults in time, reduce or even avoid accidents and improve the safety and reliability of system operation.
In order to achieve the purpose, the invention adopts the following technical scheme:
a data-driven alarm root cause searching method comprises the following steps:
step one: detecting working data of an industrial system, obtaining observation variables, storing the d observation variables in a data matrix X, checking the stationarity of the data and preprocessing the data;
the working data comprises parameters reflecting the operation condition of the system;
step two: initializing the model parameters, and optimizing the model parameters by utilizing a Cao criterion or a Ragwitz criterion;
step three: calculating a transition entropy matrix P comprising:
A. selecting variables: taking any two variables from the data matrix X, denoted x and y, giving d(d-1)/2 combinations in total;
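B. calculating the transition entropy from x to y:
$$T_{x \to y} = \int f\left(y_{i+h_1}, \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}\right) \log \frac{f\left(y_{i+h_1} \mid \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}\right)}{f\left(y_{i+h_1} \mid \mathbf{y}_i^{(k_1)}\right)} \, dw$$
wherein $f(\cdot)$ is a joint probability density function, $f(\cdot \mid \cdot)$ is a conditional probability density function, and $w$ is the random vector $[y_{i+h_1}, \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}]$; supposing the elements of $w$ are $w_1, w_2, \dots, w_s$, $\int (\cdot)\, dw$ denotes $\int \cdots \int (\cdot)\, dw_1 \cdots dw_s$; $\mathbf{x}_i^{(l_1)} = [x_i, x_{i-\tau_1}, \dots, x_{i-(l_1-1)\tau_1}]$ and $\mathbf{y}_i^{(k_1)} = [y_i, y_{i-\tau_1}, \dots, y_{i-(k_1-1)\tau_1}]$ are the embedded vectors of historical measurements of x and y, respectively, with time interval $\tau_1$; $k_1$ and $l_1$ are the embedding dimensions of y and x, respectively; and $h_1$ is the prediction horizon;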
C. calculating standard transition entropy:
wherein $H$ represents the entropy of the sample and $H(\cdot \mid \cdot)$ is the conditional entropy; in general, $T_{x \to y} \neq T_{y \to x}$;
if $NTE_{x \to y}$ is larger than the specified threshold, the two variables x and y are judged to have a causal relationship;
D. repeating steps B and C until all d(d-1) directed standard transition entropies of the d(d-1)/2 variable combinations have been calculated, storing the standard transition entropies in a matrix P, and then representing the causally related variables in an information flow graph;
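The transition entropy of step B can be illustrated with a minimal sketch. It rests on assumptions that are not part of the method described here: the signals are discretized into equal-frequency bins and the entropies are estimated by counting (a plug-in estimator), which replaces the probability-density integral; the embedding dimensions are fixed at $k_1 = l_1 = 1$; and the function names `discretize`, `entropy_of_labels` and `transfer_entropy` are hypothetical.

```python
import numpy as np

def discretize(series, n_bins=4):
    """Map a 1-D series to integer labels using equal-frequency (quantile) bins."""
    series = np.asarray(series, dtype=float)
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, series)

def entropy_of_labels(*columns):
    """Plug-in joint Shannon entropy (in nats) of one or more label columns."""
    joint = np.stack(columns, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def transfer_entropy(x, y, h=1, n_bins=4):
    """Estimate T_{x->y} with k1 = l1 = 1 and prediction horizon h (assumed simplification)."""
    xd, yd = discretize(x, n_bins), discretize(y, n_bins)
    y_future, y_past, x_past = yd[h:], yd[:-h], xd[:-h]
    # T = H(y_future | y_past) - H(y_future | y_past, x_past)
    h_given_y = entropy_of_labels(y_future, y_past) - entropy_of_labels(y_past)
    h_given_yx = (entropy_of_labels(y_future, y_past, x_past)
                  - entropy_of_labels(y_past, x_past))
    return h_given_y - h_given_yx

# Synthetic example (illustrative data only): x drives y with a one-sample delay.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.empty_like(x)
y[0] = 0.0
y[1:] = x[:-1] + 0.1 * rng.normal(size=x.size - 1)

te_xy = transfer_entropy(x, y)  # expected to be clearly positive
te_yx = transfer_entropy(y, x)  # expected to be close to zero
print(f"T(x->y) = {te_xy:.3f}, T(y->x) = {te_yx:.3f}")
```

The asymmetry between the two printed values is what the method exploits: a large value in one direction and a near-zero value in the other suggests that information flows from x to y.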
step four: calculating standard direct transition entropy based on variable causal relationship in the information flow graph:
taking any three causally related variables x, y and z from the matrix P, wherein z is an intermediate variable, and judging whether x and y have a direct causal relationship, comprising:
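1) calculating the direct transition entropy:
$$DTE_{x \to y} = \int f\left(y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right) \log \frac{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right)}{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}\right)} \, dv$$
wherein $v$ represents the random vector $[y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}]$; the prediction horizon is $h = \max(h_1, h_3)$; the embedded vector $\mathbf{z}_{i+h-h_3}^{(m_2)}$ holds the historical values of z that provide effective information for predicting y at time $i+h$; $\mathbf{x}_{i+h-h_1}^{(l_1)}$ holds the historical values of x; if $h = h_1$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_1)}$; if $h = h_3$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_2)}$; when calculating $T_{x \to z}$, $l_2$ and $m_1$ are the embedding dimensions of x and z, $h_2$ is the prediction horizon and $\tau_2$ is the time interval; when calculating $T_{z \to y}$, $k_2$ and $m_2$ are the embedding dimensions of y and z, $h_3$ is the prediction horizon and $\tau_3$ is the time interval;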
2) calculating the standard direct transition entropy:
if $NDTE_{x \to y}$ is larger than the specified threshold, x and y have a direct causal relationship;
carrying out calculations 1) and 2) on the variables in the information flow graph obtained in step three, and verifying whether the causal relationships between the variables are genuine;
step five: establishing a direct causal relationship graph of the variables according to the verification result of step four.
In step one, the stationarity of the data is checked by the augmented Dickey-Fuller (ADF) test.
The step of preprocessing the data comprises handling data noise by filtering or similar methods.
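As an illustration of the noise-handling part of the preprocessing, the following is a minimal sketch using a moving-average filter. The choice of filter, the window length and the function name `moving_average` are assumptions made for illustration; the text above only requires that some filtering method be applied.

```python
import numpy as np

def moving_average(X, window=5):
    """Smooth each column of X (rows = samples, columns = variables).

    Uses mode="valid", so the output has len(X) - window + 1 rows.
    """
    X = np.asarray(X, dtype=float)
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(col, kernel, mode="valid") for col in X.T]
    )

# Synthetic data matrix with two noisy variables (illustrative data only).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.sin(2 * np.pi * t), t]) + 0.2 * rng.normal(size=(200, 2))
X_smooth = moving_average(X, window=5)
print(X.shape, "->", X_smooth.shape)
```

A longer window removes more noise but also blurs fast changes, so the window length would need to be chosen with the dynamics of the monitored process in mind.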
In step four, 3 causally related variables are arbitrarily taken from the matrix P, where the variable z may be empty, in which case x and y are adjacent variables.
With the above scheme, the invention has the following advantages: the model is established solely from mass data reflecting the operation of the system and does not depend on a physical model or prior knowledge of the system, so there are few limiting conditions and the applicability is strong; in addition, fault location is carried out at the early stage of an alarm, so that faults can be eliminated rapidly, major accidents are reduced, and the safety, reliability and economic benefit of the system are improved.
The invention is further described below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the alarm root cause searching method based on data driving according to the present invention.
FIG. 2 is a graph of the relationship of variables x, y and z.
FIG. 3 is an information flow diagram of variables x, y and z, where z to y have a direct causal relationship.
FIG. 4 is an information flow diagram of variables x, y, and z, where z to y have no causal relationship.
FIG. 5 is an information flow diagram based on standard transition entropy.
FIG. 6 shows the steps of calculating the standard direct transition entropy.
FIG. 7 is an information flow diagram based on standard direct transition entropy.
Detailed Description
Example one
This embodiment is described with reference to FIG. 1. The data-driven alarm root cause searching method of this embodiment is carried out according to the following steps:
step one: detecting the working data of the industrial system, obtaining observation variables, and storing the d observation variables in the data matrix X; in this embodiment, the augmented Dickey-Fuller (ADF) test is used to check the stationarity of the data, and the data are preprocessed, the preprocessing comprising handling data noise by filtering or similar methods; the working data comprise parameters reflecting the operating conditions of the system, such as temperature, pressure and water level;
step two: initializing the model parameters, and optimizing the model parameters by using a Cao criterion; the model is a model for establishing a variable causal relationship, the model parameters are some setting parameters required for establishing the model, and the preprocessed working data are input into the model;
step three: calculating a transition entropy matrix P:
A. taking 3 variables from the data matrix X, denoted x, y and z, and calculating the standard transition entropy (NTE) between any two variables, d(d-1)/2 combinations in total;
the 3 variables are used as an example to explain the calculation of the standard direct transition entropy;
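B. calculate TE values for x to y:
$$T_{x \to y} = \int f\left(y_{i+h_1}, \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}\right) \log \frac{f\left(y_{i+h_1} \mid \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}\right)}{f\left(y_{i+h_1} \mid \mathbf{y}_i^{(k_1)}\right)} \, dw$$
wherein $f(\cdot)$ is a joint probability density function, $f(\cdot \mid \cdot)$ is a conditional probability density function, and $w$ is the random vector $[y_{i+h_1}, \mathbf{y}_i^{(k_1)}, \mathbf{x}_i^{(l_1)}]$; supposing the elements of $w$ are $w_1, w_2, \dots, w_s$, $\int (\cdot)\, dw$ denotes $\int \cdots \int (\cdot)\, dw_1 \cdots dw_s$; $\mathbf{x}_i^{(l_1)} = [x_i, x_{i-\tau_1}, \dots, x_{i-(l_1-1)\tau_1}]$ and $\mathbf{y}_i^{(k_1)} = [y_i, y_{i-\tau_1}, \dots, y_{i-(k_1-1)\tau_1}]$ are the embedded vectors of historical measurements of x and y, respectively, with time interval $\tau_1$; $k_1$ and $l_1$ are the embedding dimensions of y and x, respectively; and $h_1$ is the prediction horizon;
if $T_{x \to y} = 0$, x and y have no causal relationship;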
C. calculate NTE for x to y:
wherein $H(\cdot \mid \cdot)$ is the conditional entropy and $H$ represents entropy;
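D. calculate TE values for x to z:
$$T_{x \to z} = \int f\left(z_{i+h_2}, \mathbf{z}_i^{(m_1)}, \mathbf{x}_i^{(l_2)}\right) \log \frac{f\left(z_{i+h_2} \mid \mathbf{z}_i^{(m_1)}, \mathbf{x}_i^{(l_2)}\right)}{f\left(z_{i+h_2} \mid \mathbf{z}_i^{(m_1)}\right)} \, d\eta$$
wherein $\mathbf{z}_i^{(m_1)} = [z_i, z_{i-\tau_2}, \dots, z_{i-(m_1-1)\tau_2}]$ and $\mathbf{x}_i^{(l_2)} = [x_i, x_{i-\tau_2}, \dots, x_{i-(l_2-1)\tau_2}]$ are embedded vectors with time interval $\tau_2$, $\eta$ is the random vector $[z_{i+h_2}, \mathbf{z}_i^{(m_1)}, \mathbf{x}_i^{(l_2)}]$, and $h_2$ is the prediction horizon;
if $T_{x \to z} = 0$, x and z have no causal relationship;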
E. calculate NTE values for x to z:
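F. calculate the TE values for z to y:
$$T_{z \to y} = \int f\left(y_{i+h_3}, \mathbf{y}_i^{(k_2)}, \mathbf{z}_i^{(m_2)}\right) \log \frac{f\left(y_{i+h_3} \mid \mathbf{y}_i^{(k_2)}, \mathbf{z}_i^{(m_2)}\right)}{f\left(y_{i+h_3} \mid \mathbf{y}_i^{(k_2)}\right)} \, d\zeta$$
wherein $\mathbf{y}_i^{(k_2)} = [y_i, y_{i-\tau_3}, \dots, y_{i-(k_2-1)\tau_3}]$ and $\mathbf{z}_i^{(m_2)} = [z_i, z_{i-\tau_3}, \dots, z_{i-(m_2-1)\tau_3}]$ are embedded vectors with time interval $\tau_3$, $\zeta$ is the random vector $[y_{i+h_3}, \mathbf{y}_i^{(k_2)}, \mathbf{z}_i^{(m_2)}]$, and $h_3$ is the prediction horizon;
if $T_{z \to y} = 0$, z and y have no causal relationship;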
G. calculate the NTE values for z to y:
calculating TE values of any two variables of the data matrix X, and storing the TE values into a matrix P of d multiplied by d;
the diagonal elements of the matrix P are the transition entropies of the variables themselves, whose values are NA;
when the NTE value is larger than a specified threshold, the two variables are judged to have a causal relationship, and an information flow graph based on the NTE is drawn;
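it should be noted that the probability density function is usually estimated by using a Gaussian kernel function
$$K(v) = \frac{1}{\sqrt{2\pi}} e^{-v^2/2}$$
The probability density function of a single variable can be calculated by the following formula
$$\hat{f}(x) = \frac{1}{N\gamma} \sum_{i=1}^{N} K\left(\frac{x - x_i}{\gamma}\right)$$
where N is the number of samples and $\gamma$ is the bandwidth of the probability density function estimate, $\gamma = c\,\sigma N^{-1/5}$, with $\sigma$ the sample standard deviation and $c = (4/3)^{1/5} \approx 1.06$;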
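for the d-dimensional multivariate case, the probability density function estimate can be calculated using the following formula
$$\hat{f}(x_1, \dots, x_d) = \frac{1}{N \gamma_1 \cdots \gamma_d} \sum_{i=1}^{N} \prod_{s=1}^{d} K\left(\frac{x_s - x_{is}}{\gamma_s}\right)$$
wherein $\gamma_s = c\,\sigma_s N^{-1/(4+d)}$, with $\sigma_s$ the sample standard deviation of the s-th variable, $s = 1, \dots, d$;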
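The kernel estimates described above can be sketched as follows. This is a minimal illustration under stated assumptions: the univariate bandwidth follows the rule of thumb $\gamma = c\,\sigma N^{-1/5}$ with $c = (4/3)^{1/5}$, the multivariate bandwidths are assumed to be $\gamma_s = c\,\sigma_s N^{-1/(4+d)}$, a product of univariate Gaussian kernels is used, and the function names are hypothetical.

```python
import numpy as np

C = (4.0 / 3.0) ** 0.2  # approximately 1.06

def gaussian_kernel(v):
    """Standard Gaussian kernel K(v)."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def kde_univariate(samples, points):
    """Kernel density estimate of a single variable, evaluated at `points`."""
    samples = np.asarray(samples, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    n = samples.size
    gamma = C * samples.std(ddof=1) * n ** (-1.0 / 5.0)   # rule-of-thumb bandwidth
    v = (points[:, None] - samples[None, :]) / gamma      # shape (M, N)
    return gaussian_kernel(v).sum(axis=1) / (n * gamma)

def kde_multivariate(samples, points):
    """Product-kernel estimate; `samples` is (N, d) and `points` is (M, d)."""
    samples = np.asarray(samples, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n, d = samples.shape
    # Assumed bandwidth rule for d dimensions.
    gammas = C * samples.std(axis=0, ddof=1) * n ** (-1.0 / (4.0 + d))
    v = (points[:, None, :] - samples[None, :, :]) / gammas   # shape (M, N, d)
    return gaussian_kernel(v).prod(axis=2).sum(axis=1) / (n * gammas.prod())

# Synthetic check (illustrative data only): the estimate should integrate to about 1.
rng = np.random.default_rng(2)
data = rng.normal(size=1000)
grid = np.linspace(-6.0, 6.0, 601)
density = kde_univariate(data, grid)
area = float(density.sum() * (grid[1] - grid[0]))
print(f"area under the estimated density = {area:.4f}")
```

Evaluating the estimate on every sample pair costs memory proportional to the number of evaluation points times the number of samples, which is one reason the method needs a large but still manageable amount of modeling data.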
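step four: calculating the standard direct transition entropy (NDTE):
A. calculating the direct transition entropy (DTE):
as shown in FIG. 2, x causes changes in z and y; to determine whether x and y have a direct causal relationship, the DTE is defined as:
$$DTE_{x \to y} = \int f\left(y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right) \log \frac{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right)}{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}\right)} \, dv$$
wherein $v$ represents the random vector $[y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}]$; the prediction horizon is $h = \max(h_1, h_3)$; the embedded vector $\mathbf{z}_{i+h-h_3}^{(m_2)}$ holds the historical values of z that provide effective information for predicting y at time $i+h$; $\mathbf{x}_{i+h-h_1}^{(l_1)}$ holds the historical values of x; if $h = h_1$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_1)}$; if $h = h_3$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_2)}$;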
B. Calculating NDTE:
if $DTE_{x \to y} = 0$, x and y have no direct causal relationship;
if $NDTE_{x \to y}$ is greater than the specified threshold, x and y have a direct causal relationship;
then judging the truth of the z-y causal relationship;
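the DTE value for z to y is calculated as:
$$DTE_{z \to y} = \int f\left(y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right) \log \frac{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right)}{f\left(y_{i+h} \mid \mathbf{y}_i^{(k)}, \mathbf{x}_{i+h-h_1}^{(l_1)}\right)} \, dv$$
where $v$ represents the random vector $[y_{i+h}, \mathbf{y}_i^{(k)}, \mathbf{z}_{i+h-h_3}^{(m_2)}, \mathbf{x}_{i+h-h_1}^{(l_1)}]$; the prediction horizon is $h = \max(h_1, h_3)$; the embedded vector $\mathbf{z}_{i+h-h_3}^{(m_2)}$ holds the historical values of z that provide effective information for predicting y at time $i+h$; $\mathbf{x}_{i+h-h_1}^{(l_1)}$ holds the historical values of x; if $h = h_1$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_1)}$; if $h = h_3$, then $\mathbf{y}_i^{(k)} = \mathbf{y}_i^{(k_2)}$;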
The NDTE value of z to y is calculated as
If $NDTE_{z \to y}$ is above the specified threshold, z to y have a causal relationship, as shown in FIG. 3; otherwise, z to y have no causal relationship, as shown in FIG. 4;
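The role of the intermediate variable in step four can be illustrated with a minimal sketch on a synthetic chain x → z → y. It rests on assumptions that are not part of the method: the signals are discretized into equal-frequency bins and the entropies are estimated by counting, all embedding dimensions are 1, the horizons are $h_1 = 2$ and $h_3 = 1$, and the function names are hypothetical. Because binning discards part of the information in z, the direct value does not fall exactly to zero; it only becomes much smaller than the ordinary transition entropy.

```python
import numpy as np

def discretize(series, n_bins=4):
    """Map a 1-D series to integer labels using equal-frequency bins."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, series)

def joint_entropy(columns):
    """Plug-in joint entropy (nats) of a list of label columns."""
    _, counts = np.unique(np.stack(columns, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def conditional_mi(target, source, condition):
    """I(target; source | condition) = H(t,c) + H(s,c) - H(t,s,c) - H(c)."""
    return (joint_entropy([target] + condition)
            + joint_entropy([source] + condition)
            - joint_entropy([target, source] + condition)
            - joint_entropy(condition))

# Synthetic chain (illustrative data only): x -> z -> y, one-sample delay per link.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = np.zeros(n)
y = np.zeros(n)
z[1:] = x[:-1] + 0.1 * rng.normal(size=n - 1)
y[1:] = z[:-1] + 0.1 * rng.normal(size=n - 1)

xd, zd, yd = discretize(x), discretize(z), discretize(y)
h1, h3 = 2, 1            # assumed horizons for x -> y and z -> y
h = max(h1, h3)
y_future = yd[h:]        # y at time i + h
y_past = yd[:-h]         # y at time i
x_past = xd[:-h]         # x at time i + h - h1 = i
z_past = zd[1:-1]        # z at time i + h - h3 = i + 1

te_xy = conditional_mi(y_future, x_past, [y_past])           # ignores z
dte_xy = conditional_mi(y_future, x_past, [y_past, z_past])  # conditions on z
print(f"T(x->y) = {te_xy:.3f}, DTE(x->y) = {dte_xy:.3f}")
```

In this chain the ordinary transition entropy from x to y is large even though x acts on y only through z; conditioning on z removes most of it, which is the behaviour step four relies on to reject indirect paths.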
step five: carrying out the NDTE calculation on the variables whose causal relationship has been confirmed, verifying the direct causal relationship between each pair of variables, and screening out the variables with a direct causal relationship to establish the direct causal relationship graph, namely the final information flow graph.
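The screening in steps three to five can be sketched as two thresholding passes. All numbers below are hypothetical and are not taken from the tables of this document; the convention that `P[i, j]` holds the NTE from variable i to variable j, the shared threshold, and the function names are assumptions made for illustration.

```python
import numpy as np

def edges_above_threshold(P, names, threshold):
    """Return directed edges (source, target, value) with P[i, j] > threshold.

    Assumes P[i, j] is the NTE from variable i to variable j; the diagonal
    (a variable with itself) may be NaN and is skipped.
    """
    P = np.asarray(P, dtype=float)
    edges = []
    for i, src in enumerate(names):
        for j, dst in enumerate(names):
            if i != j and not np.isnan(P[i, j]) and P[i, j] > threshold:
                edges.append((src, dst, float(P[i, j])))
    return edges

def direct_edges(nte_edges, ndte, threshold):
    """Keep an edge unless its NDTE was computed and is not above the threshold."""
    kept = []
    for src, dst, _ in nte_edges:
        value = ndte.get((src, dst))  # None: no intermediate variable to test
        if value is None or value > threshold:
            kept.append((src, dst))
    return kept

names = ["x", "y", "z"]
# Hypothetical NTE matrix: rows are sources, columns are targets.
P = np.array([
    [np.nan, 0.20, 0.30],
    [0.01, np.nan, 0.01],
    [0.01, 0.25, np.nan],
])
# Hypothetical NDTE values for edges that have an intermediate variable.
ndte = {("x", "y"): 0.01, ("z", "y"): 0.40}

nte_edges = edges_above_threshold(P, names, threshold=0.02)
final_edges = direct_edges(nte_edges, ndte, threshold=0.02)
print("NTE-based edges:", [(s, d) for s, d, _ in nte_edges])
print("direct edges:   ", final_edges)
```

In this hypothetical case the edge from x to y passes the NTE threshold but is removed by the NDTE check, leaving the chain x → z → y as the direct causal relationship graph.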
Example two: this embodiment differs from Example one in that, in step two, the parameters are optimized using the Ragwitz criterion.
Specific example: the data-driven alarm root cause searching method of the above embodiment is used to simulate the causal relationships among variables in the flue gas desulfurization (FGD) process of a petroleum company, with the following specific steps:
step one: taking the FGD process as an example, the liquid levels of the reaction tank, water tank 1 and water tank 2 and the flow rates of pumps 2 and 3 are selected as variables, denoted $y_1$, $y_2$, $y_3$, $y_4$ and $y_5$, respectively; 3544 groups of data are collected, the data are stationary, and the data are preprocessed;
step two: initializing the model parameters, and optimizing the parameters by using the Cao criterion;
step three: calculating the TE and NTE values among the variables, as shown in Table 1;
TABLE 1
With 0.02 selected as the threshold, the information flow paths based on the standard transition entropy are shown in FIG. 5;
step four: calculating the DTE and NDTE values for part of the FGD process, as shown in Table 2;
TABLE 2
If the NDTE value is too small, the variables are judged to have no direct causal relationship; the steps of obtaining the information flow graph from the direct transition entropy calculation results are shown in FIG. 6;
step five: the resulting information flow diagram of the FGD process is shown in FIG. 7.
The above description shows and describes the preferred embodiments of the present invention. It is to be understood that the invention is not limited to the forms disclosed herein, and that the description is not to be construed as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be changed within the scope of the inventive concept expressed herein in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the protection scope of the appended claims.