CN110942258A

CN110942258A - Performance-driven industrial process anomaly monitoring method

Info

Publication number: CN110942258A
Application number: CN201911254450.2A
Authority: CN
Inventors: 周东华; 陈茂银; 吴德浩; 纪洪泉; 高明; 朱继峰; 闫飞; 郑水明; 郭恩陶
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2020-03-31
Anticipated expiration: 2039-12-10
Also published as: CN110942258B

Abstract

The invention discloses a performance-driven industrial process anomaly monitoring method in the technical field of industrial process monitoring. The method defines a detection performance index on the basis of fault detectability analysis and selects components according to that index; the component selection problem is formulated as a stochastic optimization problem, whose optimal solution is given in a statistical sense. The method comprises an off-line training stage and an on-line monitoring stage. In the off-line training stage, historical data collected under normal operating conditions are used to calculate the detection statistic of each sample, and the monitoring control limit is determined at a given significance level; in the on-line monitoring stage, the detection statistic is calculated for each new sample, and the process is considered abnormal if the statistic exceeds the control limit. Compared with principal component analysis based on the cumulative percent variance (CPV) criterion, the method selects principal components according to detection performance, relaxes the sufficient condition for detectability, and can generally achieve better monitoring performance.

Description

Performance-driven industrial process anomaly monitoring method

Technical Field

The invention belongs to the technical field of industrial process monitoring, and particularly relates to a performance-driven industrial process anomaly monitoring method.

Background

Principal Component Analysis (PCA) is a classic data dimension reduction technique, and is widely applied in many fields such as image processing and signal processing. In recent years, PCA and various improvements thereof have become effective technical means in the field of industrial process monitoring.

One of the key issues in building an industrial process data model with PCA is selecting an appropriate number of principal components, which directly affects the monitoring performance of the resulting model. Too many principal components may include measurement noise, while too few may lose critical information and fail to reflect some changes in the process. For the principal component selection problem, researchers have proposed many criteria and methods, such as the eigenvalue limit, cross validation, the cumulative percent variance (CPV) criterion, and the variance of reconstruction error (VRE) criterion. However, none of these methods considers monitoring performance. The eigenvalue limit and the CPV criterion treat the principal components corresponding to small eigenvalues as measurement noise, and both select principal components so as to retain the maximum amount of information; the VRE criterion considers reconstruction performance and seeks the principal components with the smallest reconstruction error. Principal components selected in these ways may therefore be insensitive to anomalies.
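For comparison, the CPV criterion mentioned above can be sketched as follows. This is a minimal illustration and not part of the invention: the function name `cpv_num_components` and the 85% threshold are assumptions chosen for the example.

```python
def cpv_num_components(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative
    percent variance (CPV) reaches `threshold`."""
    lam = sorted(eigenvalues, reverse=True)   # largest eigenvalue first
    total = sum(lam)
    running = 0.0
    for count, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(lam)


# Hypothetical eigenvalues of a sample covariance matrix.
eigs = [4.0, 3.0, 2.0, 0.9, 0.1]
print(cpv_num_components(eigs, 0.85))
```

The count depends only on how much variance is retained, which is why components chosen this way need not be the ones most sensitive to a given anomaly.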

Currently, only a few methods consider monitoring performance when selecting principal components, and they share several drawbacks. First, they depend heavily on the fault direction, so they are limited to detecting sensor faults; for complex process faults, the fault direction must be estimated from abnormal data, but in practice fault data are usually difficult to obtain and the fault direction is difficult to estimate accurately. Second, for unknown faults without prior information, some researchers have proposed parallel monitoring schemes that monitor multiple models simultaneously, but these incur high computational complexity and hinder real-time monitoring of the process. Third, these methods determine the number of principal components off-line and keep it fixed during on-line monitoring, so their monitoring performance for unknown faults is unsatisfactory.

Disclosure of Invention

The invention aims to provide a performance-driven industrial process anomaly monitoring method that dynamically selects principal components according to a detection performance index, requires no abnormal data, and does not need to estimate the direction or magnitude of a fault.

The invention specifically adopts the following technical scheme:

a performance-driven industrial process anomaly monitoring method comprises an offline training stage and an online monitoring stage;

the off-line training phase comprises the following steps:

1.1. Collect historical data under normal operating conditions to obtain a training data set $X=\{x^1,x^2,\ldots,x^N\}$ with $x^k\in\mathbb{R}^m$,

where $N$ is the number of samples and $m$ is the number of measurement variables;

1.2. Calculate the sample mean $\mu_x$ by equation (1) and the sample covariance matrix $\Sigma_x$ by equation (2);

1.3. Perform eigenvalue decomposition of the sample covariance matrix $\Sigma_x$ to obtain equation (3):

$\Sigma_x=Q\Lambda Q^T$ (3)

where $Q$ is an orthogonal matrix and $\Lambda=\mathrm{diag}(\lambda_{[1]},\lambda_{[2]},\ldots,\lambda_{[m]})$ is a diagonal matrix with $\lambda_{[1]}\ge\lambda_{[2]}\ge\cdots\ge\lambda_{[m]}$;

1.4. For the $k$-th sample $x^k$ in the training data set $X$, $1\le k\le N$, calculate the component vector $y^k$ by equation (4):

$y^k=Q^T(x^k-\mu_x)$ (4);

1.5. According to the definition of the detection performance index $P$, calculate the detection performance index $P_{y^k}$ corresponding to the component vector $y^k$;

1.6. Based on a selection matrix $W\in\{0,1\}^{m\times d}$, obtain the selected component vector $y_s^k=W^Ty^k$ and calculate the corresponding detection performance index $P_{y_s^k}$;

1.7. Traverse the dimension $d$ from 1 to $m$ in turn to obtain the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s^k}$;

1.8. According to the selected optimal component subset, calculate the detection statistic $D_k$ of the $k$-th sample as shown in equation (5), in which $\chi^2_\alpha(d^*)$ denotes the $\alpha$ quantile of the chi-square distribution with $d^*$ degrees of freedom, $d^*$ being the number of columns of $W^*$;

1.9. Given the significance level $\alpha$, determine the detection control limit $\eta_\alpha$ empirically;
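Steps 1.2 and 1.9 can be sketched as follows. This is an illustration under stated assumptions rather than the exact computation of the invention: equations (1) and (2) are not reproduced in this text, so the covariance below uses the common $1/(N-1)$ normalisation, and the "empirical" control limit is taken to be the empirical $(1-\alpha)$ quantile of the training statistics. All function names are hypothetical.

```python
import math


def sample_mean(X):
    """Column-wise mean of N samples with m variables each."""
    n, m = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(m)]


def sample_covariance(X, mu):
    """Sample covariance matrix (assumption: 1/(N-1) normalisation)."""
    n, m = len(X), len(mu)
    cov = [[0.0] * m for _ in range(m)]
    for row in X:
        dev = [row[j] - mu[j] for j in range(m)]
        for a in range(m):
            for b in range(m):
                cov[a][b] += dev[a] * dev[b]
    return [[c / (n - 1) for c in r] for r in cov]


def empirical_control_limit(stats, alpha):
    """Assumed rule: the training statistic that roughly a fraction
    alpha of the training statistics exceed."""
    ordered = sorted(stats)
    # Small tolerance guards against floating-point round-up in ceil.
    idx = math.ceil((1.0 - alpha) * len(ordered) - 1e-9) - 1
    return ordered[min(max(idx, 0), len(ordered) - 1)]


X = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mu = sample_mean(X)
print(mu, sample_covariance(X, mu))
print(empirical_control_limit([1, 2, 3, 4, 5, 6, 7, 8], 0.25))
```

Because the limit is read off the training statistics themselves, no distributional assumption about the statistic is needed at this step.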

The on-line monitoring phase comprises the following steps:

2.1. For a real-time sample $x$, calculate its component vector $y$ by equation (6):

$y=Q^T(x-\mu_x)$ (6);

2.2. According to the definition of the detection performance index $P$, calculate the detection performance index $P_y$ corresponding to the component vector $y$;

2.3. Based on a selection matrix $W\in\{0,1\}^{m\times d}$, obtain the selected component vector $y_s=W^Ty$ and calculate the corresponding detection performance index $P_{y_s}$;

2.4. Traverse the dimension $d$ from 1 to $m$ in turn to obtain the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s}$;

2.5. According to the selected optimal component subset, calculate the detection statistic $D$ of the real-time sample $x$ as shown in equation (7), in which $\chi^2_\alpha(d^*)$ denotes the $\alpha$ quantile of the chi-square distribution with $d^*$ degrees of freedom, $d^*$ being the number of columns of $W^*$;

2.6. Compare the detection statistic $D$ with the control limit $\eta_\alpha$: if $D>\eta_\alpha$, the process is considered abnormal; otherwise, the process is in a normal state.
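The on-line decision of steps 2.1, 2.5 and 2.6 can be sketched as follows. Equation (7) is not reproduced in this text, so the statistic used here (the Mahalanobis distance of the selected components divided by the chi-square quantile) is an assumed stand-in, not the statistic of the invention. The index set `selected` is taken as given; the names `component_vector`, `monitor_sample` and `CHI2_Q99` are hypothetical, and the table holds standard 0.99 chi-square quantiles, which corresponds to $\alpha=0.01$ under the assumption that the upper-tail point is meant.

```python
# Standard table values of the 0.99 chi-square quantile, by degrees of freedom.
CHI2_Q99 = {1: 6.635, 2: 9.210, 3: 11.345}


def component_vector(x, mu, Q):
    """y = Q^T (x - mu); Q is a list of rows whose columns are eigenvectors."""
    m = len(x)
    centred = [x[i] - mu[i] for i in range(m)]
    return [sum(Q[i][j] * centred[i] for i in range(m)) for j in range(m)]


def monitor_sample(x, mu, Q, lam, selected, eta):
    """Return (D, is_abnormal) for one sample.

    lam      -- eigenvalues, in the same order as the columns of Q
    selected -- indices of the components kept by the selection step
    eta      -- control limit obtained in off-line training
    """
    y = component_vector(x, mu, Q)
    # Mahalanobis distance restricted to the selected components.
    dist = sum(y[i] ** 2 / lam[i] for i in selected)
    # Assumed normalisation so that subsets of different size are comparable.
    D = dist / CHI2_Q99[len(selected)]
    return D, D > eta


identity = [[1.0, 0.0], [0.0, 1.0]]
print(monitor_sample([2.0, 3.0], [0.0, 0.0], identity, [4.0, 1.0], [1], 1.0))
```

In this toy case the deviation sits on the low-variance component, so selecting that component alone raises an alarm while selecting only the high-variance component would not.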

Preferably, steps 1.5 and 2.2 are specifically as follows:

the mean $\mu_y$ and covariance matrix $\Sigma_y$ of the component vector are as shown in equations (8) and (9), respectively:

$\mu_y=E[Q^T(x-\mu_x)]=0$ (8)

$\Sigma_y=Q^T\Sigma_xQ=\Lambda$ (9)

based on the additive fault model shown in equation (10):

$x=x^*+\Xi_if$ (10)

where $x$ is the faulty sample, $x^*$ is the corresponding normal sample, $\Xi_i$ is the direction matrix of the fault, and $\|f\|$ represents the fault magnitude;

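the Mahalanobis distance corresponding to the component vector $y$ is as shown in equation (11):

$y^T\Lambda^{-1}y=\|\Lambda^{-1/2}(y^*+Q^T\Xi_if)\|^2$ (11)

where $y^*=Q^T(x^*-\mu_x)$;

from the triangle inequality for vectors, equation (12) is obtained:

$\|\Lambda^{-1/2}(y^*+Q^T\Xi_if)\|\ge\|\Lambda^{-1/2}Q^T\Xi_if\|-\|\Lambda^{-1/2}y^*\|$ (12)

in view of $\|\Lambda^{-1/2}y^*\|\le\chi_\alpha(m)$ for the normal sample, to ensure that the fault is detected it suffices that $\|\Lambda^{-1/2}Q^T\Xi_if\|-\chi_\alpha(m)>\chi_\alpha(m)$;

then, a sufficient condition for the fault to be detectable is as shown in equation (13):

$\|\Lambda^{-1/2}Q^T\Xi_if\|>2\chi_\alpha(m)$ (13);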
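The sufficient condition (13) can be checked numerically as sketched below. This is an illustration only: the fault vector $\Xi_if$ is supplied directly as a hypothetical input (in practice it is unknown), and $\chi_\alpha(m)$ is assumed to be the square root of the corresponding chi-square quantile, which is passed in as `chi2_quantile`. The function name is hypothetical.

```python
import math


def detectability_margin(fault, Q, lam, chi2_quantile):
    """Return (lhs, rhs) of condition (13).

    lhs = || Lambda^{-1/2} Q^T (Xi_i f) ||
    rhs = 2 * chi_alpha(m), with chi_alpha(m) taken as sqrt(chi2_quantile)
    """
    m = len(fault)
    # Project the fault vector onto the eigenvector basis: Q^T (Xi_i f).
    proj = [sum(Q[i][j] * fault[i] for i in range(m)) for j in range(m)]
    lhs = math.sqrt(sum(proj[j] ** 2 / lam[j] for j in range(m)))
    rhs = 2.0 * math.sqrt(chi2_quantile)
    return lhs, rhs


identity = [[1.0, 0.0], [0.0, 1.0]]
lam = [1.0, 0.01]
# 9.210 is the standard 0.99 chi-square quantile for 2 degrees of freedom.
print(detectability_margin([0.0, 1.0], identity, lam, 9.210))  # along the low-variance direction
print(detectability_margin([1.0, 0.0], identity, lam, 9.210))  # along the high-variance direction
```

A fault of the same size satisfies the condition when it lies along a low-variance component and fails it along a high-variance one, which is the motivation for selecting components by detection performance.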

the detection performance index corresponding to the component vector y is defined as formula (14):

Preferably, steps 1.6 and 2.3 are specifically as follows:

the selection matrix $W\in\{0,1\}^{m\times d}$ is defined as shown in equation (15), where $d\le m$ and each column of $W$ is a column vector in which one element is 1 and the remaining elements are 0;

for the component vector $y$, the selected component subset $y_s$ is given by equation (16):

$y_s=W^Ty$ (16)

which has zero mean and the covariance shown in equation (17):

$\Sigma_{y_s}=W^T\Lambda W$ (17)

the Mahalanobis distance corresponding to the component subset $y_s$ is as shown in equation (18):

$y_s^T(W^T\Lambda W)^{-1}y_s$ (18)

similarly, a sufficient condition for the fault to be detectable is obtained, namely equation (19):

$\|(W^T\Lambda W)^{-1/2}W^TQ^T\Xi_if\|>2\chi_\alpha(d)$ (19);
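The role of the selection matrix can be illustrated with a short sketch. It assumes that the columns of $W$ pick distinct components; the helper names `selection_matrix`, `select` and `projected_covariance` are hypothetical.

```python
def selection_matrix(indices, m):
    """m x d matrix of 0/1 entries; column j has its single 1 in row indices[j]."""
    W = [[0] * len(indices) for _ in range(m)]
    for col, row in enumerate(indices):
        W[row][col] = 1
    return W


def select(W, v):
    """Return W^T v, i.e. the entries of v picked out by W."""
    return [sum(W[i][j] * v[i] for i in range(len(v))) for j in range(len(W[0]))]


def projected_covariance(W, lam):
    """Return W^T Lambda W for the diagonal matrix Lambda = diag(lam)."""
    m, d = len(W), len(W[0])
    return [[sum(W[i][a] * lam[i] * W[i][b] for i in range(m)) for b in range(d)]
            for a in range(d)]


W = selection_matrix([0, 2], 3)              # keep components 1 and 3 of 3
print(select(W, [5.0, 6.0, 7.0]))            # selected component vector
print(projected_covariance(W, [4.0, 2.0, 1.0]))
```

Because $W^T\Lambda W$ is again diagonal, containing only the selected eigenvalues, its inverse is obtained by inverting those eigenvalues one by one.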

thus, the detection performance index $P_{y_s}$ corresponding to the component subset $y_s$ is as shown in equation (20):

Preferably, steps 1.7 and 2.4 are specifically as follows:

with the selection matrix defined as in equation (15), the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s}$ is given by equation (21);

given $d$ ($1\le d\le m$), equation (21) is converted to equation (22), expressed through the objective $F(W)$;

since the value of $\Xi_if$ is unknown, equation (22) is in fact a stochastic optimization problem;

according to the fault model shown in equation (10), $\Xi_if=x-x^*$; hence $\Xi_if$ can be regarded as a random variable following a Gaussian distribution, and it is expressed in the form shown in equation (23):

$\Xi_if=x-\mu_x+e$ (23)

where $e$ represents the measurement noise;

substituting equation (23) into $F(W)$ gives equation (24):

$F(W)=(y+g)^TW(W^T\Lambda W)^{-1}W^T(y+g)$ (24)

where $g=Q^Te$ is a Gaussian random variable;

considering that $(W^T\Lambda W)^{-1}=W^T\Lambda^{-1}W$, equation (25) is obtained in a statistical sense;

since $WW^T$ is a diagonal matrix whose diagonal elements are all 0 or 1, the two terms on the right-hand side of equation (25) can be written in the forms shown in equations (26) and (27), from which equation (28) is further obtained;

given $d$, in order to maximize the objective, the $d$ largest values among the $\sigma_i$ should be summed; assuming $\sigma_{[1]}\ge\sigma_{[2]}\ge\cdots\ge\sigma_{[m]}$, the optimal selection matrix for that $d$ is given by equation (29);

traversing the dimension $d$ from 1 to $m$ in turn yields the globally optimal selection matrix $W^*$, as shown in equation (30);
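The two-level selection described above (the $d$ largest $\sigma_i$ for each $d$, then a traversal over $d$) can be sketched as follows. Equations (20) to (30) are not reproduced in this text, so the sketch rests on assumptions: the per-component scores $\sigma_i$ are taken as precomputed inputs, and the index compared across dimensions is assumed to be the sum of the selected scores divided by the chi-square quantile for $d$ degrees of freedom. The names `best_subset_for_d`, `best_subset` and `CHI2_Q99` are hypothetical; the table holds standard 0.99 chi-square quantiles.

```python
# Standard table values of the 0.99 chi-square quantile, by degrees of freedom.
CHI2_Q99 = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812}


def best_subset_for_d(scores, d):
    """Indices of the d largest scores (the per-d optimum)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:d])


def best_subset(scores, quantiles=CHI2_Q99):
    """Traverse d = 1..m and keep the subset with the largest assumed index."""
    best_index, best_idx = None, None
    for d in range(1, len(scores) + 1):
        idx = best_subset_for_d(scores, d)
        # Assumed index: summed scores normalised by the chi-square quantile.
        index = sum(scores[i] for i in idx) / quantiles[d]
        if best_index is None or index > best_index:
            best_index, best_idx = index, idx
    return best_idx, best_index


# Hypothetical scores for a four-variable process.
print(best_subset([0.5, 20.0, 0.3, 12.0]))
```

Only a sort and $m$ candidate subsets are evaluated per sample, rather than all $2^m-1$ subsets, which is consistent with the low on-line complexity claimed for the method.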

the invention has the following beneficial effects:

The method dynamically selects principal components according to the detection performance index, and the selected principal components are sensitive to anomalies, so better monitoring performance can be obtained; the method requires no abnormal data and no estimate of the fault direction or magnitude, and can achieve good monitoring performance for unknown anomalies without prior information; and its on-line computational complexity is low, which facilitates real-time monitoring of the industrial process.

Drawings

FIG. 1 is a flow chart of the off-line training and on-line monitoring of the present invention;

FIG. 2 is a schematic diagram of the monitoring results based on the Mahalanobis distance;

FIG. 3 is a schematic diagram of the monitoring results of the PCA-based Q statistic;

FIG. 4 is a schematic diagram of the monitoring results of the PCA-based T² statistic;

FIG. 5 is a schematic diagram of the monitoring results of the method of the present invention;

FIG. 6 is a schematic diagram of the number of principal components selected by the method of the present invention.

Detailed Description

With reference to fig. 1, a performance-driven industrial process anomaly monitoring method includes an offline training phase and an online monitoring phase;

The off-line training phase comprises steps 1.1 to 1.9 and the on-line monitoring phase comprises steps 2.1 to 2.6, as set out in the Disclosure of Invention above. Steps 1.5 and 2.2, steps 1.6 and 2.3, and steps 1.7 and 2.4 are likewise carried out as described there, through equations (8) to (30).

The monitoring method is validated below on a continuous stirred tank heater (CSTH) process, a benchmark for which a standard model library is available and which is widely studied in the field of industrial process monitoring.

In this process, hot and cold water are mixed and heated by steam; temperature and level are the controlled variables, held at nominal values. The process model has three PI controllers, which control the temperature, the liquid level, and the cold water flow respectively. Without loss of generality, the process is operated under the first operating condition in this experiment. Each measurement sample consists of the readings of the three sensors (liquid level, cold water flow, and temperature) and the outputs of the three controllers, i.e. $x=[L,F,T,C_L,C_F,C_T]^T$.

In the off-line training phase, 2000 samples under normal conditions are collected for off-line modeling, the significance level is set to α = 0.01, and the control limit is determined empirically. In the on-line monitoring phase, 1500 samples are generated: the first 500 samples are normal, and an abnormal condition is introduced at the 501st sample and persists until the end. The anomaly is a constant bias fault of magnitude +0.02 applied to the liquid level sensor.

To show the advantages of the method more clearly, the Mahalanobis distance and PCA methods are used for comparison. FIG. 2 shows the monitoring results based on the Mahalanobis distance, FIGS. 3 and 4 show the monitoring results of the PCA-based Q statistic and T² statistic respectively, and FIG. 5 shows the monitoring results of the method of the invention. Comparing these four figures leads to the following conclusions: PCA performs poorly in this case, and with either the Q statistic or the T² statistic the anomaly can hardly be detected, because the fault magnitude is small and easily masked by noise; the Mahalanobis distance detects the anomaly partially, but its missed detection rate still exceeds 20%; the method of the invention gives the best monitoring results and detects the anomaly with a detection rate above 90%. In addition, FIG. 6 shows the number of principal components selected by the method. As FIG. 6 shows, the number of selected principal components varies widely before the anomaly occurs, because the process is operating normally and measurement noise is present; after the anomaly occurs, the method quickly locates the subset of principal components most sensitive to the anomaly, and therefore achieves better monitoring performance.
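The detection rate and missed detection rate quoted above can be computed from a sequence of detection statistics as sketched below. The function name and the toy numbers are hypothetical and are not the experimental data; for the CSTH experiment, `fault_start` would be 500, since the anomaly begins at the 501st sample.

```python
def alarm_rates(stats, limit, fault_start):
    """Return (false_alarm_rate, detection_rate, missed_detection_rate).

    stats       -- detection statistic of each on-line sample, in time order
    limit       -- control limit from off-line training
    fault_start -- zero-based index of the first faulty sample
    """
    normal = stats[:fault_start]
    faulty = stats[fault_start:]
    false_alarm_rate = sum(s > limit for s in normal) / len(normal)
    detection_rate = sum(s > limit for s in faulty) / len(faulty)
    return false_alarm_rate, detection_rate, 1.0 - detection_rate


# Toy sequence: four normal samples followed by four faulty ones.
toy = [0.2, 0.5, 1.2, 0.4, 1.5, 0.9, 2.0, 3.0]
print(alarm_rates(toy, limit=1.0, fault_start=4))
```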

It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims

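1. A performance-driven industrial process anomaly monitoring method, characterized by comprising an off-line training stage and an on-line monitoring stage;

the off-line training stage comprises the following steps:

1.1. collecting historical data under normal operating conditions to obtain a training data set $X=\{x^1,x^2,\ldots,x^N\}$ with $x^k\in\mathbb{R}^m$,

where $N$ is the number of samples and $m$ is the number of measurement variables;

1.2. calculating the sample mean $\mu_x$ by equation (1) and the sample covariance matrix $\Sigma_x$ by equation (2);

1.3. performing eigenvalue decomposition of the sample covariance matrix $\Sigma_x$ to obtain equation (3):

$\Sigma_x=Q\Lambda Q^T$ (3)

where $Q$ is an orthogonal matrix and $\Lambda=\mathrm{diag}(\lambda_{[1]},\lambda_{[2]},\ldots,\lambda_{[m]})$ is a diagonal matrix with $\lambda_{[1]}\ge\lambda_{[2]}\ge\cdots\ge\lambda_{[m]}$;

1.4. for the $k$-th sample $x^k$ in the training data set $X$, $1\le k\le N$, calculating the component vector $y^k$ by equation (4):

$y^k=Q^T(x^k-\mu_x)$ (4);

1.5. according to the definition of the detection performance index $P$, calculating the detection performance index $P_{y^k}$ corresponding to the component vector $y^k$;

1.6. based on a selection matrix $W\in\{0,1\}^{m\times d}$, obtaining the selected component vector $y_s^k=W^Ty^k$ and calculating the corresponding detection performance index $P_{y_s^k}$;

1.7. traversing the dimension $d$ from 1 to $m$ in turn to obtain the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s^k}$;

1.8. according to the selected optimal component subset, calculating the detection statistic $D_k$ of the $k$-th sample as shown in equation (5), in which $\chi^2_\alpha(d^*)$ denotes the $\alpha$ quantile of the chi-square distribution with $d^*$ degrees of freedom, $d^*$ being the number of columns of $W^*$;

1.9. given the significance level $\alpha$, determining the detection control limit $\eta_\alpha$ empirically;

the on-line monitoring stage comprises the following steps:

2.1. for a real-time sample $x$, calculating its component vector $y$ by equation (6):

$y=Q^T(x-\mu_x)$ (6);

2.2. according to the definition of the detection performance index $P$, calculating the detection performance index $P_y$ corresponding to the component vector $y$;

2.3. based on a selection matrix $W\in\{0,1\}^{m\times d}$, obtaining the selected component vector $y_s=W^Ty$ and calculating the corresponding detection performance index $P_{y_s}$;

2.4. traversing the dimension $d$ from 1 to $m$ in turn to obtain the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s}$;

2.5. according to the selected optimal component subset, calculating the detection statistic $D$ of the real-time sample $x$ as shown in equation (7), in which $\chi^2_\alpha(d^*)$ denotes the $\alpha$ quantile of the chi-square distribution with $d^*$ degrees of freedom;

2.6. comparing the detection statistic $D$ with the control limit $\eta_\alpha$: if $D>\eta_\alpha$, the process is considered abnormal; otherwise, the process is in a normal state.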

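2. The performance-driven industrial process anomaly monitoring method according to claim 1, characterized in that steps 1.5 and 2.2 are specifically as follows:

the mean $\mu_y$ and covariance matrix $\Sigma_y$ of the component vector are as shown in equations (8) and (9), respectively:

$\mu_y=E[Q^T(x-\mu_x)]=0$ (8)

$\Sigma_y=Q^T\Sigma_xQ=\Lambda$ (9)

based on the additive fault model shown in equation (10):

$x=x^*+\Xi_if$ (10)

where $x$ is the faulty sample, $x^*$ is the corresponding normal sample, $\Xi_i$ is the direction matrix of the fault, and $\|f\|$ represents the fault magnitude;

the Mahalanobis distance corresponding to the component vector $y$ is as shown in equation (11):

$y^T\Lambda^{-1}y=\|\Lambda^{-1/2}(y^*+Q^T\Xi_if)\|^2$ (11)

where $y^*=Q^T(x^*-\mu_x)$;

from the triangle inequality for vectors, equation (12) is obtained:

$\|\Lambda^{-1/2}(y^*+Q^T\Xi_if)\|\ge\|\Lambda^{-1/2}Q^T\Xi_if\|-\|\Lambda^{-1/2}y^*\|$ (12)

in view of $\|\Lambda^{-1/2}y^*\|\le\chi_\alpha(m)$ for the normal sample, to ensure that the fault is detected it suffices that $\|\Lambda^{-1/2}Q^T\Xi_if\|-\chi_\alpha(m)>\chi_\alpha(m)$;

then, a sufficient condition for the fault to be detectable is as shown in equation (13):

$\|\Lambda^{-1/2}Q^T\Xi_if\|>2\chi_\alpha(m)$ (13);

the detection performance index corresponding to the component vector $y$ is defined as shown in equation (14).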

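3. The performance-driven industrial process anomaly monitoring method according to claim 1, characterized in that steps 1.6 and 2.3 are specifically as follows:

the selection matrix $W\in\{0,1\}^{m\times d}$ is defined as shown in equation (15), where $d\le m$ and each column of $W$ is a column vector in which one element is 1 and the remaining elements are 0;

for the component vector $y$, the selected component subset $y_s$ is given by equation (16):

$y_s=W^Ty$ (16)

which has zero mean and the covariance shown in equation (17):

$\Sigma_{y_s}=W^T\Lambda W$ (17)

the Mahalanobis distance corresponding to the component subset $y_s$ is as shown in equation (18):

$y_s^T(W^T\Lambda W)^{-1}y_s$ (18)

similarly, a sufficient condition for the fault to be detectable is obtained, namely equation (19):

$\|(W^T\Lambda W)^{-1/2}W^TQ^T\Xi_if\|>2\chi_\alpha(d)$ (19);

thus, the detection performance index $P_{y_s}$ corresponding to the component subset $y_s$ is as shown in equation (20).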

4. The performance-driven industrial process anomaly monitoring method according to claim 3, characterized in that steps 1.7 and 2.4 are specifically as follows:

with the selection matrix defined as in equation (15), the optimal selection matrix $W^*$ that maximizes the detection performance index $P_{y_s}$ is given by equation (21);

given $d$ ($1\le d\le m$), equation (21) is converted to equation (22), expressed through the objective $F(W)$;

since the value of $\Xi_if$ is unknown, equation (22) is in fact a stochastic optimization problem;

according to the fault model shown in equation (10), $\Xi_if=x-x^*$; hence $\Xi_if$ can be regarded as a random variable following a Gaussian distribution, i.e.