CN113743489A - Process industrial process fault detection method based on data loss - Google Patents
- Publication number: CN113743489A (application CN202110987661.8A)
- Authority
- CN
- China
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/10 — Pattern recognition; pre-processing; data cleansing
- G06F18/2135 — Feature extraction by transforming the feature space, based on approximation criteria, e.g. principal component analysis
- G06N3/048 — Neural networks; activation functions
- G06N3/08 — Neural networks; learning methods
Abstract
The invention relates to a process industrial process fault detection method based on data loss, which comprises the following steps: step S1: sampling and processing data of the process industrial process; step S2: filling missing data in the sampled data with a kernel extreme learning machine (KELM); step S3: performing low-dimensional feature extraction on the data with the landmark isometric mapping method (L-ISOMAP); step S4: calculating statistics and control limits in the feature space and the residual space respectively, and performing fault detection. Compared with the prior art, the method has the advantages of high accuracy and savings in time and computing resources.
Description
Technical Field
The invention relates to the field of process industrial process control, monitoring and safety production, in particular to a process industrial process fault detection method based on data loss.
Background
With the introduction of the industrial 4.0 concept and the increasing maturity of technologies such as industrial internet, internet of things and the like, the intelligent manufacturing transformation of the industrial production process has become a necessary trend of the traditional industrial development, and the industrial process has become increasingly integrated and large-scale as a result. The production process of the process industry such as oil refining, pharmacy and the like is increasingly complex, and the establishment of an accurate mechanism model for the process by a traditional mode becomes increasingly difficult. Under the wave of support of technologies such as a distributed control system, a data acquisition and monitoring control system and the like and machine/deep learning, process industrial process modeling and process monitoring based on data driving become indispensable links for industrial intelligent operation production.
Unstable signals during industrial data transmission, data-storage failures, sensor packet loss during sampling, and multiple sampling rates all cause data to go missing. When a large number of missing values appear in the historical process data used for modeling, directly applying a deletion rule removes a large amount of useful information, and the small amount of sample data left for building the model cannot reflect the characteristics of the original process; an unreasonable filling method, on the other hand, predicts the missing values incorrectly, and the resulting fault detection model has low accuracy.
Through retrieval, Chinese patent publication No. CN109146004A discloses a dynamic process detection method based on an iterative missing-data estimation strategy: an iterative missing-data estimation method computes estimates of the missing data, thereby converting the assumed original data into an estimation error; a PCA (principal component analysis) model iteratively solves for the estimates of the missing variables, and online fault detection is finally performed with the estimation error as the monitored object. However, the PCA model used in this method is slow and not very accurate.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a process industrial process fault detection method based on data loss which has high accuracy and saves time and computing resources.
The purpose of the invention can be realized by the following technical scheme:
a process industrial process fault detection method based on data loss comprises the following steps:
step S1: sampling and processing data of the process industrial process;
step S2: filling missing data in the sampled data by using a kernel extreme learning machine KELM;
step S3: performing low-dimensional feature extraction on the data by adopting a landmark equidistant mapping method L-ISOMAP;
step S4: calculating statistics and control limits in the feature space and the residual space respectively, and performing fault detection.
Preferably, the step S1 includes the steps of:
step S101: sampling data of a normally running process industrial process and simulating various industrial-field causes of loss to apply missing-value processing to the data, obtaining an incomplete missing data set X_M containing various missing types, X_M ∈ R^{m×n}, where R^{m×n} denotes a real matrix with m samples and n dimensions;
step S102: standardizing the missing data set X_M to obtain a new data set X_SM;
step S103: finding the positions of the missing data in X_SM, collecting all sampling points that contain missing values into a data set X_SM-NC, and collecting the complete sample-point data into another data set X_SM-C.
Preferably, the step S2 is specifically:
step S201: determining the input and output data of the KELM_i model;
for the i-th sampling point, finding the variables v_ms_i to which the missing values belong and taking the corresponding data Nan_NCi as the values to be predicted; the observed variables of that sampling point, excluding the missing ones, are denoted v_ob_i, and the corresponding data X_NCi serve as the test input of the KELM_i model;
the complete data set X_SM-C serves as the training data of the KELM_i model: the data X_Ci corresponding to the variables v_ob_i in X_SM-C are the input, and the data Y_Ci corresponding to the variables v_ms_i in X_SM-C are the model output; a data set with P sampling points is constructed as {(x_Ci_t, y_Ci_t), t = 1, ..., P}, where X_Ci ∈ R^{P×T} denotes that the training input X_Ci consists of T-dimensional data points, Y_Ci ∈ R^{P×K} denotes that the labels Y_Ci consist of K-dimensional data points, x_Ci_t is the training datum of the t-th sample point and y_Ci_t its label;
step S202: establishing the KELM_i model for the i-th sampling instant;
step S203: predicting the missing data of the i-th sample point;
step S204: filling all instants of X_SM-NC that contain missing values, obtaining the complete data set X_f.
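The per-sample filling loop of steps S201-S204 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names (`rbf_kernel`, `kelm_fill`) and the hyperparameter values C and σ are assumptions, and each incomplete sampling point is filled by a KELM trained on the complete rows, mapping its observed variables to its missing ones.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise radial basis kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kelm_fill(X, C=100.0, sigma=1.0):
    """Fill NaNs in X row by row: for each incomplete sampling point, train
    a KELM on the complete rows (observed columns -> missing columns)."""
    X = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    Xc = X[complete]                          # training pool, i.e. X_SM-C
    for i in np.where(~complete)[0]:
        ms = np.isnan(X[i])                   # v_ms_i: variables with missing values
        ob = ~ms                              # v_ob_i: observed variables
        K = rbf_kernel(Xc[:, ob], Xc[:, ob], sigma)          # kernel matrix Omega_i
        alpha = np.linalg.solve(K + np.eye(len(Xc)) / C, Xc[:, ms])
        k = rbf_kernel(X[i][ob][None, :], Xc[:, ob], sigma)  # test-input kernel row
        X[i, ms] = k @ alpha                  # predicted Nan_NCi
    return X
```

Because the kernel matrix is rebuilt per incomplete sampling point, the model is effectively updated for every missing pattern, which is the model-updating idea of the method.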
Preferably, the step S202 specifically includes:
the extreme learning machine (ELM) is a special single-hidden-layer feedforward neural network (SLFN); for the i-th sampling instant, the SLFN satisfies the expression
y*_Ci_j = Σ_{q=1}^{L} β_q G(x_Ci_j, a_q, b_q), j = 1, 2, ..., P   (1)
where L denotes the number of hidden-layer nodes, G(x_Ci_j, a_q, b_q) the activation function, x_Ci_j a training datum of the model and q the q-th hidden-layer node; a ∈ R^{T×L} is the input weight matrix, b ∈ R^{1×L} the hidden-layer bias, β ∈ R^{L×K} the output weight matrix, and y*_Ci_j the output value of the model;
the parameters a and b of the ELM model are determined randomly, so only the output weight matrix β needs to be solved; the corresponding output of the ELM is
Y*_Ci = Hβ   (2)
where H denotes the feature mapping matrix whose elements are the activation-function values:
H = [G(x_Ci_1, a_1, b_1) ... G(x_Ci_1, a_L, b_L); ... ; G(x_Ci_P, a_1, b_1) ... G(x_Ci_P, a_L, b_L)]   (3)
minimizing the regularized training error yields
β = H^T (I/C + H H^T)^{-1} Y_Ci   (4)
where H^T denotes the transpose of the feature mapping matrix H, C the regularization parameter, I the identity matrix and P the number of samples;
the output function of the ELM is expressed as
f(x_Ci) = h(x_Ci) H^T (I/C + H H^T)^{-1} Y_Ci   (5)
where h(x_Ci) is the mapping function of x_Ci;
on the basis of the ELM, the Mercer theorem is introduced to construct KELM_i, whose output function is
f(x_Ci) = K(x_Ci, X_Ci) (I/C + Ω_i)^{-1} Y_Ci   (6)
where Ω_i, the kernel matrix trained to fill the missing values of the i-th sample point, is expressed as
Ω_i = H H^T, [Ω_i]_{αβ} = K(x_Ci_α, x_Ci_β)   (7)
and K(x_Ci_α, x_Ci_β) is the radial basis kernel function constructed from two elements x_Ci_α, x_Ci_β of X_Ci:
K(x_Ci_α, x_Ci_β) = exp(-||x_Ci_α - x_Ci_β||^2 / (2σ^2))   (8)
where σ is the kernel width parameter and α and β denote the positions of the elements.
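As a numerical check of the closed-form KELM solution above, the sketch below trains the model in dual form, f(x) = K(x, X)(I/C + Ω)^{-1} Y, with the radial basis kernel. The function names and the hyperparameter values C and σ are illustrative assumptions, not part of the patent.

```python
import numpy as np

def kelm_train(Xtr, Ytr, C=1e3, sigma=0.5):
    # Omega_{ab} = exp(-||x_a - x_b||^2 / (2 sigma^2)), the Mercer/RBF kernel matrix
    d2 = ((Xtr[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    Omega = np.exp(-d2 / (2 * sigma ** 2))
    # dual coefficients (I/C + Omega)^{-1} Y; H and beta never appear explicitly
    return np.linalg.solve(np.eye(len(Xtr)) / C + Omega, Ytr)

def kelm_predict(Xte, Xtr, alpha, sigma=0.5):
    # f(x) = K(x, X_train) @ alpha
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ alpha
```

Note that, as the derivation states, the hidden-node count L disappears from the kernel form: only the kernel matrix and the regularization parameter remain.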
Preferably, the step S203 specifically includes: taking the data X_NCi of X_SM-NC at the i-th instant as the input of the model, the missing data Nan_NCi at that instant are predicted as
Nan_NCi = K(X_NCi, X_Ci) (I/C + Ω_i)^{-1} Y_Ci   (9)
Preferably, the step S3 includes the steps of:
step S301: randomly selecting m′ samples from the m samples as landmark points;
step S302: constructing the neighborhood graph G;
computing the Euclidean distances between the m′ landmark point pairs, the distance of a data point pair (X_fi, X_fj) being denoted d_Xm′(X_fi, X_fj); setting a distance threshold, selecting suitable neighbors, and constructing the neighborhood graph G;
step S303: computing the geodesic distance, i.e. the Dijkstra shortest-path distance, of the high-dimensional data;
the geodesic distance d_Dm′(X_fi, X_fj) between two points computed on the neighborhood graph G approximates the geodesic distance on the original manifold, and the geodesic distance matrix D_Dm′ consists of the squared geodesic distances;
step S304: determining the inner product matrix B_m′:
B_m′ = -(1/2) H_m′ D_Dm′ H_m′   (10)
where H_m′ is the centering matrix;
step S305: obtaining the d-dimensional embedding matrix L_d of the landmark points:
solving for the largest d eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d of the matrix B_m′ and the d corresponding eigenvectors [v_1, v_2, ..., v_d], the d-dimensional embedding matrix L_d of the landmark points is expressed as
L_d = [√λ_1 v_1, √λ_2 v_2, ..., √λ_d v_d]^T   (11)
step S307: computing the distances between the non-landmark data points and the landmark points, the distance between a remaining point r and a landmark point being denoted d_Dmm′(X_fr, X_fj); the squared distances form a matrix, and the column vector of that matrix belonging to data point r is denoted δ_r;
step S308: solving the pseudo-inverse transpose matrix L_d^# of the matrix L_d;
step S309: computing the d-dimensional embedding matrix L_rd of the remaining data points;
step S310: aligning the embedded coordinates with the principal component analysis (PCA) algorithm;
the computed d-dimensional embedding matrix X_fd ∈ R^{m×d} is coordinate-aligned by the PCA standardization method, yielding the aligned d-dimensional feature matrix Y ∈ R^{m×d}.
Preferably, the number of landmark samples in step S301 satisfies m' < m.
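The procedure of steps S301-S310 can be sketched end to end as below. This is an illustrative reading of the algorithm, not the patented implementation: the function name is invented, a k-nearest-neighbour rule stands in for the distance threshold of step S302, and the Dijkstra distances are computed from the landmarks to all points so that the non-landmark points can be triangulated.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def l_isomap(Xf, d=2, n_landmarks=20, k=8, seed=0):
    m = len(Xf)
    rng = np.random.default_rng(seed)
    lm = rng.choice(m, size=n_landmarks, replace=False)    # S301: landmark indices
    # S302: k-NN neighborhood graph, edge weights = Euclidean distance
    D = cdist(Xf, Xf)
    G = np.full((m, m), np.inf)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(m), k)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]
    G = np.minimum(G, G.T)                                  # symmetrize
    # S303: Dijkstra geodesic distances from the landmarks to every point
    Dg = shortest_path(G, method="D", indices=lm)           # (m', m)
    delta = Dg ** 2                                         # squared geodesics
    Dlm = delta[:, lm]                                      # landmark-landmark block
    # S304-S305: classical MDS on the landmark block
    n = n_landmarks
    H = np.eye(n) - np.ones((n, n)) / n                     # centering matrix H_m'
    B = -0.5 * H @ Dlm @ H                                  # inner product matrix B_m'
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]
    w_d = np.clip(w[idx], 1e-12, None)                      # guard against fp noise
    # S307-S309: distance-based triangulation of the remaining points
    Lsharp = V[:, idx].T / np.sqrt(w_d)[:, None]            # pseudo-inverse transpose of L_d
    Y = 0.5 * (Lsharp @ (Dlm.mean(axis=1, keepdims=True) - delta)).T
    # S310: PCA alignment of the embedded coordinates
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt.T
```

For landmark points themselves the triangulation formula reproduces their MDS embedding, so all m points end up in one consistent coordinate system before the PCA alignment.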
Preferably, the step S4 includes the steps of:
step S401: computing the mapping matrix A;
the mapping matrix A that projects the original high-dimensional data onto the low-dimensional space is solved through the idea of local linear regression:
Y = A X_f   (12)
A = Y X_f^T (X_f X_f^T)^{-1}   (13)
where X_f is the complete data set after filling the missing data and Y is the feature matrix;
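A worked check of equation (13), written under the convention that samples are columns. The variable names and the synthetic data are assumptions; Y is fabricated from a known linear map so that the regression should recover it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 200, 6, 2
Xf = rng.normal(size=(m, n))               # filled complete data set, rows = samples
A_true = rng.normal(size=(d, n))           # known linear map used to fabricate Y
Y = Xf @ A_true.T                          # d-dimensional feature matrix

# Equation (13): A = Y Xf^T (Xf Xf^T)^{-1}, with samples as columns
Xc, Yc = Xf.T, Y.T                         # shapes (n, m) and (d, m)
A = Yc @ Xc.T @ np.linalg.inv(Xc @ Xc.T)   # recovered (d, n) mapping matrix

y_rt = A @ Xf[0]                           # map one sample, as in equation (14)
```

Because Y here is exactly linear in X_f and X_f X_f^T is well conditioned, the least-squares solution coincides with the generating map.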
step S402: constructing the fault detection statistics and control limits of the offline data;
step S403: computing the statistics of the online data for real-time monitoring.
Preferably, the step S402 specifically includes: for the offline data X_f, constructing the feature-space statistic T_f^2 and the residual-space statistic SPE_f respectively, and computing the control limits T_ucl^2 and SPE_ucl of T_f^2 and SPE_f respectively with a kernel density estimation algorithm.
Preferably, the step S403 specifically includes: standardizing the observed real-time data x_t to obtain x_rt; the low-dimensional mapping y_rt of the real-time data is obtained through the mapping matrix A as
y_rt = A x_rt   (14)
The real-time data statistics T_rt^2 and SPE_rt are then computed; if an online statistic is larger than its control limit, a fault has occurred in the process.
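The offline statistics, kernel-density control limits and online decision of steps S402-S403 can be sketched as follows. Everything here is illustrative: the orthonormal mapping matrix, the synthetic normal-operation data, the confidence level handling and the function names are assumptions, not the patented implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
m, n, d = 300, 5, 2
Xf = rng.normal(size=(m, n))                      # filled offline data, rows = samples
A = np.linalg.qr(rng.normal(size=(n, d)))[0].T    # a d x n mapping with orthonormal rows
Y = Xf @ A.T                                      # low-dimensional feature matrix

S_inv = np.linalg.inv(Y.T @ Y / (m - 1))          # inverse feature covariance (eq. 27)
T2 = np.einsum('ij,jk,ik->i', Y, S_inv, Y)        # T^2 per sample (eq. 25)
E = Xf - Xf @ A.T @ A                             # residual (I - A^T A) x per sample
SPE = (E ** 2).sum(axis=1)                        # SPE per sample (eq. 26)

def kde_limit(stat, alpha=0.01):
    # 99% control limit from a kernel density estimate of the offline statistic
    kde = gaussian_kde(stat)
    grid = np.linspace(0, stat.max() * 2, 4000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, 1 - alpha)]

T2_ucl, SPE_ucl = kde_limit(T2), kde_limit(SPE)

def is_fault(x_rt):
    # Online decision: fault when either statistic exceeds its control limit
    y = A @ x_rt
    t2 = y @ S_inv @ y
    spe = np.sum((x_rt - A.T @ y) ** 2)
    return t2 > T2_ucl or spe > SPE_ucl
```

With control limits set at roughly the 99th percentile of each offline statistic, only a small fraction of normal samples trips the detector, while a grossly abnormal sample exceeds at least one limit.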
Compared with the prior art, the invention has the following advantages:
1) when predicting missing values, the differences between the sampling instants that contain missing values are fully considered, and the instants are filled one by one through model updating, so the method suits various missing-data types and ensures the accuracy of the filled data;
2) the kernel extreme learning machine has strong generalization performance and a high learning speed, so predicting the missing values with the kernel extreme learning model consumes little time and few computing resources while maintaining accuracy;
3) when the landmark isometric mapping (L-ISOMAP) model is built for feature extraction, the low-dimensional feature data preserve the manifold structure of the original high-dimensional data, so the low-dimensional data retain as much of the effective information of the original data as possible;
4) compared with the isometric mapping algorithm (ISOMAP), the landmark isometric mapping algorithm (L-ISOMAP) requires far fewer operations when computing the distance matrix while keeping the dimension reduction reliable, so the algorithm runs faster.
Drawings
FIG. 1 is a flow chart of the overall steps of the present invention in implementing fault detection based on data loss;
FIG. 2 is a flow diagram of missing data filling implemented using the KELM model with model updating;
FIG. 3 is a flow chart for implementing feature extraction using the L-ISOMAP algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
As shown in fig. 1, the present invention provides a process industrial process fault detection method based on data loss, whose working principle is as follows: first, normal data are collected while the process industrial process runs normally and processed into a training data set containing missing values, and the missing values of each sampling point are filled by a kernel extreme learning machine (KELM) based on model updating, giving a complete data set; on this basis, a landmark isometric mapping algorithm (Landmark-ISOMAP, L-ISOMAP) performs low-dimensional feature extraction; finally, the T^2 statistic is established in the feature space and the SPE statistic in the residual space, and the corresponding control limits are computed respectively, thereby realizing fault detection.
The embodiment is realized by the following specific technical scheme, which specifically comprises the following steps:
step S1: sampling data of a normally running process industrial process and simulating the various causes of data loss in an industrial field to apply missing-value processing to the data, obtaining an incomplete missing data set X_M containing various missing types, X_M ∈ R^{m×n}, where R^{m×n} denotes a real matrix with m samples and n dimensions;
step S2: standardizing the missing data set X_M to obtain a new data set X_SM;
As shown in fig. 2, a flow chart of the data filling method is presented.
Here, to keep the illustration of the filling process simple, X_SM is set as a matrix with three missing values, whose coordinates (u_1, v_1), (u_1, v_2), (u_2, v_3) indicate that data are missing for the v_1-th and v_2-th variables at the u_1-th sampling instant and for the v_3-th variable at the u_2-th sampling instant;
step S3: finding the positions of the missing data in the data set X_SM, collecting the sampling points that contain missing values into a data set X_SM-NC, and collecting the complete sample-point data into another data set X_SM-C;
Step S4: for data set X in turnSM-NCFilling each sampling point;
as shown in FIG. 2, for the i-th sampling point being filled, the variables v_ms_i to which the missing values belong are found, and the data Nan_NCi corresponding to those variables are the values to be predicted; the observed variables of that sampling point, excluding the missing ones, are v_ob_i, and their corresponding data X_NCi serve as the test input of the KELM_i model;
the complete data set X_SM-C serves as the training data of the KELM_i model: the data X_Ci corresponding to the variables v_ob_i in X_SM-C are the input, and the data Y_Ci corresponding to the variables v_ms_i are the model output; a data set with P sampling points is constructed as {(x_Ci_t, y_Ci_t), t = 1, ..., P}, where X_Ci ∈ R^{P×T} denotes that the training input X_Ci consists of T-dimensional data points and Y_Ci ∈ R^{P×K} denotes that the labels Y_Ci consist of K-dimensional data points;
when X_SM has the three missing values of the example matrix above, the missing values of the u_1-th sampling instant are filled first; the variables they belong to are v_1 and v_2, and the corresponding missing data are taken as the prediction outputs of the model, while the remaining data of that sampling instant, with the missing values removed, are taken as the prediction-model input; then the data corresponding to the variables v_1 and v_2 are found in X_SM-C and used as the output labels of the model training data, and the remaining data of X_SM-C are used as the input of the model training data;
the kernel extreme learning machine model that fills the missing values of the u_1-th instant is denoted KELM_u1, and its training data set is formed from the data of X_SM-C corresponding to that instant;
The extreme learning machine (ELM) is a special single-hidden-layer feedforward neural network (SLFN); for the u_1-th sampling instant, the SLFN satisfies the expression
y* = Σ_{q=1}^{L} β_q g(x, a_q, b_q)
where L denotes the number of hidden-layer nodes, g(·) the type of activation function, a ∈ R^{T×L} the input weight matrix, b ∈ R^{1×L} the hidden-layer bias, β ∈ R^{L×K} the output weight matrix, and y* the output value of the model;
the parameters a and b of the ELM model are determined randomly, so only the output weight matrix β needs to be solved; compared with traditional SLFNs, the ELM has better generalization performance and learning speed; the corresponding output of the ELM is Y* = Hβ, where H denotes the feature mapping matrix, and minimizing the regularized training error yields
β = H^T (I/C + H H^T)^{-1} Y
where H^T denotes the transpose of the feature mapping matrix, C the regularization parameter and I the identity matrix;
the output function of the ELM can be expressed as
f(x) = h(x) H^T (I/C + H H^T)^{-1} Y
to avoid the influence of the choice of the hidden-layer node number L on the model training result, the Mercer theorem is introduced on the basis of the ELM to construct KELM_u1, whose output function is
f(x) = K(x, X) (I/C + Ω)^{-1} Y, with Ω = H H^T and K(x_α, x_β) = exp(-||x_α - x_β||^2 / (2σ^2))
where σ is the kernel width parameter;
After the missing values of the u_1-th instant are filled, the missing value of the u_2-th sampling instant is predicted and filled; the variable it belongs to is v_3, and the corresponding missing datum is taken as the prediction output of the model, while the remaining data of that sampling instant, with the missing value removed, are taken as the prediction-model input; then the data corresponding to the variable v_3 are found in X_SM-C and used as the output labels of the model training data, and the remaining data of X_SM-C are used as the input of the model training data;
the kernel extreme learning machine model that fills the missing value of the u_2-th instant is denoted KELM_u2, and its training data set is formed from the data of X_SM-C corresponding to that instant; after the input and output data of the model are confirmed, KELM_u2 is trained in the same way as KELM_u1, and the predicted missing value is finally obtained; after all missing values of X_SM-NC are filled, the complete data set X_f is finally obtained.
Step S5: utilizing L-ISOMAP algorithm to carry out pair on filled data set XfCarrying out feature extraction;
high-dimensional training data set X by L-ISOMAP algorithmf∈Rm×nMapping to a low-dimensional matrix Y ∈ Rm×dWherein X isfThe method comprises the following steps of (1) obtaining a matrix with m sample numbers and n dimension; y is a matrix with the sample number of m and the dimension of d; in-process industrial processesThe dimension represents the number of variables in the process.
As shown in FIG. 3, the dimension-reduction procedure of the L-ISOMAP algorithm is as follows:
1) selecting m′ landmark points;
in the traditional ISOMAP algorithm, the Euclidean distance between every pair of the m sample points must be computed, so the algorithm has high computational complexity when m is large; the L-ISOMAP algorithm randomly selects m′ samples from the m samples as landmark points, where m′ < m, and only the distances between the m′ landmark points need to be computed, which greatly reduces the complexity;
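The claimed reduction in distance computations can be illustrated directly; the sizes below are chosen arbitrarily for the sketch.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
m, m_prime, n = 1500, 40, 8                  # m' << m, sizes chosen for illustration
X = rng.normal(size=(m, n))
landmarks = X[rng.choice(m, m_prime, replace=False)]

D_full = cdist(X, X)                         # ISOMAP: all m*m pairwise distances
D_lm = cdist(landmarks, landmarks)           # L-ISOMAP: only the m'*m' landmark block

# the MDS eigendecomposition likewise shrinks from an m x m to an m' x m' problem
ratio = D_lm.size / D_full.size
```

Here the landmark block holds well under a thousandth of the entries of the full distance matrix, which is where the speed-up of advantage 4) comes from.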
2) constructing a neighbor neighborhood graph G;
computing the Euclidean distances between the m′ landmark point pairs, the distance of a data point pair (X_fi, X_fj) being denoted d_Xm′(X_fi, X_fj) and computed as
d_Xm′(X_fi, X_fj) = ||X_fi - X_fj||   (12)
a distance threshold is set, suitable neighbors are selected, and the neighborhood graph G is constructed;
3) computing the geodesic distance (Dijkstra distance), i.e. the shortest path, of the high-dimensional data;
the geodesic distance d_Dm′(X_fi, X_fj) between two points computed on the neighborhood graph G approximates the geodesic distance on the original manifold: if the two points X_fi, X_fj are connected as neighbors, then
d_Dm′(X_fi, X_fj) = d_Xm′(X_fi, X_fj)   (13)
otherwise
d_Dm′(X_fi, X_fj) = min{d_Dm′(X_fi, X_fj), d_Dm′(X_fi, X_fp) + d_Dm′(X_fp, X_fj)}   (14)
with initial value d_Dm′(X_fi, X_fj) = ∞, i, j = 1, 2, ..., m′, p = 1, 2, ..., m′;
the geodesic distance matrix D_Dm′ consists of the squared geodesic distances, in the concrete form
D_Dm′ = [d²_Dm′(X_fi, X_fj)]_{m′×m′}   (15)
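Equations (13)-(14) amount to running Dijkstra's algorithm on the neighborhood graph. A minimal check on a hand-built four-node chain graph (the graph itself is an arbitrary example, not data from the patent):

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

inf = np.inf
# Neighborhood graph G of a 4-point chain: only adjacent points are neighbors,
# edge weights are their Euclidean distances; inf marks "not a neighbor".
G = np.array([[inf, 1.0, inf, inf],
              [1.0, inf, 1.5, inf],
              [inf, 1.5, inf, 2.0],
              [inf, inf, 2.0, inf]])
D_geo = dijkstra(G, directed=False)   # geodesic (shortest-path) distance matrix
D_Dm = D_geo ** 2                     # D_Dm' stores the squared geodesics, as in (15)
```

Neighbors keep their Euclidean distance (equation 13), while non-neighbors accumulate the shortest chain of edges (equation 14).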
4) determining the inner product matrix B_m′:
B_m′ = -(1/2) H_m′ D_Dm′ H_m′   (16)
where H_m′ is the centering matrix, specifically defined as
H_m′ = I_m′ - (1/m′) 1 1^T   (17)
δ_ij = [D_Dm′]_ij   (18)
where δ_ij denotes the squared distance between the two points X_fi and X_fj;
5) obtaining the d-dimensional embedding of the landmark points;
solving for the largest d eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d of the matrix B_m′ and the d corresponding eigenvectors [v_1, v_2, ..., v_d], the d-dimensional embedding matrix L_d of the landmark points can be expressed as
L_d = [√λ_1 v_1, √λ_2 v_2, ..., √λ_d v_d]^T   (19)
7) computing the distances between the non-landmark data points and the landmark points, the distance between a remaining point r and a landmark point being denoted d_Dmm′(X_fr, X_fj); the squared distances form a matrix, and the column vector of that matrix belonging to data point r is denoted δ_r;
8) solving the pseudo-inverse transpose matrix L_d^# of the matrix L_d;
9) computing the d-dimensional embedding matrix L_rd of the remaining data points; the embedding vector of a remaining point r is
y_r = (1/2) L_d^# (δ̄ - δ_r)   (20)
where δ̄ is the column mean of the squared landmark distance matrix; from this the d-dimensional embedding matrix L_rd of the remaining data points can be determined.
10) aligning the embedded coordinates with the principal component analysis (PCA) algorithm;
the d-dimensional embedding matrix X_fd ∈ R^{m×d} obtained through the above steps is coordinate-aligned by the PCA standardization method, yielding the aligned d-dimensional feature matrix Y ∈ R^{m×d}.
Step S6: calculating a mapping matrix A;
in order to calculate real-time statistics conveniently, a mapping matrix A of original high-dimensional data projected to a low-dimensional space is solved through a local linear regression idea:
Y = A X_f   (23)
A = Y X_f^T (X_f X_f^T)^{-1}   (24)
step S7: constructing an offline data fault detection statistic and a control limit;
for the offline data X_f, the feature-space statistic T_f^2 and the residual-space statistic SPE_f are constructed respectively:
T_f^2 = y^T S^{-1} y   (25)
SPE_f = ||(I - A^T A) X_f||^2   (26)
where y denotes a feature vector (a sample of Y) and S is the covariance matrix,
S = Y Y^T / (m - 1)   (27)
the control limits of T_f^2 and SPE_f are computed separately with the kernel density estimation method; with a confidence of 0.99, α = 0.01, so the control limits T_ucl^2 and SPE_ucl can be derived from
∫_{-∞}^{T_ucl^2} f(T^2) dT^2 = 1 - α   (28)
∫_{-∞}^{SPE_ucl} f(SPE) dSPE = 1 - α   (29)
where f(·) denotes the estimated probability density;
Step S8: calculating online data statistics to realize real-time detection;
the observed real-time data x_t are standardized to obtain x_rt, and the low-dimensional mapping y_rt of the real-time data is obtained through the mapping matrix A:
y_rt = A x_rt   (30)
the real-time data statistics are computed as:
T_rt^2 = y_rt^T S^{-1} y_rt   (31)
SPE_rt = ||(I - A^T A) x_rt||^2   (32)
online detection is realized through the two statistics: if an online statistic is larger than its control limit, a fault has occurred in the process, i.e. a fault occurs when
T_rt^2 > T_ucl^2 or SPE_rt > SPE_ucl   (33)
In an industrial field of process industrial production, data loss can occur during the collection, transmission and storage of process industrial data for many reasons, such as equipment aging, operating errors and technical bottlenecks. The invention provides a fault detection method for the case of missing data: first, the missing data are effectively predicted by a model-updating kernel extreme learning machine; after a complete training data set is obtained, feature extraction is carried out with the landmark isometric mapping algorithm, the corresponding statistics and control limits are established, and fault detection is realized.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A process industrial process fault detection method based on data loss is characterized by comprising the following steps:
step S1: sampling and processing data of the process industrial process;
step S2: filling missing data in the sampled data by using a kernel extreme learning machine KELM;
step S3: performing low-dimensional feature extraction on the data by adopting a landmark equidistant mapping method L-ISOMAP;
step S4: calculating statistics and control limits in the feature space and the residual space respectively, and performing fault detection.
2. The method for fault detection of process industrial process based on data loss according to claim 1, wherein the step S1 comprises the following steps:
step S101: sampling data of a normally running process industrial process and simulating various industrial-field causes of loss to apply missing-value processing to the data, obtaining an incomplete missing data set X_M containing various missing types, X_M ∈ R^{m×n}, where R^{m×n} denotes a real matrix with m samples and n dimensions;
step S102: standardizing the missing data set X_M to obtain a new data set X_SM;
step S103: finding the positions of the missing data in X_SM, collecting all sampling points that contain missing values into a data set X_SM-NC, and collecting the complete sample-point data into another data set X_SM-C.
3. The method for detecting the fault of the process industrial process based on the data missing as claimed in claim 1, wherein the step S2 is specifically as follows:
step S201: determining the input and output data of the KELM_i model;
for the i-th sampling point, finding the variables v_ms_i to which the missing values belong and taking the corresponding data Nan_NCi as the values to be predicted; the observed variables of that sampling point, excluding the missing ones, are denoted v_ob_i, and the corresponding data X_NCi serve as the test input of the KELM_i model;
the complete data set X_SM-C serves as the training data of the KELM_i model: the data X_Ci corresponding to the variables v_ob_i in X_SM-C are the input, and the data Y_Ci corresponding to the variables v_ms_i in X_SM-C are the model output; a data set with P sampling points is constructed as {(x_Ci_t, y_Ci_t), t = 1, ..., P}, where X_Ci ∈ R^{P×T} denotes that the training input X_Ci consists of T-dimensional data points, Y_Ci ∈ R^{P×K} denotes that the labels Y_Ci consist of K-dimensional data points, x_Ci_t is the training datum of the t-th sample point and y_Ci_t its label;
step S202: establishing the KELM_i model for the i-th sampling instant;
step S203: predicting the missing data of the i-th sample point;
step S204: filling all instants of X_SM-NC that contain missing values, obtaining the complete data set X_f.
4. The method for detecting process industrial process faults based on data loss according to claim 3, wherein the step S202 specifically comprises:
the extreme learning machine ELM is a special single-hidden-layer feedforward neural network SLFNs, and aiming at the ith sampling moment, the SLFNs meet the following expression:
wherein L represents the number of nodes of the hidden layer, G (x)Ci_j,aq,bq) Representing the activation function, xCi_jIs the training data for the model and is,qis shown asqA layer implies a layer node; a is in the form of RT×LFor inputting the weight matrix, b is an element of R1×LTo imply layer bias, β ∈ RL×KAs an output weight matrix, y* Ci_jAn output value representing the model;
the parameters a and b in the extreme learning machine ELM model are randomly determined, only the output weight matrix parameter beta is required to be obtained, and the corresponding output of the extreme learning machine ELM is as follows:
YCi *=Hβ (2)
where H represents the feature mapping matrix:
wherein g (x)Ci_1,aq,bq) For activating a function matrix G (x)Ci_j,aq,bq) An element of (1);
Wherein HTRepresenting the transposition of a characteristic mapping matrix H, C representing a regularization parameter, I representing an identity matrix, and P representing the number of samples;
the output function of the ELM is expressed as:

f(x_Ci) = h(x_Ci) β

wherein h(x_Ci) is the mapping function of x_Ci;
the Mercer theorem is introduced to construct KELM_i on the basis of the ELM; the output function of KELM_i is as follows:

f(x_Ci) = [K(x_Ci, x_Ci_1), ..., K(x_Ci, x_Ci_P)] (I/C + Ω_i)^(−1) Y_Ci

wherein Ω_i, the kernel function matrix trained to fill the missing values of the ith sample point, has elements Ω_i(α, β) = K(x_Ci_α, x_Ci_β); K(x_Ci_α, x_Ci_β) represents the radial basis kernel function constructed from the two elements x_Ci_α, x_Ci_β of X_Ci:

K(x_Ci_α, x_Ci_β) = exp(−‖x_Ci_α − x_Ci_β‖² / σ²)

wherein σ is the kernel width.
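A minimal numpy sketch of the KELM_i output function, assuming the radial basis kernel K(a, b) = exp(−‖a − b‖²/σ²); the kernel width σ, the regularization C, and the function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Radial basis kernel K(a, b) = exp(-||a - b||^2 / sigma^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma ** 2)

def kelm_fit(X, Y, C=100.0, sigma=1.0):
    """Precompute alpha = (I/C + Omega)^-1 Y on the training kernel matrix."""
    P = X.shape[0]
    omega = rbf_kernel(X, X, sigma)                       # kernel matrix Omega_i
    alpha = np.linalg.solve(np.eye(P) / C + omega, Y)
    return X, alpha, sigma

def kelm_predict(model, Xnew):
    """Output function f(x) = [K(x, x_1), ..., K(x, x_P)] (I/C + Omega)^-1 Y."""
    Xtrain, alpha, sigma = model
    return rbf_kernel(Xnew, Xtrain, sigma) @ alpha
```

Predicting the missing variables of an incomplete sample (step S203) is then a single call to kelm_predict on its observed part.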
6. The method for fault detection of process industrial process based on data loss according to claim 1, wherein the step S3 comprises the following steps:
step S301: randomly selecting m' samples from m samples as landmark points;
step S302: constructing a neighbor neighborhood graph G;
the Euclidean distances between the m′ landmark points are calculated, the distance of a data point pair (X_fi, X_fj) being recorded as d_Xm′(X_fi, X_fj); a distance threshold is set, suitable neighbors are selected, and the neighbor graph G is constructed;
step S303: calculating the geodesic distances of the high-dimensional data, namely the shortest paths, with the Dijkstra algorithm;
the geodesic distance d_Dm′(X_fi, X_fj) between the two points X_fi, X_fj is calculated on the neighbor graph G to approximate the geodesic distance on the original manifold; the geodesic distance matrix D_Dm′ consists of the squares of the geodesic distances;
step S304: determining the inner product matrix B_m′:

B_m′ = −(1/2) H_m′ D_Dm′ H_m′

wherein H_m′ is the centering matrix;
step S305: obtaining the d-dimensional embedding matrix L_d of the landmark points:
the d largest eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d of the matrix B_m′ and the d corresponding eigenvectors [v_1, v_2, ..., v_d] are solved; the d-dimensional embedding matrix L_d of the landmark points is thus expressed as:

L_d = [√λ_1 v_1, √λ_2 v_2, ..., √λ_d v_d]^T
step S307: calculating the distances between the data points other than the landmark points and the landmark points; the distance between a point r of the remaining data points and a landmark point is recorded as d_Dmm′(X_fr, X_fj); the squared distances form a matrix, and the column of this matrix corresponding to data point r is taken as a vector;
step S308: solving the pseudo-inverse transpose matrix L#_d of the matrix L_d;
step S309: computing the d-dimensional embedding matrix L_rd of the remaining data points;
Step S310: adopting a Principal Component Analysis (PCA) algorithm to realize embedded coordinate alignment;
the d-dimensional embedding matrix X_fd ∈ R^(m×d) is obtained through calculation; coordinate alignment is realized with a PCA standardization method, yielding the aligned d-dimensional feature matrix Y ∈ R^(m×d).
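Steps S301-S310 can be sketched compactly as below, under stated assumptions: a k-nearest-neighbour rule replaces the distance threshold of step S302, scipy's Dijkstra implementation performs step S303, and the PCA alignment of step S310 is a plain centring plus rotation onto the principal axes. All names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def landmark_isomap(X, n_landmarks, k=8, d=2, seed=0):
    m = len(X)
    rng = np.random.default_rng(seed)
    lm = rng.choice(m, size=n_landmarks, replace=False)   # S301: landmark points
    D = cdist(X, X)                                       # Euclidean distances
    # S302: k-nearest-neighbour graph (inf marks a non-edge for csgraph)
    G = np.full((m, m), np.inf)
    for i in range(m):
        nn = np.argsort(D[i])[1:k + 1]
        G[i, nn] = D[i, nn]
        G[nn, i] = D[i, nn]
    # S303: geodesic (shortest-path) distances from every landmark
    DG = shortest_path(G, method="D", directed=False, indices=lm)
    D2 = DG[:, lm] ** 2                                   # squared landmark-landmark distances
    # S304-S305: classical MDS on the landmarks
    n = n_landmarks
    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix H_m'
    B = -0.5 * H @ D2 @ H                                 # inner product matrix B_m'
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]
    Ld = (V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))).T  # d x m' landmark embedding L_d
    # S307-S309: embed all points via the pseudo-inverse transpose L_d^#
    Ld_pinv = np.linalg.pinv(Ld).T                        # rows v_q^T / sqrt(lambda_q)
    delta_mean = D2.mean(axis=1)
    Y = np.empty((m, d))
    for r in range(m):
        delta_r = DG[:, r] ** 2
        Y[r] = -0.5 * Ld_pinv @ (delta_r - delta_mean)
    # S310: PCA alignment (centre, rotate onto principal axes)
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt.T
```

The loop over r also re-embeds the landmarks themselves, which is harmless and keeps the code short.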
7. The method for detecting faults of a process industrial process based on data loss according to claim 6, wherein the number of landmark samples in step S301 satisfies m′ < m.
8. The method for fault detection of process industrial process based on data loss according to claim 1, wherein the step S4 comprises the following steps:
step S401: calculating a mapping matrix A;
the mapping matrix A that projects the original high-dimensional data to the low-dimensional space is solved via a local linear regression:

Y = A X_f (12)

A = Y X_f^T (X_f X_f^T)^(−1) (13)

wherein X_f is the complete data set after filling the missing data, and Y is the feature matrix;
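Equations (12)-(13) in numpy, with variables as rows and samples as columns (an assumed convention; the claim does not fix the orientation):

```python
import numpy as np

def mapping_matrix(Y, Xf):
    """A = Y Xf^T (Xf Xf^T)^-1, the least-squares projection from the
    filled high-dimensional data Xf to the low-dimensional features Y."""
    return Y @ Xf.T @ np.linalg.inv(Xf @ Xf.T)
```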
step S402: constructing an offline data fault detection statistic and a control limit;
step S403: and calculating the online data statistic for real-time monitoring.
9. The method for detecting process industrial process faults based on data loss according to claim 8, wherein the step S402 specifically comprises: for the offline data X_f, the feature space statistic T_f² and the residual space statistic SPE_f are constructed separately; their control limits T²_ucl and SPE_ucl are respectively calculated with a kernel density estimation algorithm.
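A hedged sketch of the kernel-density control limit of step S402: the limit is taken as the quantile of the statistic's estimated density at an assumed 99% confidence level (the claim does not state the level). gaussian_kde comes from scipy; the function name is illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_control_limit(stat, alpha=0.99, grid=2000):
    """Smallest value x with estimated P(stat <= x) >= alpha."""
    kde = gaussian_kde(stat)                              # density of the offline statistic
    xs = np.linspace(stat.min(), stat.max() + 3 * stat.std(), grid)
    cdf = np.cumsum(kde(xs))                              # numeric CDF on the grid
    cdf /= cdf[-1]
    return xs[np.searchsorted(cdf, alpha)]
```

The same routine would give both T²_ucl and SPE_ucl from the offline T² and SPE samples.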
10. The method for detecting process industrial process faults based on data loss according to claim 8, wherein the step S403 specifically comprises: the observed real-time data x_t is standardized to obtain x_rt, and the low-dimensional mapping y_rt of the real-time data is obtained through the mapping matrix A as:

y_rt = A x_rt (14)

the real-time data statistics T_rt² and SPE_rt are computed; if an online statistic is larger than its control limit, a fault has occurred in the process.
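The on-line check of step S403 and equation (14) might look as follows; the score covariance inverse Lam_inv and the reconstruction through pinv(A) are assumptions, since the claim does not spell out the statistic definitions.

```python
import numpy as np

def monitor(x_rt, A, Lam_inv, t2_ucl, spe_ucl):
    """Return (T2, SPE, fault flag) for one standardized sample x_rt."""
    y_rt = A @ x_rt                            # low-dimensional mapping, eq. (14)
    t2 = y_rt @ Lam_inv @ y_rt                 # Hotelling-style T^2 in feature space
    resid = x_rt - np.linalg.pinv(A) @ y_rt    # residual after reconstruction
    spe = resid @ resid                        # squared prediction error
    return t2, spe, bool(t2 > t2_ucl or spe > spe_ucl)
```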
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110987661.8A CN113743489B (en) | 2021-08-26 | 2021-08-26 | Data loss-based fault detection method for process industrial process |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113743489A true CN113743489A (en) | 2021-12-03 |
CN113743489B CN113743489B (en) | 2023-09-29 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107092923A (en) * | 2017-03-22 | 2017-08-25 | 东北大学 | The electric melting magnesium furnace process monitoring method of method is locally linear embedding into based on improvement supervision core |
CN108181894A (en) * | 2017-12-15 | 2018-06-19 | 宁波大学 | A kind of nongausian process monitoring method that strategy is returned based on trimming independent entry |
CN108960329A (en) * | 2018-07-06 | 2018-12-07 | 浙江科技学院 | A kind of chemical process fault detection method comprising missing data |
CN111142501A (en) * | 2019-12-27 | 2020-05-12 | 浙江科技学院 | Fault detection method based on semi-supervised autoregressive dynamic hidden variable model |
US20200271720A1 (en) * | 2020-05-09 | 2020-08-27 | Hefei University Of Technology | Method for diagnosing analog circuit fault based on vector-valued regularized kernel function approximation |
Non-Patent Citations (1)
Title |
---|
ZHANG Ni; TIAN Xuemin; CAI Lianfang: "Nonlinear process fault detection method based on RISOMAP", CIESC Journal (化工学报), no. 06 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||