CN115167369A

CN115167369A - Industrial data fault detection method and device and electronic equipment

Info

Publication number: CN115167369A
Application number: CN202210924981.3A
Authority: CN
Inventors: 戴絮年; 张壹芬; 黄繁秋; 林皆超; 郑二磊; 韦金银; 李鼎
Original assignee: Zhejiang Supcon Technology Co Ltd
Current assignee: Zhejiang Supcon Technology Co Ltd
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2022-10-11

Abstract

The application discloses a fault detection method and device for industrial data, and electronic equipment. The method comprises the following steps: acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space; determining a first distance between each sample data in the feature score data and the rest of the sample data, and determining a plurality of sample sets according to the first distances; and determining a statistical value corresponding to each sample data according to the plurality of sample sets, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and determining a control limit of the target industrial data according to the statistical value, wherein the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data. The application solves the technical problem in the prior art that, when Hotelling's T² is adopted as the statistic in process fault detection, the fault detection rate is low if the industrial process data do not follow a multivariate Gaussian distribution.

Description

Industrial data fault detection method and device and electronic equipment

Technical Field

The application relates to the field of fault detection, in particular to a fault detection method and device for industrial data and electronic equipment.

Background

Reducing the occurrence of abnormal system states and improving production efficiency, while ensuring the normal operation of a chemical industrial system, has always been a key focus of the chemical industry. The amount of data generated during industrial processes is often very large. This data contains most of the information about the production process, so it needs to be used reasonably and effectively. Fault detection methods based on multivariate statistical analysis perform dimension reduction and analysis on process data in order to judge whether the operating state of the process is abnormal. In industrial process fault detection, the local structure between data is very important. The Neighborhood Preserving Embedding (NPE) algorithm keeps the local geometry of the data after dimension reduction similar to that of the original data. However, the NPE algorithm cannot be applied directly to process fault detection; it must be combined with a suitable statistic in order to perform fault detection on the process.

In the prior art, when an NPE fault detection algorithm is adopted, Hotelling's T² is usually adopted as the statistic in process fault detection. However, this statistic requires the process data to follow a multivariate Gaussian distribution, whereas industrial process data generally do not follow that distribution, so effective fault detection of industrial data cannot be achieved.

Disclosure of Invention

The embodiments of the application provide a fault detection method and device for industrial data, and electronic equipment, which at least solve the technical problem in the prior art that, when Hotelling's T² is adopted as the statistic in process fault detection, the fault detection rate is low if the industrial process data do not follow a multivariate Gaussian distribution.

According to an aspect of an embodiment of the present application, there is provided a method for detecting a fault of industrial data, including: acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data; determining a first distance between each sample data in the feature score data and the rest sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of neighbor sample data closest to the target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set; and determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

Optionally, obtaining target industrial data comprises: acquiring original industrial data, and determining the mean value and standard deviation of the original industrial data; and carrying out standardization processing on the original industrial data according to the mean value and the standard deviation to obtain target industrial data.

Optionally, determining feature score data of the target industrial data within the target dimensional space comprises: determining a projection matrix of the target industrial data; and determining the feature score data of the target industrial data in the target dimensional space according to the target industrial data and the projection matrix.

Optionally, determining a projection matrix of the target industrial data comprises: acquiring a first objective function, wherein the first objective function comprises each sample data, the plurality of neighbor sample data in each sample set, and weight values corresponding to the plurality of sample sets; solving the first objective function to obtain the plurality of target weight values at which the first objective function takes its minimum value; substituting the plurality of target weight values into a second objective function and solving it to obtain a plurality of eigenvalues and corresponding eigenvectors, wherein the second objective function is used for calculating the projection vectors in the projection matrix; and selecting a target number of eigenvalues from the plurality of eigenvalues, and determining the projection matrix according to the eigenvectors corresponding to the selected eigenvalues, wherein the selected eigenvalues are the first target-number eigenvalues after the plurality of eigenvalues are sorted in a preset order.

Optionally, determining a statistical value corresponding to each sample data according to the plurality of sample sets comprises: acquiring each sample data in the feature score data and the sample set corresponding to each sample data; determining the average value of the plurality of neighbor sample data in each sample set; converting each sample data into target data according to that sample data and the average value of its corresponding sample set, wherein the target data approximately follows a Gaussian distribution; and determining a target matrix composed of the plurality of target data, and determining the statistical value corresponding to each sample data according to the target matrix.

Optionally, determining a statistical value corresponding to each sample data according to the target matrix, including: calculating a covariance matrix of the target matrix; and determining a statistical value corresponding to each sample data according to the target data and the covariance matrix.

Optionally, after determining the statistical value corresponding to each sample data, the method further includes: under the condition that the statistical value is larger than the control limit, determining sample data corresponding to the statistical value as abnormal data; and under the condition that the statistical value is less than or equal to the control limit, determining the sample data corresponding to the statistical value as normal data.

According to another aspect of the embodiments of the present application, there is also provided an apparatus for detecting a fault of industrial data, including: the acquisition module is used for acquiring target industrial data and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data; the first determining module is used for determining a first distance between each sample data in the feature score data and the rest of the sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of neighbor sample data closest to the target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set; and the second determining module is used for determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a memory for storing program instructions; a processor, coupled to the memory, for executing program instructions that implement the following functions: acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data; determining a first distance between each sample data in the feature score data and the rest of sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of adjacent sample data closest to a target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set; and determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

According to another aspect of the embodiments of the present application, a nonvolatile storage medium is further provided, where the nonvolatile storage medium includes a stored program, and when the program runs, a device where the nonvolatile storage medium is located is controlled to execute the method for detecting the fault of the industrial data.

In the embodiment of the application, the target industrial data is obtained by standardizing the original industrial data; data dimension reduction is performed on the target industrial data; a plurality of neighbor samples corresponding to each sample data in the feature score data after dimension reduction are determined; a statistical value corresponding to each sample data is calculated; and a control limit is determined according to the statistical values, so that whether the industrial data is fault data can be determined according to the control limit. This achieves the technical effect of applying the NPE algorithm to the fault detection of industrial data, and thereby solves the technical problem in the prior art that, when Hotelling's T² is adopted as the statistic in process fault detection, the fault detection rate is low if the industrial process data do not follow a multivariate Gaussian distribution.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a block diagram of a hardware structure of a computer terminal (or an electronic device) for implementing a fault detection method of industrial data according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of fault detection of industrial data according to an embodiment of the present application;

FIG. 3 is a block diagram of an industrial data fault detection device according to an embodiment of the present application;

FIG. 4a is a flowchart of a method for detecting a failure based on neighborhood preserving embedding of quadratic distance according to an embodiment of the present application;

FIG. 4b is a fault detection control chart corresponding to a first type of fault according to an embodiment of the present application;

FIG. 4c is a fault detection control chart corresponding to a second type of fault according to an embodiment of the present application;

FIG. 4d is a fault detection control chart corresponding to a third type of fault according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:

Control limit: the control range specified when a quality control procedure is performed on an analytical test.

Statistic: a generic term that may refer to, for example, Hotelling's T² or the Mahalanobis distance. The statistic in the embodiments of the present application is the Mahalanobis distance; that is, the embodiments use the Mahalanobis distance as the statistic for determining the state of a sample.

In order to address the adverse effects on fault detection caused by the huge data volume, the need to preserve data structure, and the unclear data distribution in industrial processes, the embodiment of the application provides a fault detection algorithm based on neighborhood preserving embedding with a quadratic distance. The algorithm preserves the local structure information of the data, and the newly adopted statistic does not require the data to follow a specific distribution. Faults occurring in the industrial production process can therefore be detected effectively, possible faults can be fed back to the factory quickly, and the loss caused by faults is reduced.

The method embodiments for detecting faults in industrial data provided by the present application can be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or electronic device) for implementing the fault detection method of industrial data. As shown in FIG. 1, the computer terminal 10 (or electronic device 10) may include one or more processors (shown as 102a, 102b, ..., 102n; the processors may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or electronic device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).

The memory 104 can be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the fault detection method of the industrial data in the embodiment of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the fault detection method of the industrial data. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or electronic device).

It should be noted here that in some alternative embodiments, the computer device (or electronic device) shown in FIG. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or electronic device) described above.

In the above operating environment, the embodiments of the present application provide an embodiment of a method for fault detection of industrial data. It should be noted that the steps illustrated in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be executed in an order different from that shown.

Fig. 2 is a flowchart of a method for fault detection of industrial data according to an embodiment of the present application, as shown in fig. 2, the method includes the following steps:

step S202, target industrial data are obtained, and feature score data of the target industrial data in a target dimensional space are determined, wherein the target industrial data are obtained after the original industrial data are subjected to standardization processing, and the feature score data comprise a plurality of sample data.

In step S202, data of the Tennessee Eastman (TE) process (hereinafter referred to as the TE process) is taken as an example. In order to eliminate the adverse effect on data dimension reduction caused by the different units and scales of the process variables, the data needs to be standardized. In the embodiment of the application, the TE process data is simulation data of an actual chemical production process. The simulation data is standardized to obtain the target industrial data, and data dimension reduction is then performed on the target industrial data; specifically, feature score data of the target industrial data in a target dimensional space is determined, where the target dimensional space is the feature space after dimension reduction.

Step S204, determining a first distance between each sample data in the feature score data and the rest sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of neighbor sample data closest to the target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set.

Step S204 is mainly used to determine the plurality of neighbor sample data corresponding to each sample data after dimension reduction. Specifically, for example, the k neighbor sample data closest to each sample data are determined; these k neighbor sample data form a sample set, and each sample data corresponds to one sample set. The first distance in step S204 may be the Euclidean distance between each sample data and the rest of the sample data. For example, the calculated distances are sorted in ascending order, and the samples corresponding to the first k (smallest) distances are taken as the k neighbor samples of the target sample data.
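The neighbor search of step S204 can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `neighbor_sets`, the toy score matrix `T`, and the choice `k = 2` are assumptions made for the example and do not come from the application.

```python
import math

def neighbor_sets(T, k):
    """For every score vector in T, return the indices of its k nearest
    neighbours (Euclidean distance), excluding the sample itself."""
    sets = []
    for i, t_i in enumerate(T):
        # (distance, index) pairs to every other sample
        distances = [
            (math.dist(t_i, t_j), j) for j, t_j in enumerate(T) if j != i
        ]
        distances.sort()  # ascending: smallest distance first
        sets.append([j for _, j in distances[:k]])
    return sets

# Toy feature score data: 4 samples in a 2-dimensional target space.
T = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
sets = neighbor_sets(T, k=2)
```

Each entry of `sets` is the sample set of one sample, so every sample has exactly one sample set, as the step requires. The brute-force search is quadratic in the number of samples; a spatial index could replace it for large data sets.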

Step S206, according to a plurality of sample sets, determining a statistic value corresponding to each sample data, and determining a control limit of the target industrial data according to the statistic value, wherein the statistic value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data or not.

In step S206, according to the plurality of sample sets, that is, the k neighbor sample data corresponding to each sample data, the sample data is converted into data with an approximately Gaussian distribution to obtain the target data. The Mahalanobis distance of the target data, that is, the statistical value, is then calculated, and the control limit is determined according to the statistical values, so that it can be determined whether the target industrial data is fault data.

In the above steps S202 to S206, the original industrial data is standardized to obtain the target industrial data; data dimension reduction is performed on the target industrial data; a plurality of neighbor samples corresponding to each sample data in the feature score data after dimension reduction are determined; a statistical value corresponding to each sample data is calculated; and a control limit is determined according to the statistical values, so that whether the industrial data is fault data can be determined according to the control limit. This achieves the technical effect of applying the NPE algorithm to fault detection of industrial data, and thereby solves the technical problem in the prior art that, when Hotelling's T² is adopted as the statistic in process fault detection, the fault detection rate is low if the industrial process data do not follow a multivariate Gaussian distribution.

In step S202 of the method for detecting a fault of industrial data, the method for acquiring target industrial data specifically includes the following steps: acquiring original industrial data, and determining the mean value and standard deviation of the original industrial data; and carrying out standardization processing on the original industrial data according to the mean value and the standard deviation to obtain target industrial data.

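Taking TE process data as an example, let the original TE process data be $\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m)^T \in \mathbb{R}^{m \times n}$, where $m$ is the number of samples, $n$ is the number of process variables, and $\tilde{x}_i$ is the $i$-th sampling point. The data is standardized as:

$$x_i = \frac{\tilde{x}_i - \mu}{\sigma}$$

where $x_i$ is the standardized $\tilde{x}_i$, i.e. the target industrial data, $\mu$ is the mean of $\tilde{X}$, $\sigma^2$ is the variance of $\tilde{X}$, and $\sigma$ is the standard deviation of $\tilde{X}$ obtained from the variance, with the mean and standard deviation taken for each process variable.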
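The standardization step can be illustrated with a short sketch. It uses only the Python standard library; the function name `standardize`, the toy data, and the use of the sample (n-1) standard deviation are assumptions for the example, since the text does not state which variance estimator is used.

```python
from statistics import mean, stdev

def standardize(raw):
    """Z-score each column (process variable) of a list-of-rows data set.

    Returns the standardized rows plus the per-variable means and standard
    deviations, which would also be needed to scale new samples.
    """
    columns = list(zip(*raw))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # sample standard deviation (assumption)
    scaled = [
        [(value - mu) / sigma for value, mu, sigma in zip(row, means, stds)]
        for row in raw
    ]
    return scaled, means, stds

# Toy data: m = 3 samples of n = 2 process variables with very different scales.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
X, means, stds = standardize(raw)
```

After this step every variable has zero mean and unit standard deviation, so no single variable dominates the distance calculations used later in dimension reduction.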

In step S202 in the method for detecting industrial data failure, determining feature score data of target industrial data in a target dimensional space specifically includes the following steps: determining a projection matrix of the target industrial data; and determining the characteristic score data of the target industrial data in the target dimensional space according to the target industrial data and the projection matrix.

Specifically, in the embodiment of the present application, the NPE algorithm is applied to calculate the projection matrix $A$ of $X$, where $X$ is the standardized industrial data. If $X$ is reduced to a $b$-dimensional space, the projection of $X$ into this low-dimensional space is denoted $T = (t_1, t_2, \ldots, t_m)^T$. The matrix $T$ is the feature score data in the above-mentioned target dimensional space, $t_i$ is any one sample of the feature score data obtained by reducing $X$ to the target dimensional space, and $T = XA$ is satisfied.

In the above steps, determining a projection matrix of the target industrial data specifically includes the following steps: acquiring a first objective function, wherein the first objective function comprises each sample data, the plurality of neighbor sample data in each sample set, and weight values corresponding to the plurality of sample sets; solving the first objective function to obtain the plurality of target weight values at which the first objective function takes its minimum value; substituting the plurality of target weight values into a second objective function and solving it to obtain a plurality of eigenvalues and corresponding eigenvectors, wherein the second objective function is used for calculating the projection vectors in the projection matrix; and selecting a target number of eigenvalues from the plurality of eigenvalues, and determining the projection matrix according to the eigenvectors corresponding to the selected eigenvalues, wherein the selected eigenvalues are the first target-number eigenvalues after the plurality of eigenvalues are sorted in a preset order.

In the embodiment of the application, the distances between $x_i$ and its $k$ nearest neighbor points are calculated under certain constraint conditions, and each of the $k$ neighbor points is given a weight $w_{ij}$ (where $j$ denotes the neighbor $x_j$ of sample $x_i$), while the weights of all remaining samples are 0. NPE reconstructs each point by a weighted linear combination of these neighboring data points, with the weights collected in the matrix $W$. The first objective function is expressed as follows:

$$\min_{W} \sum_{i=1}^{m} \Big\| x_i - \sum_{j} w_{ij} x_j \Big\|^2 \quad \text{s.t.} \quad \sum_{j} w_{ij} = 1$$

Assume that $y = (y_1, y_2, \ldots, y_m)^T$ contains the sample points obtained when $X$ is reduced to one dimension, and that $a$ is a projection vector, so that $y = Xa$ and $y$ preserves the local geometry of $X$. Given the constraint, $a$ can be calculated by the following equation, which is the second objective function mentioned above:

$$\min_{a} \sum_{i=1}^{m} \Big( y_i - \sum_{j} w_{ij} y_j \Big)^2 \quad \text{s.t.} \quad a^T X^T X a = 1$$

Note that $w_{ij}$ in the second objective function is the target weight value obtained when the first objective function takes its minimum value. Since $y = Xa$, the second objective function may be converted into:

$$\min_{a} \ a^T X^T (I - W)^T (I - W) X a \quad \text{s.t.} \quad a^T X^T X a = 1$$

where $I$ is the identity matrix of order $m$. Let $M = (I - W)^T (I - W)$; introducing a Lagrangian function converts the above problem into:

$$X^T M X a = \lambda X^T X a$$

The eigenvector corresponding to the smallest eigenvalue of the above equation is the solution $a$.

If $X$ is to be reduced to $b$ dimensions, expressed as $T = (t_1, t_2, \ldots, t_m)^T$, the eigenvalues obtained from the above equation are sorted in ascending order, and the eigenvectors corresponding to the $b$ smallest eigenvalues form the projection matrix $A = (a_1, a_2, \ldots, a_b)$, which satisfies $T = XA$.
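The two-stage NPE computation described above (reconstruction weights, then a generalized eigenproblem) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the application's implementation: the function name `npe_projection`, the regularization term `reg` added to the local Gram matrix, the sum-to-one normalization of the weights, and the Cholesky-based reduction of the generalized eigenproblem to a symmetric one are all choices made for the example. It also assumes that $X^T X$ is positive definite (more samples than variables, no collinear variables).

```python
import numpy as np

def npe_projection(X, k, b, reg=1e-3):
    """Return (A, T, eigvals): projection matrix, scores T = X A, and the
    b smallest generalized eigenvalues of X^T M X a = lambda X^T X a."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    W = np.zeros((m, m))
    for i in range(m):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                        # a sample is not its own neighbour
        nbrs = np.argsort(dist)[:k]             # indices of the k nearest neighbours
        Z = X[nbrs] - X[i]                      # neighbours centred on x_i
        G = Z @ Z.T                             # local Gram matrix (k x k)
        G = G + reg * np.trace(G) * np.eye(k)   # regularization (assumption)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # weights of each row sum to 1
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    left = X.T @ M @ X
    right = X.T @ X
    # Reduce the generalized problem to a symmetric one: right = L L^T.
    L = np.linalg.cholesky(right)
    L_inv = np.linalg.inv(L)
    C = L_inv @ left @ L_inv.T
    eigvals, U = np.linalg.eigh((C + C.T) / 2)  # eigenvalues in ascending order
    A = L_inv.T @ U[:, :b]                      # columns satisfy a^T X^T X a = 1
    return A, X @ A, eigvals[:b]

# Toy standardized data: 60 samples, 4 variables (random, for illustration only).
rng = np.random.default_rng(0)
raw = rng.normal(size=(60, 4))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
A, T, lam = npe_projection(X, k=6, b=2)
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, taking the first `b` columns selects the eigenvectors of the `b` smallest eigenvalues, and the back-substitution through the Cholesky factor makes each projection vector satisfy the constraint automatically.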

In step S206 of the method for detecting a fault of industrial data, determining a statistical value corresponding to each sample data according to the plurality of sample sets specifically includes the following steps: acquiring each sample data in the feature score data and the sample set corresponding to each sample data; determining the average value of the plurality of neighbor sample data in each sample set; converting each sample data into target data according to that sample data and the average value of its corresponding sample set, wherein the target data approximately follows a Gaussian distribution; and determining a target matrix composed of the plurality of target data, and determining the statistical value corresponding to each sample data according to the target matrix.

In the embodiment of the present application, after the k neighbor sample data corresponding to each sample data are determined, the average value of the sample set corresponding to each sample data is calculated by the following formula:

where $t_{i,l}$ is the $l$-th neighbor of the sample data $t_i$.

In order to reduce the influence of the non-Gaussian characteristics of the data on fault detection, the data are converted into target data with an approximately Gaussian distribution by the following formula:

A target matrix $E = (e_1, e_2, \ldots, e_m)^T$ composed of the plurality of target data is determined according to this formula, and the Mahalanobis distance of $e_i$ is the statistical value.
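The neighbor averaging and the conversion into target data can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: it assumes that the target data is the residual $e_i = t_i - \text{mean}$ of the k nearest neighbors of $t_i$, which is consistent with the description but not stated explicitly. The function name `knn_residuals` and the toy data are hypothetical.

```python
import math


def knn_residuals(scores, k):
    """Return (neighbor_means, residuals) for a list of score vectors.

    Assumption: target data e_i = t_i - mean of the k nearest neighbors of t_i.
    """
    if not 0 < k < len(scores):
        raise ValueError("k must satisfy 0 < k < number of samples")
    means, residuals = [], []
    for i, t_i in enumerate(scores):
        # First distance: Euclidean distance from t_i to every other sample.
        dists = sorted(
            (math.dist(t_i, t_j), j) for j, t_j in enumerate(scores) if j != i
        )
        neighbors = [scores[j] for _, j in dists[:k]]
        # Component-wise average of the k nearest neighbors.
        mean = [sum(col) / k for col in zip(*neighbors)]
        means.append(mean)
        residuals.append([a - b for a, b in zip(t_i, mean)])
    return means, residuals


# Toy feature scores: three clustered samples and one distant sample.
T = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
means, E = knn_residuals(T, k=2)
```

In this toy data the distant sample `[5.0, 5.0]` yields a much larger residual than the clustered samples, which is the property the later Mahalanobis statistic relies on.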

In the above steps, the method for determining the statistical value corresponding to each sample data according to the target matrix specifically includes the following steps: calculating a covariance matrix of the target matrix; and determining a statistical value corresponding to each sample data according to the plurality of target data and the covariance matrix.

In the embodiment of the present application, the Mahalanobis distance is calculated as follows:

where Λ is the covariance matrix of the target matrix E.
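A possible computation of the statistical value from the target matrix E is sketched below using only the Python standard library. Two points are assumptions rather than facts from the text: the statistic is taken in squared form, $d_i = e_i^T \Lambda^{-1} e_i$, and Λ is the sample covariance with divisor m − 1. The helper names `covariance`, `solve`, and `mahalanobis_stats` are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix (divisor m - 1) of row-wise observations."""
    m, n = len(rows), len(rows[0])
    mu = [sum(r[c] for r in rows) / m for c in range(n)]
    centered = [[r[c] - mu[c] for c in range(n)] for r in rows]
    return [
        [sum(x[a] * x[b] for x in centered) / (m - 1) for b in range(n)]
        for a in range(n)
    ]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_stats(E):
    """Assumed statistic: d_i = e_i^T * inv(cov(E)) * e_i (squared form)."""
    cov = covariance(E)
    # Solving cov @ z = e avoids forming the inverse explicitly.
    return [sum(a * b for a, b in zip(e, solve(cov, e))) for e in E]


E = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
D = mahalanobis_stats(E)
```

For this symmetric toy matrix every sample lies at the same Mahalanobis distance, although the Euclidean norms of the rows differ by a factor of two; scaling by the covariance is what removes the difference.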

In step S206 of the method for detecting industrial data failure, after determining the statistical value corresponding to each sample data, the method further includes the following steps: under the condition that the statistical value is larger than the control limit, determining sample data corresponding to the statistical value as abnormal data; and under the condition that the statistical value is less than or equal to the control limit, determining the sample data corresponding to the statistical value as normal data.

In the embodiment of the present application, the plurality of statistical values constitute $D = (d_1, d_2, \ldots, d_m)$, and the control limit $D_{cl}$ of D is determined using a kernel density estimation method. If $d_i$ is greater than $D_{cl}$, $x_i$ is determined to be a fault sample; otherwise, it is determined to be a normal sample.
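One way to obtain a control limit by kernel density estimation is sketched below. The text does not specify the kernel, the bandwidth, or the confidence level, so the Gaussian kernel, the rule-of-thumb bandwidth, and `alpha=0.99` are all assumptions, and the function names are hypothetical. The limit is taken as the point where the estimated cumulative distribution reaches `alpha`.

```python
import math
import statistics


def kde_control_limit(stats, alpha=0.99, bandwidth=None):
    """Return x such that the Gaussian-KDE CDF of `stats` equals alpha."""
    m = len(stats)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumed choice, not from the source).
        bandwidth = 1.06 * statistics.stdev(stats) * m ** (-0.2)

    def cdf(x):
        # The CDF of a Gaussian KDE is the mean of per-sample normal CDFs.
        return sum(
            0.5 * (1.0 + math.erf((x - d) / (bandwidth * math.sqrt(2.0))))
            for d in stats
        ) / m

    lo = min(stats) - 10.0 * bandwidth
    hi = max(stats) + 10.0 * bandwidth
    for _ in range(100):  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def classify(stats, limit):
    """True marks a fault sample (d_i > limit); False marks a normal sample."""
    return [d > limit for d in stats]


# Hypothetical statistics from normal operating data.
train_stats = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4]
D_cl = kde_control_limit(train_stats, alpha=0.99)
flags = classify([1.0, 5.0], D_cl)
```

Estimating the limit from the empirical density avoids assuming a particular distribution for the statistic, which matches the motivation for using kernel density estimation here.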

In the embodiment of the present application, the algorithm may be implemented in Python, but is not limited thereto; Java, MATLAB, R, and other languages may also be used.

According to the fault detection method for industrial data described above, the NPE algorithm is used to perform dimension reduction on the data, which preserves the local geometric structure of the data and eliminates redundant noise. Meanwhile, a statistic based on the secondary distance is adopted, which eliminates the non-Gaussian characteristics of the data and provides support for fault detection. Here, the first distance is the Euclidean distance between each sample data and the other sample data, and the second distance is the Mahalanobis distance, i.e., the statistical value; the two distances are calculated in sequence, the first distance being obtained before the second. Compared with the related art, the method provided by the embodiment of the present application not only retains the internal structure information of the data during dimension reduction, but also obtains the change information between the data and their corresponding neighbors. In addition, the method constructs a new secondary-distance-based statistic to monitor the process data, and this statistic overcomes the low detection rate of the traditional $T^2$ statistic when monitoring non-Gaussian processes.

Fig. 3 is a block diagram of a fault detection apparatus for industrial data according to an embodiment of the present application. As shown in Fig. 3, the apparatus includes:

the obtaining module 302 is configured to obtain target industrial data and determine feature score data of the target industrial data in a target dimensional space, where the target industrial data is obtained by performing standardization processing on original industrial data, and the feature score data includes multiple sample data;

a first determining module 304, configured to determine a first distance between each sample data in the feature score data and the remaining sample data, and determine a plurality of sample sets according to the first distance, where each sample set in the plurality of sample sets is composed of a plurality of neighboring sample data closest to a target sample data, the target sample data is any one sample data in the feature score data, and each sample data corresponds to one sample set;

the second determining module 306 is configured to determine a statistical value corresponding to each sample data according to the multiple sample sets, and determine a control limit of the target industrial data according to the statistical value, where the statistical value is determined by a Mahalanobis distance corresponding to each sample data, and the control limit is used to determine whether the target industrial data corresponding to the sample data in the feature score data is fault data.

It should be noted that the fault detection apparatus for industrial data shown in fig. 3 is used for executing the fault detection method for industrial data shown in fig. 2, and therefore, the related explanations in the fault detection method for industrial data are also applicable to the fault detection apparatus for industrial data, and are not described again here.

Fig. 4a is a flowchart of a secondary-distance-based neighborhood preserving embedding fault detection method according to an embodiment of the present application. As shown in Fig. 4a, an offline model is first established, which includes the following processes: data standardization, NPE dimension reduction, primary distance calculation, secondary distance calculation, and control limit calculation. The data standardization corresponds to the process of standardizing according to the mean value and the standard deviation; the NPE dimension reduction corresponds to the process of projecting the target industrial data into the target dimensional space in step S202; the primary distance calculation corresponds to the process of calculating the first distance in step S204; and the secondary distance calculation and the control limit calculation correspond to the process of determining the statistical value and the control limit in step S206. The offline model is used to determine the parameters of the industrial data modeling stage (including the mean value, the standard deviation, the projection matrix, and the like) and to judge the relationship between industrial data and the control limit. Specifically, the industrial data obtained by online monitoring are input into the model, which performs data standardization, NPE dimension reduction, primary distance calculation, and secondary distance calculation. The relationship between the statistical value corresponding to a sample and the control limit is then judged according to the calculated secondary distance, i.e., the statistical value: when the statistical value is greater than the control limit, the corresponding industrial data are determined to be fault data; when the statistical value is less than or equal to the control limit, the corresponding industrial data are determined to be normal data.
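The reuse of offline parameters in the online stage can be illustrated for the standardization step alone. The sketch below assumes the sample standard deviation (divisor m − 1), which the text does not specify; the function names and the toy data are hypothetical.

```python
import statistics


def fit_standardizer(raw):
    """Offline stage: per-variable mean and standard deviation of training rows."""
    columns = list(zip(*raw))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]  # sample std, an assumption
    return means, stds


def standardize(row, means, stds):
    """Scale one sample with the stored offline parameters."""
    return [(x - mu) / sd for x, mu, sd in zip(row, means, stds)]


# Offline: parameters are estimated once from normal operating data.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, stds = fit_standardizer(train)

# Online: a newly monitored sample is scaled with the stored parameters.
online = standardize([4.0, 20.0], means, stds)
```

The online sample is never used to re-estimate the mean or standard deviation. The same principle applies to the projection matrix and the control limit, which are likewise fixed in the offline stage.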

In the embodiment of the present application, TE raw data are first obtained, and the normal, fault 1, fault 2, and fault 8 data are selected from the raw data as the data set of this example. The method provided by the present application has extremely high detection performance for the three different types of faults; the specific fault detection control charts are shown in Fig. 4b to Fig. 4d. In summary, the fault detection results of the present application are highly accurate, and the trained detection model can be deployed on a system to realize TE process fault detection.

The embodiment of the application further provides a nonvolatile storage medium, which includes a stored program, wherein when the program runs, the device where the nonvolatile storage medium is located is controlled to execute the following fault detection method for the industrial data: acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data; determining a first distance between each sample data in the feature score data and the rest of sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of adjacent sample data closest to a target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set; and determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

In the embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part thereof that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present application and it should be noted that, as will be apparent to those skilled in the art, numerous modifications and adaptations can be made without departing from the principles of the present application and such modifications and adaptations are intended to be considered within the scope of the present application.

Claims

1. A method of fault detection of industrial data, comprising:

acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data;

determining a first distance between each sample data in the feature score data and the rest of sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of neighbor sample data closest to a target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set;

and determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by the Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

2. The method of claim 1, wherein obtaining target industrial data comprises:

acquiring the original industrial data, and determining the mean value and the standard deviation of the original industrial data;

and standardizing the original industrial data according to the mean value and the standard deviation to obtain the target industrial data.

3. The method of claim 1, wherein determining feature score data for the target industrial data within a target dimensional space comprises:

determining a projection matrix of the target industrial data;

and determining feature score data of the target industrial data in the target dimensional space according to the target industrial data and the projection matrix.

4. The method of claim 3, wherein determining a projection matrix for the target industrial data comprises:

obtaining a first objective function, wherein the first objective function includes each sample data, the plurality of sample sets, and weight values corresponding to the plurality of neighbor sample data in each sample set;

solving the first objective function to obtain a plurality of target weight values corresponding to the minimum value of the first objective function;

substituting the plurality of target weight values into a second objective function and solving to obtain a plurality of eigenvalues and corresponding eigenvectors, wherein the second objective function is used for calculating the projection vectors in the projection matrix;

and taking a target number of eigenvalues from the plurality of eigenvalues, and determining the projection matrix according to the eigenvectors corresponding to the target number of eigenvalues, wherein the target number of eigenvalues are the first target number of eigenvalues obtained by sorting the plurality of eigenvalues in a preset order.

5. The method according to claim 1, wherein determining statistics corresponding to each of the sample data according to the plurality of sample sets comprises:

acquiring each sample data in the feature score data and a sample set corresponding to each sample data;

determining an average value corresponding to a plurality of neighbor sample data in each sample set;

converting each sample data into target data according to each sample data and the average value of the sample set corresponding to that sample data, wherein the target data approximately obeys a Gaussian distribution;

and determining a target matrix consisting of a plurality of target data, and determining a statistical value corresponding to each sample data according to the target matrix.

6. The method of claim 5, wherein determining the statistical value corresponding to each sample data according to the target matrix comprises:

calculating a covariance matrix of the target matrix;

and determining a statistical value corresponding to each sample data according to the target data and the covariance matrix.

7. The method of claim 1, wherein after determining the statistics corresponding to each sample data, the method further comprises:

under the condition that the statistical value is greater than the control limit, determining the sample data corresponding to the statistical value as abnormal data;

and under the condition that the statistical value is less than or equal to the control limit, determining the sample data corresponding to the statistical value as normal data.

8. An apparatus for fault detection of industrial data, comprising:

an acquisition module, configured to acquire target industrial data and determine feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data;

a first determining module, configured to determine a first distance between each sample data in the feature score data and the remaining sample data, and determine a plurality of sample sets according to the first distance, where each sample set in the plurality of sample sets is composed of a plurality of neighboring sample data closest to a target sample data, the target sample data is any one sample data in the feature score data, and each sample data corresponds to one sample set;

a second determining module, configured to determine a statistical value corresponding to each sample data according to the multiple sample sets, and determine a control limit of the target industrial data according to the statistical value, where the statistical value is determined by a Mahalanobis distance corresponding to each sample data, and the control limit is used to determine whether the target industrial data corresponding to the sample data in the feature score data is fault data.

9. An electronic device, comprising:

a memory for storing program instructions;

a processor coupled to the memory for executing program instructions that implement the functions of: acquiring target industrial data, and determining feature score data of the target industrial data in a target dimensional space, wherein the target industrial data is obtained by standardizing original industrial data, and the feature score data comprises a plurality of sample data; determining a first distance between each sample data in the feature score data and the rest of sample data, and determining a plurality of sample sets according to the first distance, wherein each sample set in the plurality of sample sets consists of a plurality of neighbor sample data closest to a target sample data, the target sample data is any sample data in the feature score data, and each sample data corresponds to one sample set; determining a statistical value corresponding to each sample data according to the plurality of sample sets, and determining a control limit of the target industrial data according to the statistical value, wherein the statistical value is determined by a Mahalanobis distance corresponding to each sample data, and the control limit is used for judging whether the target industrial data corresponding to the sample data in the feature score data is fault data.

10. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein a device in which the non-volatile storage medium is located is controlled to perform a fault detection method for industrial data according to any one of claims 1 to 7 when the program is run.