CN114580472B

CN114580472B - Large-scale equipment fault prediction method giving equal weight to causality and attention in the industrial internet

Info

Publication number: CN114580472B
Application number: CN202210187119.9A
Authority: CN
Inventors: 尹小燕; 南鑫; 刘长友; 龚志敏; 王禹; 田苗; 崔瑾; 陈晓江; 房鼎益
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2022-12-23
Anticipated expiration: 2042-02-28
Also published as: CN114580472A

Abstract

The invention provides a large-scale equipment fault prediction method that gives equal weight to causality and attention in the industrial internet. The method uses supervised learning: it collects fault samples, extracts fault features, builds a causal analysis model by analyzing the potential causal relationships between features and faults, and predicts equipment faults by combining causal analysis with a temporal attention mechanism. Because it is based on causal analysis, the method explores the relationship between features and fault prediction accuracy, offers a feasible approach to feature selection for large-scale equipment fault prediction models, and assigns larger weights to the features that are primary causes of a fault and smaller weights to those that are secondary causes.

Description

Large-scale equipment fault prediction method giving equal weight to causality and attention in the industrial internet

Technical Field

The invention belongs to the technical field of the industrial internet and relates to a large-scale equipment fault prediction method, in particular to a large-scale equipment fault prediction method that gives equal weight to causality and attention in the industrial internet.

Background

The long-term stable operation of large-scale equipment in the industrial internet is of great significance for safe production. The sensor nodes deployed on large-scale equipment for round-the-clock monitoring generate massive amounts of data. Based on features extracted from fault samples, real-time analysis of monitoring data over the full life cycle makes it possible to track the operating state of the equipment accurately and to predict possible faults, so that an equipment fault emergency plan can be started, the equipment can be maintained in time, and industrial safety accidents can be avoided. Research on operating state monitoring and fault prediction for large-scale equipment in the industrial internet is therefore urgently needed. At the same time, existing digital, networked and intelligent fault analysis techniques for large-scale equipment are inadequate and cannot meet the real-time processing and analysis requirements of massive monitoring data. A big data analysis framework for the industrial internet is indispensable, and high-level data models and big data analysis capabilities for the industrial internet are urgently needed.

At present, several methods for predicting equipment faults have been proposed:

(A) Signal analysis methods: numerical transforms are applied to the signal changes monitored by the sensors deployed on the large-scale equipment, and equipment state detection and fault prediction are carried out based on domain knowledge and experience.

(B) Methods based on linear discriminants: the signal features monitored by the sensors in the fault state are collected, fault features are extracted by principal component analysis, and the extracted important features are fed to a linear discriminator for classification.

(C) Methods based on convolutional neural networks: information over a period of time is collected, the extracted time domain signal is converted into a frequency domain signal by convolutional layers, and fully connected layers are then trained to obtain the fault result.

(D) Methods based on data fusion: the data of all sensors are combined, and equipment faults are predicted by a fusion algorithm.

These methods can monitor the operating state of large-scale equipment and predict its faults under specific conditions. However, traditional signal analysis requires deep domain knowledge, and linear discriminators and data fusion depend on the number and feature dimensions of the sensor data. Deep learning can learn end to end and compute on high-dimensional features, so it can use the equipment's sensor data directly. Deep learning is nevertheless a black-box process, and how the features selected by the learning algorithm influence the result remains an open question.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a large-scale equipment fault prediction method that gives equal weight to causality and attention in the industrial internet, so as to solve the technical problem that the accuracy of prior-art prediction methods needs further improvement.

To solve this technical problem, the invention adopts the following technical solution:

A large-scale equipment fault prediction method giving equal weight to causality and attention in the industrial internet, characterized by comprising the following steps:

Step 1, collecting fault data of the large-scale equipment and using the fault data as training samples;

Step 2, preprocessing the training sample data of large-scale equipment faults organized and classified in step 1, obtaining the time domain features of the samples with a signal time domain analysis method, encoding the time domain features, and normalizing the numerical features to obtain a preprocessed sample data sequence of the large-scale equipment sensors;

Step 3, performing causal analysis on the sample data sequence obtained in step 2, and quantifying the degree of influence of each feature on the prediction result based on a defined causal analysis objective function;

Step 4, combining the time information with the result of step 3, and obtaining the hidden layer data and attention scores of the large-scale equipment with a model based on a temporal attention mechanism;

Step 5, predicting faults of the large-scale equipment using the hidden layer data and attention scores obtained in step 4.

Compared with the prior art, the invention has the following technical effects:

(I) The prediction method is based on causal analysis. It explores the relationship between features and fault prediction accuracy, offers a feasible approach to feature selection for large-scale equipment fault prediction models, and assigns larger weights to the features that are primary causes of a fault and smaller weights to those that are secondary causes.

(II) Based on the attention mechanism, the method analyzes fine-grained changes of large-scale equipment fault samples in the time dimension and locates key time points. This improves the accuracy of fault prediction, so that an equipment fault emergency plan can be started and the equipment overhauled in time, avoiding industrial safety accidents.

Drawings

FIG. 1 is a diagram of a signal in the early stable stage and a signal in the mid-fault stage.

FIG. 2 is a diagram of the selected feature sequence.

FIG. 3 is a diagram of the internal structure of the Transformer.

The present invention will be explained in further detail with reference to examples.

Detailed Description

In recent years, researchers have explored and extended causal analysis in many ways and have obtained some results in the field of neural networks. Attention mechanisms are widely used in translation tasks, medical tasks and similar settings; by modeling serialized data, they can capture how values change along the time dimension. Recent studies have demonstrated the effectiveness of attention mechanisms, but they have received little attention in the industrial internet field, because attention mechanisms usually learn high-dimensional features on discrete data and capture the relationship between the data and the task. The invention therefore explores an operating state monitoring and fault prediction method for large-scale equipment in the industrial internet based on causal analysis and a temporal attention mechanism.

The invention provides a large-scale equipment fault prediction method that gives equal weight to causality and attention in the industrial internet, that is, a causality-aware, attention-based method for monitoring the operating state of large-scale equipment and predicting its faults in the industrial internet.

The method uses supervised learning: it collects fault samples, extracts fault features, builds a causal analysis model by analyzing the potential causal relationships between features and faults, and predicts equipment faults by combining causal analysis with a temporal attention mechanism.

Unless otherwise specified, all algorithms used in the invention are algorithms known in the art.

In the present invention:

The SVM algorithm refers to the support vector machine algorithm.

The RF algorithm refers to the random forest algorithm.

The LR algorithm refers to the logistic regression algorithm.

The LSTM algorithm refers to the long short-term memory network algorithm.

The GRU algorithm refers to the gated recurrent unit algorithm.

The DFC-CNN algorithm refers to the Deep Fully Convolutional Neural Network algorithm.

The DA-RNN algorithm refers to the Dual-stage Attention-based Recurrent Neural Network algorithm.

The DW-AE algorithm refers to the Deep Wavelet Auto-Encoder algorithm.

Transformer refers to a deep self-attention network.

AUC refers to the area under the receiver operating characteristic curve.

The Softmax function refers to a normalization function.

The present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention fall within the protection scope of the present invention.

Embodiment:

This embodiment provides a large-scale equipment fault prediction method giving equal weight to causality and attention in the industrial internet, comprising the following steps:

Step 1 comprises the following substeps:

Step 1.1, classifying the fault data based on the collected large-scale equipment fault samples, and labeling each type of fault data;

Specifically, in this embodiment the experimental data come from the full-life-cycle bearing vibration signal data set of Xi'an Jiaotong University and Sumyoung Technology (XJTU-SY). The test platform consists of a speed-controlled motor, a rotating shaft, support bearings, a hydraulic loading system, test bearings and other components. The data set covers 3 operating conditions with 5 bearings each, giving full-life-cycle signal samples for 15 bearings in total. In the tests the sampling frequency is 25.6 kHz, the sampling interval is 1 min, and each sampling lasts 1.28 s, so each sensor records 32768 data points per minute (25.6 kHz × 1.28 s). Because the data set contains few types of operating conditions, the data are first preprocessed by data augmentation, which increases the size of the data set through reasonable operations so that the model can learn from it. The time required to collect 2000 data points is taken as the unit time τ. There are 15 devices of the same type, and each device has two acceleration sensors recording the data changes of the bearing.

Step 1.2, sampling a large amount of the labeled fault data of each type of large-scale equipment, recording the time of each sampling point during sampling, and organizing the collected signal segments by the label of the equipment they come from to obtain the training samples;

The organizing process is as follows: the number of large-scale equipment units is G, and each unit has I monitoring sensors; the number of fault types of the large-scale equipment is Q; a unit time τ is defined, and the data collected by the equipment during each interval τ form one data slice; the time from start-up to fault is T; the time of the kth data slice is τ × k; the data collected by the equipment over each period T comprise K data slices arranged in time order.

In this embodiment sampling is used. For each device, every 2000 data points can be taken as one slice, giving slices of the full-cycle signal. Once a fault has occurred the signal features are obvious, and predicting the fault at that point is meaningless; the fault should be predicted as early as possible. In the early stage, when the fault has not yet occurred, the signal is periodic and a large amount of data carries very little information, so the signal in this period should be considered as little as possible. When the signal becomes abnormal before the fault occurs, the signal changes in that period should be given as much attention as possible. This matches the usual inspection workflow for equipment. Therefore, taking real industrial equipment inspection and the sequence size of each analysis into account, the data set is built as shown in FIG. 1: 20 slices are randomly drawn from the early stable stage of the signal and 40 slices from the mid-fault stage, and they are combined into one slice sequence with K = 60. In this way 1000 slice sequences are collected for each operating condition.
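The slicing and sampling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the `fault_start` boundary index (where the stable stage ends and the mid-fault stage begins) are assumptions, since the text does not say how that boundary is located.

```python
import random

def make_slices(signal, slice_len=2000):
    """Cut a signal into consecutive, non-overlapping slices of slice_len points."""
    count = len(signal) // slice_len
    return [signal[k * slice_len:(k + 1) * slice_len] for k in range(count)]

def sample_slice_indices(num_slices, fault_start, n_early=20, n_mid=40, rng=None):
    """Pick n_early slice indices before fault_start and n_mid at or after it.

    fault_start is a hypothetical boundary between the stable stage and the
    mid-fault stage; the source does not specify how it is chosen.
    """
    rng = rng or random.Random()
    early = rng.sample(range(fault_start), n_early)
    mid = rng.sample(range(fault_start, num_slices), n_mid)
    return sorted(early + mid)  # keep time order so slice k has time tau * k

signal = [0.0] * (2000 * 200)            # placeholder run-to-failure signal
slices = make_slices(signal)             # 200 slices of 2000 points
indices = sample_slice_indices(len(slices), fault_start=120, rng=random.Random(0))
sequence = [slices[k] for k in indices]  # one slice sequence with K = 60
```

Repeating the sampling with different random seeds gives many slice sequences drawn from the same original data, which is the augmentation effect the embodiment relies on.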

Advantages of step 1: these steps yield a sequence data set that carries both feature information and time information and satisfies the model's data volume requirement. The augmented data set is drawn entirely from the original data set, and no additional information is introduced.

In this example there are 3 fault types, used for experimental demonstration of the new method.

Step 2, preprocessing the training sample data of large-scale equipment faults organized and classified in step 1, obtaining the time domain features of the samples with a signal time domain analysis method, encoding the time domain features, and normalizing the numerical features to obtain a preprocessed sample data sequence of the large-scale equipment sensors;

Step 2 comprises the following substeps:

Step 2.1, applying a signal time domain analysis method to the organized data slices to obtain their time domain features, and assembling the time domain features of each sample into a time domain feature slice;

A signal time domain analysis method yields distinct time domain feature parameters from the data of each slice. These parameters are divided into dimensional and dimensionless feature parameters, for example variance, root mean square and mean.

Step 2.2, normalizing the extracted time domain feature slices to generate a unified feature code according to Formula 1, in which:

$f_{ijk}$ is the time domain feature slice of the kth sequence of the jth feature of the ith sensor;

$s_{ijk}$ is the feature code of the kth sequence of the jth feature of the ith sensor;

$\mathrm{Max}(f_{ijk})$ is the maximum value of the jth feature of the ith sensor;

$\mathrm{Min}(f_{ijk})$ is the minimum value of the jth feature of the ith sensor;

$\gamma$ is a coefficient controlling the size of the feature code space;

$J$ is the number of features of a sensor;

$i$ indexes the ith sensor;

$j$ indexes the jth feature of the sensor;

$k$ indexes the kth sequence;

The purpose of this step is to normalize all time domain features and convert them into feature codes.
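As an illustration of step 2.2, the sketch below maps a feature value to an integer code by min-max scaling followed by rounding down. The exact form of Formula 1 is not reproduced in this text, so the expression `floor(gamma * (f - min) / (max - min))`, the clamp to `gamma - 1`, and the function name are assumptions chosen to agree with the symbol definitions above.

```python
import math

def feature_code(f, f_min, f_max, gamma=1400):
    """Map a feature value to an integer code in [0, gamma - 1].

    Assumed form: min-max scaling times gamma, rounded down. The top value
    f == f_max is clamped so the code stays inside the feature's code space.
    """
    if f_max == f_min:          # constant feature: no spread to encode
        return 0
    scaled = (f - f_min) / (f_max - f_min)
    return min(int(math.floor(gamma * scaled)), gamma - 1)

codes = [feature_code(v, 0.0, 1.0) for v in (0.0, 0.5, 1.0)]
```

With `gamma = 1400` each feature occupies 1400 possible codes, which matches the space size coefficient used in the embodiment.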

Step 2.3, converting the feature codes obtained in step 2.2 into binary inputs:

$$n_k = [b_{11k}, b_{12k}, b_{13k}, \ldots, b_{IJk}] \quad \text{(Formula 2)}$$

which gives the sample data sequence of the large-scale equipment sensors, $N = [n_1, n_2, n_3, \ldots, n_K]$.

Each feature code $s_{ijk}$ can be converted into a binary input $b_{ijk}$, represented in $\{0,1\}^{o}$ with $o = \gamma \times I \times J$; the fractional part of the feature code $s_{ijk}$ is rounded down.

In the formula:

$b_{IJk}$ is the binary input of the feature code $s_{IJk}$;

$N$ is the sample data sequence of the large-scale equipment sensors;

$n_k$ is the binary input of the feature codes of the kth sequence;

$n_K$ is the binary input of the feature codes of the Kth sequence.
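One way to picture the binary input of step 2.3 is a sparse vector of length $o = \gamma \times I \times J$ with a single 1 per feature. The layout below, in which feature (i, j) owns a contiguous block of `gamma` positions, is an assumption made for illustration; the text only states the total dimension.

```python
def binary_input(codes, gamma=1400):
    """Build a sparse 0/1 vector n_k from integer feature codes.

    codes[i][j] is the code of feature j of sensor i for one slice.
    Assumed layout: feature (i, j) owns positions
    [(i * J + j) * gamma, (i * J + j + 1) * gamma).
    """
    num_sensors = len(codes)
    num_features = len(codes[0])
    n_k = [0] * (gamma * num_sensors * num_features)
    for i in range(num_sensors):
        for j in range(num_features):
            n_k[(i * num_features + j) * gamma + codes[i][j]] = 1
    return n_k

codes = [[0, 700, 1399, 5, 10, 20, 30]]   # I = 1 sensor, J = 7 features
n_k = binary_input(codes)
```

For I = 1, J = 7 and gamma = 1400 the vector has 9800 entries, the sparse space size quoted in the embodiment.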

In this embodiment, the time domain features of every 2000 data points are computed when analyzing the signal values. The time domain signal itself contains a great deal of information, so choosing suitable time domain indicators is important for analyzing the bearing state. Seven features are selected to describe each slice: variance, root mean square, mean, kurtosis, skewness, crest factor (peak factor) and margin factor. Because these features have different properties, their values lie on different scales. So that each feature is meaningful in the model's input space, the sample data must be normalized before fault detection. Mapping the data to a fixed interval removes the effect of scale differences between features, which speeds up model convergence and can improve model accuracy. The feature value ranges are first unified by normalization and then encoded. Specifically, the sequence points of all bearings are normalized, using min-max (dispersion) normalization for each feature on the existing data set. The space size coefficient assigned to each feature is 1400, so all features are mapped onto a sparse space of total size 9800. As shown in FIG. 2, each sample of the input sequence is defined as an embedded vector of size 7 × 1 × 60, where 7 is the number of rows of the sequence feature, 1 is the number of columns, and 60 is the number of slices.
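The seven slice features can be computed as in the sketch below. The patent names the features but does not give their formulas, so the definitions here (population variance, kurtosis and skewness as standardized moments, crest factor as peak over RMS, margin factor as peak over the squared mean of square-rooted magnitudes) are one common convention and should be read as assumptions.

```python
import math

def time_domain_features(x):
    """Return the seven time domain features of one slice.

    Assumes a non-constant, non-zero slice; a constant slice would make
    the standard deviation zero and the ratios undefined.
    """
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "variance": variance,
        "rms": rms,
        "mean": mean,
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "crest_factor": peak / rms,
        "margin_factor": peak / (sum(math.sqrt(abs(v)) for v in x) / n) ** 2,
    }

features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Variance, RMS and mean are dimensional parameters, while the other four are dimensionless, which is the split mentioned in step 2.1.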

As shown in FIG. 3, the advantages of step 2 are as follows: by analyzing the feature signals numerically, the feature dimensions most useful for the detection result are selected, the effect of the subsequent causal analysis on the features can be judged intuitively, and the influence of each feature becomes easier to understand.

Step 3 comprises the following substeps:

Step 3.1, performing causal analysis on the preprocessed sample data sequence of the large-scale equipment sensors obtained in step 2;

A sample of the large-scale equipment sensors covers I sensors, each with J features. When all features are used in the computation, the objective function quantifying the influence of a feature on the prediction result is defined by Formula 3, in which:

$\Delta\varepsilon_{f_{ij}}$ is the impact of feature $f_{ij}$ on the fault prediction;

$f_{ij}$ is the jth feature of the ith sensor;

$\varepsilon_{F\setminus\{f_{ij}\}}$ is the fault prediction error without feature $f_{ij}$;

$\varepsilon_F$ is the fault prediction error with all features;

Step 3.2, according to Formula 3, measuring the influence of one feature on the prediction result requires computing the model error $\varepsilon_F$ with the complete feature set and the model error $\varepsilon_{F\setminus\{f_{ij}\}}$ without feature $f_{ij}$.

A single-layer Transformer based on the attention mechanism is used as the model for computing the errors. From the sample data sequence $N$ of the large-scale equipment sensors, Formula 4 generates the Transformer embedding sequence $M$ and the embedding sequence without feature $f_{ij}$, $M\setminus\{f_{ij}\}$, where $M = [m_1, m_2, m_3, \ldots, m_K]$ and $M\setminus\{f_{ij}\} = [m'_1, m'_2, m'_3, \ldots, m'_K]$;

$$m_k = w_m n_k + b_m \quad \text{(Formula 4)}$$

In the formulas:

$\mathrm{TF}(\cdot)$ is the Transformer;

the output of $\mathrm{TF}(\cdot)$ is the label of the prediction result;

$M$ is the embedding sequence of the Transformer;

$M\setminus\{f_{ij}\}$ is the embedding sequence of the Transformer without feature $f_{ij}$;

$m_K$ is the embedded data of the Kth sequence of the embedding sequence $M$;

$n_k$ is the binary input of the feature codes of the kth sequence;

$m'_k$ is the embedded data of the kth sequence of the embedding sequence $M\setminus\{f_{ij}\}$;

$w_m$ is the initialized weight matrix of the embedding sequence;

$b_m$ is the initialized bias matrix of the embedding sequence;

$v$ is the dimension of $w_m$ and $b_m$;

$\sigma$ is the spatial dimension of $n_k$, $\sigma = \gamma \times I \times J$;

$\mathbb{R}$ is the set of real numbers;
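Formula 4 is a plain linear map from the sparse binary input to a dense embedding. The sketch below shows it with tiny, made-up dimensions (`sigma = 12`, `v = 4`) and random initial weights; the sizes and the helper name `embed` are illustrative assumptions, and a real model would use $\sigma = \gamma \times I \times J$.

```python
import random

def embed(w_m, b_m, n_k):
    """Compute m_k = w_m n_k + b_m for one binary input vector."""
    return [sum(w * x for w, x in zip(row, n_k)) + b for row, b in zip(w_m, b_m)]

rng = random.Random(0)
sigma, v = 12, 4                                   # toy sizes for illustration
w_m = [[rng.uniform(-0.1, 0.1) for _ in range(sigma)] for _ in range(v)]
b_m = [0.0] * v

n_k = [0] * sigma
n_k[2] = n_k[7] = 1                                # two active feature codes
m_k = embed(w_m, b_m, n_k)
```

Because `n_k` is binary, each output component is simply the sum of the weight entries at the active positions plus the bias, so the map behaves like an embedding-table lookup.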

Step 3.3, denoting the true label by $e$, a cross-entropy loss function is used to represent the error of the prediction result after Transformer learning, which gives the fault prediction errors used in Formula 3.

The causal contribution of a feature can then be calculated from Formulas 3 to 8 as the difference in loss between the fault prediction error of the model without feature $f_{ij}$ and that of the model with all features;
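The error difference described in steps 3.1 to 3.3 can be illustrated with a small numeric sketch. The predicted probabilities below are invented for the example, and the sign convention (error without the feature minus error with all features) is an assumption consistent with the description that the difference reflects the feature's influence.

```python
import math

def cross_entropy(true_onehot, predicted_probs):
    """Cross-entropy between a one-hot true label e and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(true_onehot, predicted_probs) if t > 0)

def causal_impact(err_without_feature, err_full):
    """Assumed form of Formula 3: how much the error grows when a feature is removed."""
    return err_without_feature - err_full

e = [0, 1, 0]                      # true fault type (3 fault types)
y_full = [0.1, 0.8, 0.1]           # hypothetical prediction with all features
y_without = [0.3, 0.5, 0.2]        # hypothetical prediction without feature f_ij

delta = causal_impact(cross_entropy(e, y_without), cross_entropy(e, y_full))
```

A positive `delta` means the prediction got worse when the feature was removed, so the feature carries useful information about the fault.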

Step 3.4, calculating the model errors of the input features with Formulas 7 and 8 to obtain the causal impact of each feature on the model. Each input feature is assigned a weight according to its causal impact, as given by Formula 9, from which the causal impact weights are derived.

In the formula:

each element is the weight of the jth input feature of the ith sensor;

$W_F$ is the set of causal impact weights.
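Formula 9 is not reproduced in this text, so the sketch below only shows one plausible way to turn causal impacts into weights: negative impacts are clipped to zero and the remainder is normalized to sum to 1. Both choices, and the function name, are assumptions for illustration rather than the patent's stated rule.

```python
def causal_weights(impacts):
    """Turn per-feature causal impacts into non-negative weights summing to 1.

    Assumed rule: clip negative impacts to zero, then divide by the total.
    If no feature has a positive impact, fall back to uniform weights.
    """
    clipped = [max(d, 0.0) for d in impacts]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(impacts)] * len(impacts)
    return [c / total for c in clipped]

weights = causal_weights([1.0, 3.0, 0.0, -2.0])
```

Under this rule, primary-cause features (large impact) receive large weights and features whose removal does not hurt the prediction receive a weight of zero.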

Specifically, in this embodiment, for step 3.1, owing to the limitations of the data set, the acceleration sensor signal is selected for numerical analysis, and 7 features that are as closely related to the fault signal as possible are selected from the time domain analysis, i.e. I = 1 and J = 7. During pre-training each feature is removed in turn before Transformer learning, giving a loss value without that feature and a loss value with all features; the difference between the two reflects how strongly the feature influences the result. Using this difference as a reference, a weight is assigned to each feature, and the product of the weight and the feature value is used as the formal input to the Transformer.

For steps 3.2, 3.3 and 3.4, the influence of each feature on the final result is calculated in the causal analysis module with Formulas 3 to 8. The result is used to generate causal weights, which are applied to the corresponding feature values as input to the perceptron. In the large-scale equipment data set used, 7 important features are obtained by numerical analysis. Causal calculations are carried out separately for the two signals to determine which features have the greater influence on fault detection.

Advantages of step 3: the causal analysis module analyzes the features and suppresses unimportant ones. Existing algorithms treat all features equally, but in fact different signal features carry different meanings at different stages of the equipment's life, and screening out the features that matter in the early stage of a fault has a large effect on the final detection result.

Step 4, combining the time information with the result of step 3, and obtaining the hidden layer data and attention scores of the large-scale equipment with a model based on a temporal attention mechanism.

Step 4 comprises the following substeps:

Step 4.1, recalculating the feature codes by combining the causal impact weights obtained in step 3.4 with Formula 1, as given by Formula 10, whose result is the recalculated feature code of the kth sequence of the jth feature of the ith sensor;

Based on Formula 10, the same operation as in step 2.3 yields the sample data sequence combined with causal impact weights.

In the formula:

$N^W$ is the sample data sequence combined with causal impact weights;

its kth element is the binary input of the feature codes of the kth sequence combined with causal impact weights;

its final element is the feature representation, combined with causal impact weights, of a given type of large-scale equipment fault state; this element is the same for all sample data predicted for that fault;

Step 4.2, processing the time information and embedding it into a sequence with the same dimension as the features, as given by Formula 11, in which:

$z_k$ is the time information embedding sequence;

$\tanh$ is the hyperbolic tangent function;

$T$ is the time from start-up of the large-scale equipment to the occurrence of the fault;

$p_k$ is the time difference between the occurrence of the fault and the acquisition of the slice, $p_k = T - \tau \times k$;

$\tau$ is the unit time; the data collected during each interval $\tau$ form one data slice;

$w_z$ is the initialized weight matrix of the time information embedding sequence;

$b_z$ is the initialized bias matrix of the time information embedding sequence;

$v$ is the dimension of $w_z$ and $b_z$;

In this way the time information is initialized as a vector with the same dimension as the feature embedding. The closer a slice is in time to the fault, the more likely its data are abnormal, and the more attention it should receive.
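The time gap $p_k = T - \tau \times k$ is stated in the text, but the exact form of Formula 11 is not reproduced. The sketch below therefore assumes the simplest embedding consistent with the listed symbols, $z_k = \tanh(w_z p_k + b_z)$ applied component-wise; this form, the toy weights and the function names are assumptions.

```python
import math

def time_gaps(T, tau, K):
    """Return p_k = T - tau * k for k = 1..K (time from slice k to the fault)."""
    return [T - tau * k for k in range(1, K + 1)]

def time_embedding(p_k, w_z, b_z):
    """Assumed form of Formula 11: z_k = tanh(w_z * p_k + b_z), one value per dimension."""
    return [math.tanh(w * p_k + b) for w, b in zip(w_z, b_z)]

gaps = time_gaps(T=60.0, tau=1.0, K=60)          # 59.0, 58.0, ..., 0.0
w_z = [0.1, -0.2, 0.3]                            # toy weights, v = 3
b_z = [0.0, 0.0, 0.0]
z_first = time_embedding(gaps[0], w_z, b_z)       # slice farthest from the fault
z_last = time_embedding(gaps[-1], w_z, b_z)       # slice at the fault
```

The bounded output of `tanh` keeps the time embedding on the same scale as the feature embedding, so the two can later be combined in one sequence.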

Step 4.3, generating the embedded data by combining the time information with the sample data combined with causal impact weights, as given by Formula 12, which yields the combined embedding sequence $C = [c_1, c_2, c_3, \ldots, c_K, c_T]$.

In the formula:

$c_k$ is the embedded data of the kth sequence after combination;

$c_K$ is the embedded data of the Kth sequence after combination;

the input is the binary input of the feature codes of the kth sequence after combination;

$w_c$ and $b_c$ are the initialized weight matrix and bias matrix of the combined embedding sequence;

$C$ is the combined embedding sequence;

$c_T$ is the embedded data, combined with causal impact weights, of a given type of large-scale equipment fault state; $c_T$ is the same for all sample data predicted for that fault;

Step 4.4, using a single-layer Transformer to learn, from the combined embedding sequence, the relationship between the embedding sequence containing time information and large-scale equipment faults:

$$[h_1, h_2, h_3, \ldots, h_K, h_T] = \mathrm{TF}([c_1, c_2, c_3, \ldots, c_K, c_T]) \quad \text{(Formula 13)}$$

In the formula:

$\mathrm{TF}(\cdot)$ is the Transformer;

$h_K$ is the hidden layer data learned from $c_K$ by the Transformer;

$h_T$ is the hidden layer data learned from $c_T$ by the Transformer, i.e. the fault-state hidden layer representation;

Step 4.5, calculating the local attention scores of the combined embedding sequence according to Formula 14, and generating the local feature attention weights from them;

In Formula 14:

$u_k$ is the local attention score of the kth sequence of the combined embedding sequence;

$h_k$ is the hidden layer data learned by the Transformer;

the weight term is the initialized weight matrix of local attention;

$b_u$ is the initialized bias matrix of local attention;

$l$ is the dimension of the hidden layer data $h_k$;

$p$ is the dimension of $b_u$;

After the local attention scores are obtained, the local feature attention weights are generated with the Softmax function:

$$W_{local} = \mathrm{Softmax}([u_1, u_2, u_3, \ldots, u_K]) = [l_1, l_2, l_3, \ldots, l_K] \quad \text{(Formula 15)}$$

In the formula:

$W_{local}$ is the local feature attention weights;

$l_K$ is the local attention weight of the Kth sequence of the combined embedding sequence;
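Step 4.5 can be sketched as a linear score per hidden vector followed by Softmax (Formula 15). Formula 14 is not reproduced in this text, so the scalar score $u_k = w_u \cdot h_k + b_u$ used here, along with the names `w_u` and `local_attention`, is an assumption.

```python
import math

def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def local_attention(hidden, w_u, b_u):
    """Assumed Formula 14 (u_k = w_u . h_k + b_u) followed by Formula 15."""
    scores = [sum(w * h for w, h in zip(w_u, h_k)) + b_u for h_k in hidden]
    return softmax(scores)

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # toy h_1..h_3 with l = 2
w_local = local_attention(hidden, w_u=[1.0, 1.0], b_u=0.0)
```

The weights sum to 1, and the slice whose hidden vector scores highest receives the largest share of local attention.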

Step 4.6, using the attention mechanism to judge the influence of sample time on the fault prediction. First, the fault-state hidden layer representation $h_T$ obtained in step 4.4 is converted into a query vector of the attention mechanism:

$$x = \mathrm{ReLU}(W_x h_T + b_x) \quad \text{(Formula 16)}$$

In the formula:

$x$ is the query vector of the attention mechanism;

$\mathrm{ReLU}(\cdot)$ is the rectified linear unit activation function;

$h_T$ is the fault-state hidden layer representation;

$W_x$ is the initialized weight matrix of the query vector;

$b_x$ is the initialized bias matrix of the query vector;

$l$ is the dimension of the hidden layer data $h_k$;

$q$ is the dimension of $b_x$;

Step 4.7, using the time difference $p_k$ between the occurrence of the fault and the acquisition of the data slice to form the key vector of the attention mechanism, as given by Formula 17, which yields $E = [e_1, e_2, e_3, \ldots, e_K]$.

In the formula:

$e_k$ is the key vector of the kth sequence;

$e_K$ is the key vector of the Kth sequence;

$E$ is the set of time key vectors of the attention mechanism;

$w_e$ is the initialized weight matrix of the time key vector;

$b_e$ is the initialized bias matrix of the time key vector;

$q$ is the dimension of $w_e$ and $b_e$;

Step 4.8, applying the attention mechanism to the query vector $x$ from step 4.6 and the key vectors $e_k$ from step 4.7 to obtain the global temporal attention scores and weights, as given by Formulas 18 and 19.

In Formula 18:

$r_k$ is the global temporal attention score of the kth sequence;

$x^{T}$ is the transpose of the query vector $x$;

$\delta$ is the dimension of the time key vector;

Applying a Softmax layer to normalize the global temporal attention scores gives the global temporal attention weights:

$$W_{global} = \mathrm{Softmax}([r_1, r_2, r_3, \ldots, r_K]) = [g_1, g_2, g_3, \ldots, g_K] \quad \text{(Formula 19)}$$

In the formula:

$W_{global}$ is the global temporal attention weights;

$r_K$ is the global temporal attention score of the Kth sequence;

$g_K$ is the global temporal attention weight of the Kth sequence;
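Steps 4.6 to 4.8 can be sketched together: a ReLU query from $h_T$ (Formula 16), key vectors for each slice, a scaled dot product, and Softmax (Formula 19). Formula 18 is not reproduced in this text; the scaled dot product $r_k = x^{T} e_k / \sqrt{\delta}$ is assumed from the definitions of $x^{T}$ and $\delta$, and the key vectors are supplied directly as toy values because Formula 17 is also missing.

```python
import math

def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def query_vector(W_x, h_T, b_x):
    """Formula 16: x = ReLU(W_x h_T + b_x)."""
    return [max(0.0, sum(w * h for w, h in zip(row, h_T)) + b)
            for row, b in zip(W_x, b_x)]

def global_time_attention(x, keys):
    """Assumed Formula 18 (scaled dot product) followed by Formula 19 (Softmax)."""
    delta = len(x)                                  # dimension of the key vectors
    scores = [sum(a * b for a, b in zip(x, e_k)) / math.sqrt(delta) for e_k in keys]
    return softmax(scores)

x = query_vector([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], [0.0, 0.0])
keys = [[0.0, 0.0], [1.0, 5.0], [2.0, -3.0]]        # toy e_1..e_3
w_global = global_time_attention(x, keys)
```

Here ReLU zeroes the negative component of the query, so only the first key dimension contributes to the scores.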

Step 4.9, combining the local attention of step 4.5 with the global temporal attention of step 4.8;

First, the embedding $h_T$ is used to assign weights to the local feature information and the time information; the weights are normalized by Softmax, as shown in Formula 20:

$$V = \mathrm{Softmax}(W_v h_T + b_v) = [a_{local}, a_{global}] \quad \text{(Formula 20)}$$

In the formula:

$h_T$ is the fault-state hidden layer representation;

$W_v$ is the initialized weight matrix for weighting the combined information;

$b_v$ is the initialized bias matrix for weighting the combined information;

$l$ is the dimension of the hidden layer data $h_K$;

The fused attention weights are then obtained from the local feature attention weights and the global temporal attention weights, as given by Formula 21, in which:

the result is the fused attention weight;

$a_{local}$ is the weight assigned to $W_{local}$ by the fault-state hidden layer representation $h_T$;

$l_K$ is the local feature attention weight of the Kth sequence of the combined embedding sequence;

$a_{global}$ is the weight assigned to $W_{global}$ by the fault-state hidden layer representation $h_T$;

$g_K$ is the global temporal attention weight of the Kth sequence;

Step 4.10, normalizing the fused attention weights to obtain the attention scores of the embedding sequence, as given by Formula 22.

In this embodiment, steps 4.1 to 4.5 form the local information attention module, which analyzes the feature information of each collected signal. Step 4.1 builds an embedding sequence that meets the Transformer's input requirements from the features produced by the preceding causal analysis and preprocessing. Steps 4.2 and 4.3 merge the time information into the embedding sequence. Steps 4.4 and 4.5 learn the dependencies within the embedding sequence to obtain the hidden vectors, and compute the local attention scores from the hidden vectors. Steps 4.6 to 4.8 form the global temporal attention module, which analyzes the importance of the time information for the signal as a whole. Step 4.6 takes the hidden variable $h_T$ produced by the local attention module, which summarizes the overall condition of the equipment, and converts it into a query vector. Step 4.7 focuses on the time information. The two modules judge the current state of the equipment from different angles and need to be considered together. Steps 4.9 and 4.10 therefore provide an attention fusion mechanism that captures the relevant information of the signal representation and the time representation under different conditions, and give a combined score after fusing the local attention scores and the global temporal attention scores.
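The fusion in steps 4.9 and 4.10 can be sketched as a weighted mix of the two attention distributions followed by renormalization. Formulas 21 and 22 are not reproduced in this text, so the mix $a_{local} \cdot l_k + a_{global} \cdot g_k$ and the sum-to-one normalization below are assumptions inferred from the symbol definitions; the numbers are toy values.

```python
def fuse_attention(a_local, a_global, w_local, w_global):
    """Mix local and global attention weights, then normalize to sum to 1.

    Assumed Formula 21: fused_k = a_local * l_k + a_global * g_k.
    Assumed Formula 22: divide each fused weight by the total.
    """
    fused = [a_local * l + a_global * g for l, g in zip(w_local, w_global)]
    total = sum(fused)
    return [f / total for f in fused]

# [a_local, a_global] would come from Softmax(W_v h_T + b_v) in Formula 20.
scores = fuse_attention(0.25, 0.75, w_local=[0.5, 0.5], w_global=[0.2, 0.8])
```

When both input distributions and the pair $[a_{local}, a_{global}]$ each sum to 1, the mix already sums to 1, so the normalization mainly guards against rounding drift.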

Advantages of step 4: the analysis is carried out from two aspects. The features are obtained by numerical analysis of the full-cycle signals of the data set, and the time information comes from the original data set. The most important task in this study is health-condition detection of large-scale equipment, so periodic-signal time information is introduced, the signal features and the time information are fused, and the time intervals between the collected information sequences are used to analyze how the long-cycle signal changes. Fusion also allows the contribution of each feature to the detection result to be distributed better.

Step 5 comprises the following substeps:

step 5.1, obtaining a fault prediction score of the large equipment according to the hidden layer data in step 4.4 and the attention score in step 4.10:

in the formula:

predicting a score for a fault of the large scale equipment;

an attention score for the embedded sequence;

h_k is the hidden layer data learned from c_k through the Transformer;

step 5.2, the probability of the fault prediction of the large equipment is obtained by using a Softmax function for the fault prediction score of the large equipment obtained in the step 5.1;

in the formula:

w_d is the initialized weight matrix for the fault prediction probability;

b_d is the initialized bias matrix for the fault prediction probability;

l is the dimension of the hidden layer data h_K;

the likelihood that the large equipment is about to develop a given fault type is judged according to the fault prediction probability of the large equipment.

Specifically, in this embodiment, step 5.1 combines the comprehensive attention score obtained in step 4.10 with the hidden layer data obtained in step 4.4 to give the fault prediction score. Step 5.2 passes the fault prediction score obtained in step 5.1 through a Softmax function to obtain the probability of the corresponding fault.
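Step 5 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the formulas of steps 5.1 and 5.2 are not reproduced in this text, so the attention-weighted sum of hidden vectors and the linear layer followed by Softmax are assumed forms, and all numeric values are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict_fault(hidden, attention, w_d, b_d):
    """Attention-weighted sum of hidden vectors, then a Softmax classifier."""
    dim = len(hidden[0])
    # ASSUMED step 5.1: score vector y = sum_k alpha_k * h_k
    y = [sum(a * h[i] for a, h in zip(attention, hidden)) for i in range(dim)]
    # ASSUMED step 5.2: probabilities = Softmax(w_d y + b_d)
    logits = [sum(w * v for w, v in zip(row, y)) + b
              for row, b in zip(w_d, b_d)]
    return softmax(logits)

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # h_k with dimension l = 2
attention = [0.2, 0.3, 0.5]                    # attention scores from step 4.10
w_d = [[1.0, -1.0], [-1.0, 1.0]]               # hypothetical two-class weights
b_d = [0.0, 0.0]                               # hypothetical biases
probs = predict_fault(hidden, attention, w_d, b_d)
```

Here `probs` has one entry per fault class, and the class with the largest probability would be reported as the predicted fault type.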

In this embodiment, the trained model is verified for accuracy on the test samples. Specifically, let the parameters of the model be ψ, and use the cross-entropy loss function to measure the difference between the predicted value and the actual value d. The objective is to minimize the average loss function, as shown in formula 25:

in the formula:

to minimize the average loss function;

d is the actual value;

is a predicted value;

g is the total number of large-scale equipment.
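The averaged cross-entropy objective can be sketched in Python as follows. Formula 25 is not reproduced in this text, so the categorical form with one-hot actual values is an assumption, and the function name and sample values are hypothetical.

```python
import math

def average_cross_entropy(actual, predicted, eps=1e-12):
    """Mean cross-entropy over all devices.

    actual:    list of one-hot label lists (the actual values d)
    predicted: list of probability lists (the predicted values)
    eps guards against log(0); it is an implementation detail of this sketch.
    """
    total = 0.0
    for d, d_hat in zip(actual, predicted):
        total -= sum(t * math.log(p + eps) for t, p in zip(d, d_hat))
    return total / len(actual)

actual = [[1, 0], [0, 1]]                # two devices, two fault classes
predicted = [[0.9, 0.1], [0.2, 0.8]]     # hypothetical model outputs
loss = average_cross_entropy(actual, predicted)
```

Training would adjust the model parameters ψ to reduce this value.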

The performance analysis of the method of the invention:

The full-life-cycle bearing vibration signal data set (XJTU-SY) produced by Xi'an Jiaotong University and Changxing Sumyoung Technology Co., Ltd. is used to demonstrate the effectiveness of the method of the present invention.

The data set covers 3 working conditions with 5 bearings for each condition, giving a total of 15 bearing full-life-cycle signal samples. In the test, the sampling frequency is 25.6 kHz, the sampling interval is 1 min, and each sampling lasts 1.28 s. Each sensor therefore collects 32768 signal points per minute (25.6 kHz × 1.28 s).

When mechanical equipment fails, the failure manifests to different degrees in the time, frequency and time-frequency domains. Taking bearing 1_1 as an example, the outer ring of the bearing has failed by the end of the test, and because the load is applied in the horizontal direction, the vibration signal in the horizontal direction contains more degradation information. The data collected for bearing 1_1 in the horizontal direction is shown in fig. 1.

The best results can be obtained by using data of the whole life cycle, but in an actual scene the service life of a bearing can reach tens of thousands of hours, and most of the large amount of data collected by the sensors has extremely low value. The important data is often distributed over the second half of the life cycle of the bearing. Therefore, a large amount of data should not be collected during normal operation of the bearing. Instead, a signal sequence consisting of sensor information at several time points is selected to represent the normal state of the bearing.

To verify the effectiveness of the algorithm, three sample data sets are selected under the working condition of a rotating speed of 2100 r/min and a radial force of 12 kN, where the bearing fault is outer-ring crack damage. To ensure effective model training, the signal of each bearing is divided into sequences, and 2000 feature values are obtained by numerical analysis, giving the sequence points of the full-cycle signal.

In the late stage of a fault the signal characteristics are already obvious and fault detection on the bearing has lost its meaning, so the aim is to detect the fault in time at its early stage, as shown in fig. 2. When the fault has not occurred or is at a very early stage, the periodicity of the signal means that a large amount of data contains very little information, so the signal in this period is considered as little as possible. When the signal becomes abnormal in the early or middle stage of the fault, the signal changes in this period are attended to as much as possible, which matches the general equipment inspection process.

Therefore, when the data set is built, with real industrial equipment inspection in mind, each signal feature sequence combines 20 sequence points randomly collected from the early stable stage with 40 sequence points randomly collected from the middle stage of the fault, and 1000 signal feature sequences are collected for each working condition. The corresponding data set description is shown in table 1.

Table 1 Data set description

Bearing	Bearing 1_1	Bearing 1_4	Bearing 1_5	Bearing 2_1	Bearing 2_5
Training set	600	600	600	600	600
Verification set	200	200	200	200	200
Test set	200	200	200	200	200
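The sampling scheme and the split in table 1 can be sketched in Python as follows. This is an illustration only: the stage boundaries (which of the 2000 sequence points count as the stable stage and which as the middle stage of the fault) are hypothetical, as are the function names and the choice to sort the sampled points into time order.

```python
import random

def build_sequences(early_points, mid_points, n_sequences, seed=0):
    """Draw 20 early-stage and 40 mid-stage points for each feature sequence."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_sequences):
        # Sorting restores time order within each stage (an assumption).
        early = sorted(rng.sample(early_points, 20))
        mid = sorted(rng.sample(mid_points, 40))
        sequences.append(early + mid)
    return sequences

def split(sequences, n_train=600, n_val=200):
    """Split into training, verification and test sets (600/200/200)."""
    train = sequences[:n_train]
    val = sequences[n_train:n_train + n_val]
    test = sequences[n_train + n_val:]
    return train, val, test

# HYPOTHETICAL stage boundaries over 2000 sequence points of one bearing.
early_points = list(range(0, 1200))     # stable stage
mid_points = list(range(1200, 1800))    # middle stage of the fault
sequences = build_sequences(early_points, mid_points, n_sequences=1000)
train, val, test = split(sequences)
```

Each element of `sequences` holds 60 indices into the bearing's sequence points, and the three subsets match the row counts in table 1.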

Because some data are missing, the bearing data sets are mixed in pairs to test the detection accuracy for different fault types. As shown in tables 2 and 3, in the bearing 1_1 and 1_4 data sets, 1_1 is the fault to be detected. In the bearing 1_1 and 1_5 data sets, 1_1 is the fault to be detected. In the bearing 2_1 and 2_5 data sets, 2_1 is the fault to be detected.

TABLE 2 hybrid bearing test accuracy results

TABLE 3 comparison of Experimental results for bearing 2_1 and 2_5 data sets

As shown in tables 2 and 3, the mixed data of bearings 1_1 and 1_4 achieved full recognition on all algorithms. This is because the characteristic information of the two fault types differs greatly. In identifying the other fault types, the algorithm also achieves good results and improves on the benchmark algorithms.

The experiments also verified the average performance of the proposed model and the other baseline models across the data sets. Next, it is necessary to analyze how the causal targets affect the model results, as shown in table 4, so that the model learns the features most strongly correlated with the target being tested. Dimensionless parameters are insensitive to the load and speed of the bearing, do not require comparison with relative standard values or previous data, and are more sensitive to early-stage faults, but they have poor anti-interference capability for severe faults and easily cause misjudgment. Although parameters such as peak value, crest factor and kurtosis are sensitive to impact faults, they saturate and lose their diagnostic capability once the fault enters a severe development stage. Moreover, different types of faults may produce different trends in different factors, which leads the causal analysis to focus on different features. The attention mechanism can force the model to focus on signal features that contain important risk factors while reducing the influence of other features on the detection result. Causal analysis makes the contribution of each feature to the final performance of the model clear, and the method can be extended to other models.

TABLE 4 bearing 2_1 and 2_5 data set causal analysis results

Claims

1. A large-scale equipment fault prediction method with both causality and attention in the industrial Internet, characterized by comprising the following steps:

step 5, predicting the fault of the large equipment by using the hidden layer data and the attention score obtained in the step 4;

wherein, step 3 comprises the following substeps:

in the formula:

Δε_{f_ij} is the impact of feature f_ij on the fault prediction;

f_ij is the j-th feature of the i-th sensor;

is the error of the fault prediction without feature f_ij;

ε_F is the error of the fault prediction;

step 3.2, according to formula 3, measuring the influence of a feature on the prediction result requires calculating the model error ε_F with the complete feature set and the model error without feature f_ij;

a single-layer Transformer based on the attention mechanism is used as the model for calculating the errors; based on the sample data sequence N of the large-scale equipment sensors, the embedded sequence M of the Transformer and the embedded sequence M\{f_ij} without feature f_ij are generated by formula 4, where M = [m_1, m_2, m_3, ..., m_K] and M\{f_ij} = [m'_1, m'_2, m'_3, ..., m'_K];

m_k = w_m n_k + b_m    formula 4;

in the formula:

TF(·) is the Transformer;

is the label of the prediction result;

M is the embedded sequence of the Transformer;

M\{f_ij} is the embedded sequence of the Transformer without feature f_ij;

m_K is the embedded data of the K-th sequence of the embedded sequence M;

n_k is the binary input of the feature code of the k-th sequence;

w_m is the initialized weight matrix for the embedded sequence;

b_m is the initialized bias matrix for the embedded sequence;

v is the dimension of w_m and b_m;

σ is the dimension of n_k, σ = γ × I × J;

γ is the coefficient controlling the size of the feature code space;

I is the number of sensors;

J is the number of features per sensor;

ℝ is the set of real numbers;

step 3.3, denoting the real label by e, the errors are measured with the cross-entropy loss function;

using formulas 3, 4, 5, 6, 7 and 8, the causal contribution of a feature is calculated as the difference in the loss function between the error of the model's fault prediction and the error of the fault prediction without feature f_ij;

step 3.4, calculating the model errors of the input features by using formulas 7 and 8 to obtain the causal influence of each feature on the model; a weight is assigned to each input feature based on its causal influence, and the weight assignment is shown in formula 9:

deriving causal impact weights

In the formula:

is the weight of the j-th input feature of the i-th sensor;

W_F is the causal impact weight.
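As an illustration only, the leave-one-feature-out idea behind step 3 can be sketched in Python. The sketch replaces the single-layer Transformer with a toy error function, and because formulas 3 and 9 are not reproduced in this text, the sign convention (error without the feature minus error with all features) and the proportional normalization are assumptions. All names and numbers are hypothetical.

```python
def causal_impacts(features, error_fn):
    """Impact of each feature: error without it minus error with all features."""
    full_error = error_fn(features)
    impacts = {}
    for name in features:
        reduced = {k: v for k, v in features.items() if k != name}
        impacts[name] = error_fn(reduced) - full_error
    return impacts

def causal_weights(impacts):
    """Normalize non-negative impacts so the weights sum to 1 (ASSUMED form)."""
    clipped = {k: max(v, 0.0) for k, v in impacts.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {k: 1.0 / len(clipped) for k in clipped}
    return {k: v / total for k, v in clipped.items()}

# Toy stand-in for the Transformer's cross-entropy error: each feature that is
# present lowers the error by a fixed, made-up amount.
GAIN = {"kurtosis": 0.30, "rms": 0.15, "mean": 0.05}

def toy_error(features):
    return 1.0 - sum(GAIN[name] for name in features)

features = {"kurtosis": [3.1, 3.4], "rms": [0.5, 0.6], "mean": [0.0, 0.1]}
impacts = causal_impacts(features, toy_error)
weights = causal_weights(impacts)
```

In this toy setting the feature whose removal raises the error most ("kurtosis") receives the largest weight, which mirrors the stated goal of giving major factors larger weights and minor factors smaller ones.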

2. The large-scale equipment fault prediction method with both causality and attention in the industrial Internet as claimed in claim 1, wherein step 1 comprises the following sub-steps:

step 1.2, carrying out mass sampling on the marked fault data of each type of large equipment, recording the time of each sampling point in the sampling process, and arranging the acquired signal segments according to the labels of the large equipment where the signal segments are located to obtain training samples.

3. The large-scale equipment fault prediction method with both causality and attention in the industrial Internet as claimed in claim 1, wherein step 2 comprises the following sub-steps:

in the formula:

s_ijk is the feature code of the k-th sequence of the j-th feature of the i-th sensor;

Max(f_ijk) is the maximum value of the j-th feature of the i-th sensor;

Min(f_ijk) is the minimum value of the j-th feature of the i-th sensor;

γ is the coefficient controlling the size of the feature code space;

J is the number of features per sensor;

i denotes the i-th sensor;

j denotes the j-th feature of the sensor;

k denotes the k-th sequence;

step 2.3, converting the feature code obtained in step 2.2 into binary input:

n_k = [b_11k, b_12k, b_13k, ..., b_IJk]    formula 2;

obtaining the sample data sequence N = [n_1, n_2, n_3, ..., n_K] of the large-scale equipment sensors;

in the formula:

b_IJk is the binary input of the feature code s_IJk;

N is the sample data sequence of the large-scale equipment sensors;

n_k is the binary input of the feature code of the k-th sequence;

n_K is the binary input of the feature code of the K-th sequence.
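As an illustration only, one possible reading of the feature coding in step 2 can be sketched in Python. Formula 1 is not reproduced in this text, so the min-max scaling into γ buckets is an assumption, and the one-hot layout is inferred from the dimension σ = γ × I × J given for n_k in claim 1. The function names and sample values are hypothetical.

```python
def feature_code(value, lo, hi, gamma):
    """Map a feature value to an integer code in [0, gamma - 1] (ASSUMED form)."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo)
    return min(int(scaled * gamma), gamma - 1)

def one_hot(code, gamma):
    """Binary vector of length gamma with a single 1 at position `code`."""
    return [1 if i == code else 0 for i in range(gamma)]

def binary_input(values, lows, highs, gamma):
    """Concatenate the one-hot codes of all I x J features of one sequence."""
    n_k = []
    for v, lo, hi in zip(values, lows, highs):
        n_k.extend(one_hot(feature_code(v, lo, hi, gamma), gamma))
    return n_k

# Three hypothetical features (I x J = 3), each with range [0, 10], gamma = 4.
n_k = binary_input([0.0, 5.0, 10.0], [0.0] * 3, [10.0] * 3, gamma=4)
```

The resulting vector has length γ × I × J = 12, which is consistent with the stated dimension of n_k.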

4. The large-scale equipment fault prediction method with both causality and attention in the industrial Internet as claimed in claim 1, wherein step 4 comprises the following sub-steps:

in the formula:

In the formula:

N^W is the sample data sequence combined with the causal impact weights;

is the binary input of the feature code of the K-th sequence combined with the causal weights;

Are all the same;

step 4.2, processing the time information and embedding it, together with the features, into a sequence of uniform dimension, according to the following formula:

in the formula:

z_k is the time information embedding sequence;

tanh is the hyperbolic tangent function;

T is the time taken by the large-scale equipment from startup to failure;

p_k is the time difference between the failure and the acquisition of the k-th slice, p_k = T - τ × k;

τ is the unit time; the data acquired by the large-scale equipment during each interval τ forms one data slice;

w_z is the initialized weight matrix for the time information embedding sequence;

b_z is the initialized bias matrix for the time information embedding sequence;

v is the dimension of w_z and b_z;

the combined embedded sequence C = [c_1, c_2, c_3, ..., c_K, c_T] is obtained from formula 12;

In the formula:

c_k is the embedded data of the combined k-th sequence;

c_K is the embedded data of the combined K-th sequence;

is the binary input of the feature code of the combined k-th sequence;

C is the combined embedded sequence;
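As an illustration only, the time embedding and its combination with the feature embedding can be sketched in Python. The formula for z_k and formula 12 are not reproduced in this text, so the form z_k = tanh(w_z p_k + b_z) and the element-wise sum c_k = m_k + z_k are assumptions based on the symbols defined above. The function names and numeric values are hypothetical.

```python
import math

def time_embedding(k, total_time, tau, w_z, b_z):
    """ASSUMED form: z_k = tanh(w_z * p_k + b_z), with p_k = T - tau * k."""
    p_k = total_time - tau * k
    return [math.tanh(w * p_k + b) for w, b in zip(w_z, b_z)]

def combine(m_k, z_k):
    """ASSUMED form of formula 12: element-wise sum of the two embeddings."""
    return [m + z for m, z in zip(m_k, z_k)]

# Hypothetical values: T = 10 time units, tau = 1, embedding dimension v = 2.
z = time_embedding(k=3, total_time=10.0, tau=1.0,
                   w_z=[0.1, -0.1], b_z=[0.0, 0.0])
c = combine([0.5, 0.5], z)
```

Because tanh bounds each component to the interval (-1, 1), slices acquired long before the failure and slices acquired close to it stay on a comparable scale with the feature embedding.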

step 4.4, using a single-layer Transformer on the combined embedded sequence to learn the relation between each embedded sequence containing time information and the large-scale equipment fault:

[h_1, h_2, h_3, ..., h_K, h_T] = TF([c_1, c_2, c_3, ..., c_K, c_T])    formula 13;

in the formula:

TF(·) is the Transformer;

h_K is the hidden layer data learned from c_K through the Transformer;

in the formula:

h_k is the hidden layer data learned through the Transformer;

is the initialized weight matrix for local attention;

b_u is the initialized bias matrix for local attention;

l is the dimension of the hidden layer data h_k;

p is the dimension of b_u;

after obtaining the local attention score, a local feature attention weight is generated using the Softmax function, i.e.:

in the formula:

W_local is the local feature attention weight;

x = ReLU(W_x h_T + b_x)    formula 16;

in the formula:

x is the query vector in the attention mechanism;

ReLU(·) is the rectified linear unit activation function;

h_T is the fault-state hidden-layer representation;

W_x is the initialized weight matrix for the query vector;

b_x is the initialized bias matrix for the query vector;

l is the dimension of the hidden layer data h_k;

q is the dimension of b_x;

to obtain E = [e_1, e_2, e_3, ..., e_K];

in the formula:

e_k is the key vector of the k-th sequence;

e_K is the key vector of the K-th sequence;

E is the set of time key vectors of the attention mechanism;

w_e is the initialized weight matrix for the time key vector;

b_e is the initialized bias matrix for the time key vector;

q is the dimension of w_e and b_e;

in the formula:

r_k is the global temporal attention score of the k-th sequence;

x^T is the transpose of the query vector x;

δ is the dimension of the time key vector;

in the formula:

W_global is the global temporal attention weight;

r_K is the global temporal attention score of the K-th sequence;

g_K is the global temporal attention weight of the K-th sequence;
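As an illustration only, the global temporal attention can be sketched in Python. Formula 16 is taken from the text, but the score and weight formulas are not reproduced, so the scaled dot product r_k = x^T e_k / sqrt(δ) followed by Softmax is an assumption suggested by the definitions of x^T and δ. The key vectors are passed in directly because their formula is also missing. The function names and numeric values are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def query_vector(W_x, h_T, b_x):
    """Formula 16: x = ReLU(W_x h_T + b_x)."""
    return [max(0.0, sum(w * h for w, h in zip(row, h_T)) + b)
            for row, b in zip(W_x, b_x)]

def global_time_attention(x, keys):
    """ASSUMED: r_k = x^T e_k / sqrt(delta), then Softmax over all k."""
    delta = len(x)
    scores = [sum(xi * ei for xi, ei in zip(x, e_k)) / math.sqrt(delta)
              for e_k in keys]
    return softmax(scores)

h_T = [1.0, 2.0]                              # fault-state hidden representation
W_x = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical identity weights
x = query_vector(W_x, h_T, b_x=[0.0, 0.0])
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical time key vectors e_k
w_global = global_time_attention(x, keys)
```

The key vector most aligned with the query receives the largest weight, so time slices that agree with the current fault-state representation dominate W_global.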

first, the h_T embedding is used to assign weights to the local features and the time information, and the weights are normalized by Softmax, as shown in formula 20:

V = Softmax(W_v h_T + b_v) = [a_local, a_global]    formula 20;

in the formula:

h_T is the fault-state hidden-layer representation;

W_v is the initialized weight matrix for the integrated information assignment;

b_v is the initialized bias matrix for the integrated information assignment;

l is the dimension of the hidden layer data h_K;

in the formula:

is the fused attention weight;

a_local is the weight assigned to W_local by the fault-state hidden-layer representation h_T;

g_K is the global temporal attention weight of the K-th sequence;

As shown in equation 22:

5. The large-scale equipment fault prediction method with both causality and attention in the industrial Internet as claimed in claim 1, wherein step 5 comprises the following sub-steps:

step 5.1, obtaining a fault prediction score of the large equipment according to the hidden layer data in step 4.4 and the attention score in step 4.10:

in the formula:

predicting a score for a fault of the large scale equipment;

is the attention score of the embedded sequence;

h_k is the hidden layer data learned from c_k through the Transformer;

step 5.2, for the fault prediction score of the large equipment obtained in step 5.1, obtaining the probability of the fault prediction of the large equipment by using a Softmax function;

in the formula:

w_d is the initialized weight matrix for the fault prediction probability;

b_d is the initialized bias matrix for the fault prediction probability;

l is the dimension of the hidden layer data h_K;