CN114580472A - Fault prediction method for large-scale equipment in the industrial internet with equal emphasis on causality and attention

Info

Publication number: CN114580472A
Application number: CN202210187119.9A
Authority: CN
Inventors: 尹小燕; 南鑫; 刘长友; 龚志敏; 王禹; 田苗; 崔瑾; 陈晓江; 房鼎益
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2022-06-03
Anticipated expiration: 2042-02-28
Also published as: CN114580472B

Abstract

The invention provides a fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention. The method uses supervised learning: it collects fault samples, extracts fault features, builds a causal analysis model by analyzing the potential causal relationships between the features and the faults, and predicts equipment faults by combining causal analysis with a temporal attention mechanism. Because the method is based on causal analysis, it explores the relationship between individual features and fault prediction accuracy and offers a practical way to select features for a large-scale equipment fault prediction model: features that are major factors in a fault are assigned larger weights, and features that are minor factors are assigned smaller weights.

Description

Fault prediction method for large-scale equipment in the industrial internet with equal emphasis on causality and attention

Technical Field

The invention belongs to the technical field of the industrial internet and relates to fault prediction methods for large-scale equipment, in particular to a fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention.

Background

The long-term stable operation of large-scale equipment in the industrial internet is important for safe production. Sensor nodes deployed for round-the-clock monitoring of large-scale equipment generate massive amounts of data. Based on features extracted from fault samples, real-time analysis of the monitoring data over the whole life cycle makes it possible to track the running state of the equipment accurately and to predict possible faults, so that an equipment fault emergency plan can be started, the equipment can be overhauled in time, and industrial safety accidents can be avoided. Research on operating-state monitoring and fault prediction for large-scale industrial internet equipment is therefore urgently needed. On the other hand, traditional digital, networked and intelligent fault analysis techniques for large-scale equipment fall short: they cannot meet the demand for real-time processing and analysis of massive monitoring data. Building a big data analysis framework for the industrial internet is indispensable, and high-level data models and big data analysis capabilities oriented to the industrial internet are urgently needed.

At present, several methods for predicting equipment faults have been proposed:

(A) Signal analysis methods: numerical transformation analysis is performed on the signal changes monitored by the sensors deployed on the large-scale equipment, and equipment state detection and fault prediction are carried out based on domain knowledge and experience.

(B) Methods based on linear discriminants: the signal features monitored by the sensors in the fault state are collected, fault features are extracted by principal component analysis, and the extracted important features are fed to a linear discriminator for classification.

(C) Methods based on convolutional neural networks: information over a period of time is collected, the extracted time-domain signal is converted into a frequency-domain signal by the convolutional layers, and a fully connected layer is then trained to obtain the fault result.

(D) Methods based on data fusion: the data of all sensors are combined, and equipment faults are predicted by a fusion algorithm.

These methods can monitor the operating state of large-scale equipment and predict its faults under specific conditions. However, traditional signal analysis requires deep domain expertise, and linear discriminators and data fusion depend on the quantity and feature dimensionality of the sensor data. Deep learning can compute on high-dimensional features directly through end-to-end learning, so the equipment's sensor data can be used as is. Deep learning is nevertheless a black-box process, and how the features selected by the learning algorithm influence the result remains an open question.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention, so as to solve the technical problem that the accuracy of prior-art prediction methods needs further improvement.

To solve this technical problem, the invention adopts the following technical scheme:

A fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention, characterized by comprising the following steps:

Step 1, collecting fault data of large-scale equipment and using the fault data as training samples;

Step 2, preprocessing the training samples sorted and classified in step 1: obtaining the time-domain features of the samples by signal time-domain analysis, encoding the time-domain features, and normalizing the numerical features to obtain a preprocessed sample data sequence of the large-scale equipment sensors;

Step 3, performing causal analysis on the sample data sequence obtained in step 2, and quantifying the degree of influence of each feature on the prediction result based on a defined causal analysis objective function;

Step 4, combining the time information with the result of step 3, and obtaining the hidden layer data and attention scores of the large-scale equipment from a model based on a temporal attention mechanism;

Step 5, predicting faults of the large-scale equipment using the hidden layer data and the attention scores obtained in step 4.

Compared with the prior art, the invention has the following technical effects:

(I) The prediction method is based on causal analysis. It explores the relationship between individual features and fault prediction accuracy and offers a practical way to select features for a large-scale equipment fault prediction model: features that are major factors in a fault are assigned larger weights, and features that are minor factors are assigned smaller weights.

(II) Based on the attention mechanism, the method analyzes fine-grained changes of large-scale equipment fault samples along the time dimension and searches for key time points. This improves the accuracy of fault prediction, so that the equipment fault emergency plan can be started promptly, the equipment overhauled in time, and industrial safety accidents avoided.

Drawings

FIG. 1 shows the early stable phase and the mid-fault phase of a signal.

FIG. 2 is a chart of the selected feature sequence.

FIG. 3 is a diagram of the internal structure of the Transformer.

The present invention will be explained in further detail with reference to examples.

Detailed Description

In recent years, researchers have made various attempts at and extensions of causal analysis and have achieved some results in the field of neural networks. Attention mechanisms are widely used in translation tasks, medical tasks and the like; through sequence modeling, an attention mechanism can capture how values change along the time dimension. Recent studies have demonstrated the effectiveness of attention mechanisms, but they have received little attention in the industrial internet field, because an attention mechanism usually performs high-dimensional feature learning on discrete data to capture the relationship between the data and the task. The invention therefore explores a method for operating-state monitoring and fault prediction of large-scale industrial internet equipment based on causal analysis and a temporal attention mechanism.

The invention provides a fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention, namely a causality-aware, attention-based method for monitoring the operating state and predicting the faults of large-scale equipment in the industrial internet.

The method uses supervised learning: it collects fault samples, extracts fault features, builds a causal analysis model by analyzing the potential causal relationships between the features and the faults, and predicts equipment faults by combining causal analysis with a temporal attention mechanism.

It should be noted that, unless otherwise specified, all algorithms used in the invention are algorithms known in the art.

In the present invention, it is to be noted that:

The SVM algorithm refers to the support vector machine algorithm.

The RF algorithm refers to the random forest algorithm.

The LR algorithm refers to the logistic regression algorithm.

The LSTM algorithm refers to the long short-term memory network algorithm.

The GRU algorithm refers to the gated recurrent unit algorithm.

The DFC-CNN algorithm refers to the Deep Fully Convolutional Neural Network algorithm.

The DA-RNN algorithm refers to the Dual-stage Attention-based Recurrent Neural Network algorithm.

The DW-AE algorithm refers to the Deep Wavelet Auto-Encoder algorithm.

Transformer refers to a deep self-attention network.

AUC refers to the area under the receiver operating characteristic curve.

The Softmax function refers to a normalization function (the normalized exponential function).

The present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention fall within the protection scope of the present invention.

Example:

This embodiment provides a fault prediction method for large-scale equipment in the industrial internet that places equal emphasis on causality and attention, which comprises the following steps:

Step 1 comprises the following substeps:

Step 1.1, classifying the fault data based on the collected large-scale equipment fault samples, and labeling each type of fault data;

Specifically, in this embodiment, the experimental data come from the XJTU-SY full-life-cycle bearing vibration signal data set produced by Xi'an Jiaotong University and Changxing Sumyoung Technology. The experimental platform comprises a speed-controlled motor, a rotating shaft, support bearings, a hydraulic loading system, test bearings and so on. The data set covers 3 working conditions with 5 bearings each, giving full-life-cycle signal samples for 15 bearings in total. In the test, the sampling frequency is 25.6 kHz, the sampling interval is 1 min, and each sampling lasts 1.28 s, so each sensor collects 32768 signal points per minute (25.6 kHz × 1.28 s). Because the data set covers few working conditions, the data are first augmented. The essence of data augmentation is to enlarge the data set through reasonable operations so that the model has enough data to learn from. The time needed to collect 2000 data points is taken as the unit time τ. There are 15 devices of the same type, and each device has two acceleration sensors that record the data changes of the bearing.

Step 1.2, sampling the labeled fault data of each type of large-scale equipment in bulk, recording the time of each sampling point during sampling, and arranging the acquired signal segments by the label of the equipment they come from to obtain the training samples;

The arrangement is as follows: there are G pieces of large-scale equipment, each fitted with I monitoring sensors; there are Q fault types; a unit time τ is defined, and the data acquired by a piece of equipment during each interval τ form one data slice; the time from start-up of the equipment to the occurrence of the fault is T; the time of the k-th data slice is τ × k; the data collected by a piece of equipment during each period T comprise K data slices arranged in chronological order.

In this embodiment, sampling is used. For each device, every 2000 data points are taken as one slice, which yields the slices of the full-life-cycle signal. After a fault has occurred the signal features are obvious and predicting the fault is no longer meaningful, so the fault should be predicted as early as possible. In the early stage, or before any fault has developed, the signal is periodic and a large amount of data carries very little information, so this period should be considered as little as possible; when the signal becomes abnormal before the fault occurs, its changes should be followed as closely as possible. This matches the usual inspection procedure for such equipment. Therefore, when building the data set, with real industrial equipment inspection and the size of each analyzed sequence in mind, 20 slices are randomly drawn from the early stable stage of the signal and 40 slices from the mid-fault stage, as shown in FIG. 1, and combined into one slice sequence, so that K = 60. In this way 1000 slice sequences are collected for each working condition.
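
The slice sampling described in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the synthetic signal, and the `fault_onset` slice index that separates the early stable stage from the later stage are hypothetical; only the slice length of 2000 points and the 20 + 40 split come from the text.

```python
import numpy as np

def make_slices(signal, slice_len=2000):
    """Cut a 1-D run-to-failure signal into consecutive slices of slice_len points."""
    n_slices = len(signal) // slice_len
    return signal[: n_slices * slice_len].reshape(n_slices, slice_len)

def sample_slice_sequence(slices, fault_onset, rng, n_early=20, n_mid=40):
    """Draw n_early slices before fault_onset and n_mid slices from fault_onset
    onward, keeping chronological order. fault_onset is a hypothetical input."""
    early = np.sort(rng.choice(fault_onset, size=n_early, replace=False))
    later_pool = np.arange(fault_onset, len(slices))
    mid = np.sort(rng.choice(later_pool, size=n_mid, replace=False))
    idx = np.concatenate([early, mid])
    return slices[idx], idx

rng = np.random.default_rng(0)
signal = rng.normal(size=2000 * 300)        # synthetic stand-in for one bearing
slices = make_slices(signal)
seq, idx = sample_slice_sequence(slices, fault_onset=200, rng=rng)
```

Repeating the draw with different random states would give the many slice sequences per working condition that the embodiment describes; how the stage boundary is located in practice is not specified in the text.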

Advantages of step 1: the above steps yield a sequence data set that carries both feature information and time information and meets the model's data volume requirement. The augmented data set is drawn entirely from the original data set, and no additional information is introduced.

This embodiment has 3 fault types, which are used for the experimental demonstration of the new method.

Step 2, preprocessing the training samples sorted and classified in step 1: obtaining the time-domain features of the samples by signal time-domain analysis, encoding the time-domain features, and normalizing the numerical features to obtain a preprocessed sample data sequence of the large-scale equipment sensors;

Step 2 comprises the following substeps:

Step 2.1, for the arranged data slices, obtaining the time-domain features of each data slice by signal time-domain analysis, and making the time-domain features of the corresponding sample into a time-domain feature slice;

Signal time-domain analysis yields the time-domain feature parameters of the data in each slice. These parameters are divided into dimensional and dimensionless feature parameters, for example variance, root mean square and mean.

Step 2.2, standardizing the extracted time-domain feature slices to generate a unified feature code according to the standardization formula, Formula 1, in which:

$f_{ijk}$ is the time-domain feature slice of the k-th sequence of the j-th feature of the i-th sensor;

$s_{ijk}$ is the feature code of the k-th sequence of the j-th feature of the i-th sensor;

$\mathrm{Max}(f_{ijk})$ is the maximum value of the j-th feature of the i-th sensor;

$\mathrm{Min}(f_{ijk})$ is the minimum value of the j-th feature of the i-th sensor;

$\gamma$ is a coefficient that controls the size of the feature code space;

$J$ is the number of features of a sensor;

$i$ is the index of the sensor;

$j$ is the index of the feature of the sensor;

$k$ is the index of the sequence;

The purpose of this step is to normalize all time-domain features and convert them into feature codes.

Step 2.3, converting the feature codes obtained in step 2.2 into binary input:

$n_k = [b_{11k}, b_{12k}, b_{13k}, \ldots, b_{IJk}]$  (Formula 2)

which gives the sample data sequence of the large-scale equipment sensors, $N = [n_1, n_2, n_3, \ldots, n_K]$.

Each feature code $s_{ijk}$ can be converted into a binary input $b_{ijk}$, represented in $\{0,1\}^{o}$ with $o = \gamma \times I \times J$; the decimal part of the feature code $s_{ijk}$ is rounded down.

In the formula:

$b_{IJk}$ is the binary input of the feature code $s_{IJk}$;

$N$ is the sample data sequence of the large-scale equipment sensors;

$n_k$ is the binary input for the features $f_{ijk}$ of the k-th sequence;

$n_K$ is the binary input for the features $f_{ijK}$ of the K-th sequence.
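
Steps 2.2 and 2.3 can be illustrated with the sketch below. Formula 1 is not reproduced in this text, so the exact encoding is an assumption: each feature is min-max scaled to an integer code in [0, γ − 1] (decimals rounded down), and each code is then one-hot encoded into its own block of γ positions, which gives vectors of length o = γ × I × J for a single sensor (I = 1). The function names are hypothetical.

```python
import numpy as np

def encode_features(f, gamma):
    """f: array of shape (K, J), J time-domain features for K slices of one sensor.
    Returns integer feature codes of shape (K, J), each in [0, gamma - 1].
    The scaling to gamma - 1 is an assumed form of the standardization."""
    f_min = f.min(axis=0)
    f_max = f.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)   # guard constant features
    scaled = (f - f_min) / span * (gamma - 1)
    return np.floor(scaled).astype(int)                   # decimals rounded down

def to_binary_input(codes, gamma):
    """One-hot each code into its own block of gamma positions (assumed layout),
    giving binary vectors n_k of length gamma * J."""
    K, J = codes.shape
    n = np.zeros((K, gamma * J), dtype=np.int8)
    cols = codes + gamma * np.arange(J)                   # offset per feature block
    n[np.arange(K)[:, None], cols] = 1
    return n

rng = np.random.default_rng(1)
f = rng.normal(size=(60, 7))                # 60 slices, 7 features (synthetic)
codes = encode_features(f, gamma=1400)
N = to_binary_input(codes, gamma=1400)      # shape (60, 9800)
```

With γ = 1400 and 7 features, the sketch reproduces the sparse space of size 9800 mentioned in the embodiment; whether the patent uses exactly this one-hot layout is not stated.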

In this embodiment, the time-domain features of every 2000 data points are computed when analyzing the signal values. The time-domain signal itself carries a great deal of information, so choosing suitable time-domain indicators is important for analyzing the bearing state. Seven features are selected to describe each slice: variance, root mean square, mean, kurtosis, skewness, peak factor (crest factor) and margin factor. These numerical features have different properties and therefore very different value ranges. So that each feature keeps its meaning in the model's input space, the sample data must be normalized before fault detection: mapping the data to a fixed interval removes the effect of scale differences between features, which speeds up model convergence and can improve model accuracy. All feature value ranges are first unified by normalization and then encoded. Specifically, the sequence points of all bearings are normalized, using min-max (dispersion) normalization for each feature on the existing data set. The space size coefficient assigned to each feature is 1400, so all features are mapped onto a sparse space of total size 9800. As shown in FIG. 2, each sample of the input sequence is defined as an embedded vector of size 7 × 1 × 60, where 7 is the number of rows of the sequence feature, 1 is the number of columns, and 60 is the number of slices.
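
As a rough illustration of the seven slice features, the sketch below computes them with NumPy. The text does not give the formulas, so the commonly used definitions are assumed here: population variance, non-excess kurtosis, crest factor as peak over RMS, and margin factor as peak over the squared mean of the square-rooted absolute values. The function name is hypothetical.

```python
import numpy as np

def time_domain_features(x):
    """Return [variance, RMS, mean, kurtosis, skewness, crest factor, margin factor]
    for one slice. The definitions used are common conventions and are assumptions."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - mean
    kurtosis = np.mean(centered ** 4) / std ** 4      # non-excess kurtosis
    skewness = np.mean(centered ** 3) / std ** 3
    peak = np.max(np.abs(x))
    crest = peak / rms                                # peak factor
    margin = peak / np.mean(np.sqrt(np.abs(x))) ** 2  # margin (clearance) factor
    return np.array([var, rms, mean, kurtosis, skewness, crest, margin])

# A pure sine over whole periods as a sanity check: 2000 points, 10 periods.
t = np.arange(2000) / 200.0
feats = time_domain_features(np.sin(2 * np.pi * t))
```

For the sine wave the variance is about 0.5, the RMS about 0.707 and the crest factor about 1.414, which is a quick way to confirm that the definitions behave as expected before applying them to bearing slices.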

As shown in FIG. 3, the advantage of step 2 is as follows: by numerically analyzing the feature signals, the feature dimensions most useful for the detection result are selected, so that the influence of each feature can be judged intuitively in the subsequent causal analysis, which improves the understanding of how the features affect the result.

Step 3 comprises the following substeps:

Step 3.1, performing causal analysis on the preprocessed sample data sequence of the large-scale equipment sensors obtained in step 2;

A sample of the large-scale equipment contains I sensors, each with J features. With all features included in the computation, the objective function that quantifies the influence of a feature on the prediction result is defined by Formula 3, in which:

$\Delta\varepsilon_{f_{ij}}$ is the impact of feature $f_{ij}$ on fault prediction;

$f_{ij}$ is the j-th feature of the i-th sensor;

$\varepsilon_{F\setminus\{f_{ij}\}}$ is the fault prediction error without feature $f_{ij}$;

$\varepsilon_{F}$ is the fault prediction error with the complete feature set;

Step 3.2, according to Formula 3, measuring the influence of one feature on the prediction result requires calculating the model error $\varepsilon_{F}$ with the complete feature set and the model error $\varepsilon_{F\setminus\{f_{ij}\}}$ without feature $f_{ij}$;

A single Transformer layer based on the attention mechanism is used as the model for calculating the errors. Based on the sample data sequence $N$ of the large-scale equipment sensors, the embedded sequence $M$ of the Transformer and the embedded sequence $M\setminus\{f_{ij}\}$ without feature $f_{ij}$ are generated by Formula 4:

$M = [m_1, m_2, m_3, \ldots, m_K]$, $M\setminus\{f_{ij}\} = [m'_1, m'_2, m'_3, \ldots, m'_K]$;

$m_k = w_m n_k + b_m$  (Formula 4)

In the formulas:

$\mathrm{TF}(\cdot)$ is the Transformer;

the output of $\mathrm{TF}(\cdot)$ is the label of the prediction result;

$M$ is the embedded sequence of the Transformer;

$M\setminus\{f_{ij}\}$ is the embedded sequence of the Transformer without feature $f_{ij}$;

$m_k$ is the embedded data of the k-th sequence of the embedded sequence $M$;

$n_k$ is the binary input for the features $f_{ijk}$ of the k-th sequence;

$m'_k$ is the embedded data of the k-th sequence of the embedded sequence $M\setminus\{f_{ij}\}$;

$w_m \in \mathbb{R}^{v \times o}$ is the initialized weight matrix of the embedded sequence;

$b_m \in \mathbb{R}^{v}$ is the initialized bias matrix of the embedded sequence;

$v$ is the dimension of $w_m$ and $b_m$;

$o$ is the dimension of $n_k$, $o = \gamma \times I \times J$;

$\mathbb{R}$ is the set of real numbers;

Step 3.3, with $e$ denoting the true label, a cross-entropy loss function is used to express the error of the prediction result after Transformer learning, and the fault prediction errors in Formula 3 are expressed through this loss (Formulas 5 to 8);

Using Formulas 3 to 8, the causal contribution of a feature is calculated as the difference in loss between the fault prediction error of the model without feature $f_{ij}$ and the fault prediction error of the model with all features;

Step 3.4, calculating the model errors for the input features using Formulas 7 and 8 to obtain the causal influence of each feature on the model; each input feature is assigned a weight according to its causal influence, as given by Formula 9, from which the causal impact weights $W_F$ are derived.

In the formula:

the weight of the j-th input feature of the i-th sensor is the weight assigned to that feature by Formula 9;

$W_F$ denotes the causal impact weights.

Specifically, in this embodiment, for step 3.1, owing to the limitations of the data set, the acceleration sensor signal is selected for numerical analysis, and 7 features related as closely as possible to the fault signal are chosen from the time-domain signal analysis, i.e. I = 1 and J = 7. During pre-training, each feature is removed in turn before Transformer learning, which yields a loss value without that feature and a loss value with all features; the difference between the two reflects the degree of influence of the feature on the result. This difference is used as the reference for assigning a weight to each feature, and the product of the weight and the feature value is used as the formal input to the Transformer.

For steps 3.2, 3.3 and 3.4, the influence of each feature on the final result is calculated in the causal analysis module using Formulas 3 to 8, and the result is used to generate a causal weight that is applied to the corresponding feature value as input to the perceptron. Seven important features are obtained by numerical analysis of the large-scale equipment data set used. Causal calculation is carried out separately on the two signals to determine which features have a greater influence on fault detection.
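
The leave-one-feature-out idea behind step 3 can be sketched as below. Several parts are assumptions made only for illustration: the per-feature impact is taken as the loss without the feature minus the loss with all features, a tiny nearest-centroid classifier stands in for the single-layer Transformer, and the weights are obtained by clipping negative impacts to zero and normalizing to sum to one, since Formula 9 is not reproduced in this text. All function names are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy between predicted class probabilities and true labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def centroid_model_loss(X, y):
    """Hypothetical stand-in for the Transformer: nearest-centroid classifier
    with a softmax over negative squared distances, scored by cross-entropy."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return cross_entropy(probs, np.searchsorted(classes, y))

def causal_weights(X, y, loss_fn=centroid_model_loss):
    """Impact of feature j = loss without j minus loss with all features.
    Normalizing the clipped impacts into weights is an assumed scheme."""
    eps_full = loss_fn(X, y)
    deltas = np.array([loss_fn(np.delete(X, j, axis=1), y) - eps_full
                       for j in range(X.shape[1])])
    gains = np.clip(deltas, 0.0, None)
    if gains.sum() == 0:
        return deltas, np.full(len(deltas), 1.0 / len(deltas))
    return deltas, gains / gains.sum()

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4))
X[:, 0] = 4.0 * y + 0.5 * rng.normal(size=200)   # only feature 0 is informative
deltas, weights = causal_weights(X, y)
```

On this synthetic data the informative feature receives almost all of the weight, which mirrors the intent that major-factor features get larger weights and minor-factor features smaller ones.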

Advantages of step 3: the causal analysis module analyzes the features and suppresses unimportant ones. Existing algorithms treat all features equally, whereas in fact different signal features carry different meaning at different stages of the equipment's life; the module screens out, at the early stage of a fault, the features that have a greater influence on the final detection result.

Step 4, combining the time information with the result of step 3, and obtaining the hidden layer data and attention scores of the large-scale equipment from a model based on a temporal attention mechanism.

Step 4 comprises the following substeps:

Step 4.1, using the causal impact weights obtained in step 3.4 together with Formula 1 to recalculate the feature codes, as given by Formula 10, in which:

the result is the recalculated feature code of the k-th sequence of the j-th feature of the i-th sensor;

Based on Formula 10, the same operation as in step 2.3 yields the sample data sequence $N^{W}$ that incorporates the causal impact weights.

In the formula:

$N^{W}$ is the sample data sequence incorporating the causal impact weights;

$n^{W}_{K}$ is the binary input, combined with the causal impact weights, for the features $f_{ijK}$ of the K-th sequence;

$n^{W}_{T}$ is the feature representation, combined with the causal impact weights, of the fault state of large-scale equipment of the same type; when a fault is predicted, the $n^{W}_{T}$ appended after all sample data is the same;

Step 4.2, processing the time information and embedding it into a sequence with the same dimension as the features, as given by Formula 11, in which:

$z_k$ is the time information embedding of the k-th sequence;

$\tanh$ is the hyperbolic tangent function;

$T$ is the time taken by the large-scale equipment from start-up to failure;

$p_k$ is the time difference between the occurrence of the fault and the acquisition of the k-th slice, $p_k = T - \tau \times k$;

$w_z$ is the initialized weight matrix of the time information embedding;

$b_z$ is the initialized bias matrix of the time information embedding;

$v$ is the dimension of $w_z$ and $b_z$;

In this way the time information is initialized as a vector embedded in the same dimension as the features. The closer a slice is to the fault in time, the more likely its data are abnormal, and the more attention it should receive.
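
A small sketch of the time embedding in step 4.2 follows. The formula itself is not reproduced in this text, so the form used here, a tanh of a linear map of the scaled time difference, is only an assumption consistent with the symbols listed above; scaling the time difference by the total time is also an assumption, and the function name is hypothetical.

```python
import numpy as np

def time_embedding(T, tau, K, w_z, b_z):
    """Embed the time difference of each slice into a v-dimensional vector.
    Assumed form: z_k = tanh(w_z * (p_k / T) + b_z), where p_k is the time
    from the k-th slice to the fault. Returns (Z, p), Z of shape (K, v)."""
    k = np.arange(1, K + 1)
    p = T - tau * k                       # time remaining until the fault
    p_scaled = p / T                      # scaling by T is an assumption
    Z = np.tanh(p_scaled[:, None] * w_z[None, :] + b_z[None, :])
    return Z, p

rng = np.random.default_rng(0)
Z, p = time_embedding(T=60.0, tau=1.0, K=60,
                      w_z=rng.normal(size=8), b_z=np.zeros(8))
```

In a trained model the weight and bias would be learned parameters rather than random draws; the sketch only shows how one embedding per slice with the same dimension as the feature embedding can be produced.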

Step 4.3, generating the combined embedded data from the time information and the sample data incorporating the causal impact weights, as given by Formula 12;

Formula 12 yields the combined embedded sequence $C = [c_1, c_2, c_3, \ldots, c_K, c_T]$.

In the formula:

$c_k$ is the embedded data of the combined k-th sequence;

$c_K$ is the embedded data of the combined K-th sequence;

$n^{W}_{k}$ is the binary input, combined with the causal impact weights, for the features $f_{ijk}$ of the k-th sequence;

$w_c$ and $b_c$ are the initialized weight matrix and bias matrix of the combined embedded sequence;

$C$ is the combined embedded sequence;

$c_T$ is the embedded data, combined with the causal impact weights, of the fault state of large-scale equipment of the same type; when a fault is predicted, the $c_T$ appended after all sample data is the same;

Step 4.4, using a single-layer Transformer on the combined embedded sequence to learn the relationship between each embedded sequence containing time information and the large-scale equipment fault:

$[h_1, h_2, h_3, \ldots, h_K, h_T] = \mathrm{TF}([c_1, c_2, c_3, \ldots, c_K, c_T])$  (Formula 13)

In the formula:

$\mathrm{TF}(\cdot)$ is the Transformer;

$h_K$ is the hidden layer data learned from $c_K$ by the Transformer;

$h_T$ is the hidden layer data learned from $c_T$ by the Transformer, i.e. the hidden layer representation of the fault state;

Step 4.5, calculating the local score of the combined embedded sequence (Formula 14), and generating the local feature attention weight from the attention scores;

In Formula 14:

$u_k$ is the local score of the k-th sequence of the combined embedded sequence;

$h_k$ is the hidden layer data learned by the Transformer;

$w_u$ is the initialized weight matrix of the local attention;

$b_u$ is the initialized bias matrix of the local attention;

$l$ is the dimension of the hidden layer data $h_k$;

$p$ is the dimension of $w_u$ and $b_u$;

After the local attention scores are obtained, the local feature attention weight is generated with the Softmax function:

$W_{local} = \mathrm{Softmax}([u_1, u_2, u_3, \ldots, u_K]) = [l_1, l_2, l_3, \ldots, l_K]$  (Formula 15)

In the formula:

$W_{local}$ is the local feature attention weight;

$u_K$ is the local feature score of the K-th sequence of the combined embedded sequence;

$l_K$ is the local feature attention weight value of the K-th sequence of the combined embedded sequence;
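
The local attention of step 4.5 can be sketched as below. The score formula is not reproduced in this text, so a linear scoring of each hidden vector (a dot product with a weight vector plus a bias) is assumed; the Softmax normalization follows Formula 15. The function names and the toy hidden vectors are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def local_attention(H, w_u, b_u):
    """H: hidden vectors h_1..h_K, shape (K, l). Returns W_local of shape (K,).
    The linear score u_k = w_u . h_k + b_u is an assumed form."""
    u = H @ w_u + b_u
    return softmax(u)

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0]])                 # three toy hidden vectors
W_local = local_attention(H, w_u=np.array([1.0, 0.0]), b_u=0.0)
```

The slice whose hidden vector scores highest receives the largest local weight, and the weights sum to one.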

Step 4.6, using the attention mechanism to judge the influence of the sample time on fault prediction. First, the fault-state hidden layer representation $h_T$ obtained in step 4.4 is converted into a query vector of the attention mechanism:

$x = \mathrm{ReLU}(w_x h_T + b_x)$  (Formula 16)

In the formula:

$x$ is the query vector of the attention mechanism;

$\mathrm{ReLU}(\cdot)$ is the rectified linear unit activation function;

$h_T$ is the hidden layer representation of the fault state;

$w_x$ is the initialized weight matrix of the query vector;

$b_x$ is the initialized bias matrix of the query vector;

$l$ is the dimension of the hidden layer data $h_k$;

$q$ is the dimension of $w_x$ and $b_x$;

Step 4.7, using the time difference $p_k$ between the occurrence of the fault and the acquisition of the data slice to form the key vectors of the attention mechanism, as given by Formula 17, which yields $E = [e_1, e_2, e_3, \ldots, e_K]$.

In the formula:

$e_k$ is the key vector of the k-th sequence;

$e_K$ is the key vector of the K-th sequence;

$E$ is the set of time key vectors of the attention mechanism;

$w_e$ is the initialized weight matrix of the time key vector;

$b_e$ is the initialized bias matrix of the time key vector;

$q$ is the dimension of $w_e$ and $b_e$;

Step 4.8, applying the attention mechanism to the query vector $x$ obtained in step 4.6 and the key vectors $e_k$ obtained in step 4.7 to obtain the global time attention score, as given by Formula 18, in which:

$r_k$ is the global time attention score of the k-th sequence;

$x^{\top}$ is the transpose of the query vector $x$;

$\delta$ is the dimension of the time key vector;

A Softmax layer normalizes the attention scores, so the global time attention weight can be expressed as:

$W_{global} = \mathrm{Softmax}([r_1, r_2, r_3, \ldots, r_K]) = [g_1, g_2, g_3, \ldots, g_K]$  (Formula 19)

In the formula:

$W_{global}$ is the global time attention weight;

$r_K$ is the global time attention score of the K-th sequence;

$g_K$ is the global time attention weight of the K-th sequence;
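
Steps 4.6 to 4.8 can be sketched as follows. The query follows Formula 16 and the normalization follows Formula 19; the score itself is assumed to be the usual scaled dot product of query and key divided by the square root of the key dimension, since the score formula is not reproduced in this text. The key vectors are taken as given inputs, and all names and toy values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def global_time_attention(h_T, E, W_x, b_x):
    """h_T: fault-state hidden vector, shape (l,).
    E: time key vectors e_1..e_K, shape (K, q).
    Returns W_global of shape (K,)."""
    x = np.maximum(0.0, W_x @ h_T + b_x)    # query vector, ReLU as in Formula 16
    delta = E.shape[1]                      # dimension of the time key vectors
    r = E @ x / np.sqrt(delta)              # assumed scaled dot-product score
    return softmax(r)

h_T = np.array([1.0, -1.0])
E = np.array([[1.0, 0.0],
              [3.0, 0.0],
              [0.0, 5.0]])                  # three toy key vectors
W_global = global_time_attention(h_T, E, W_x=np.eye(2), b_x=np.zeros(2))
```

In this toy case the ReLU zeroes the second query component, so only the first key component matters and the second slice receives the largest global weight.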

Step 4.9, combining the local feature score of step 4.5 with the global time score of step 4.8;

First, the $h_T$ embedding is used to assign weights to the local feature information and the time information, normalized by Softmax, as shown in Formula 20:

$V = \mathrm{Softmax}(w_v h_T + b_v) = [a_{local}, a_{global}]$  (Formula 20)

In the formula:

$h_T$ is the hidden layer representation of the fault state;

$w_v$ is the initialized weight matrix for allocating the combined information;

$b_v$ is the initialized bias matrix for allocating the combined information;

$l$ is the dimension of the hidden layer data $h_K$;

The fused attention weight is then obtained from the local feature attention weight and the global time attention weight, as given by Formula 21, in which:

the result is the fused attention weight;

$a_{local}$ is the weight assigned to $W_{local}$ by the fault representation $h_T$;

$l_K$ is the local feature attention weight value of the K-th sequence of the combined embedded sequence;

$a_{global}$ is the weight assigned to $W_{global}$ by the fault representation $h_T$;

$g_K$ is the global time attention weight of the K-th sequence;

Step 4.10, normalizing the fused attention weight to obtain the attention score of the embedded sequence, as given by Formula 22.
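
The fusion of steps 4.9 and 4.10 can be sketched as below. Formula 20 is followed as written; the fusion and normalization formulas are not reproduced in this text, so the fused weight is assumed to be the combination of the local and global weights with the two allocation coefficients, followed by division by the sum. The function names and toy values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_attention(h_T, W_local, W_global, W_v, b_v):
    """Allocate weight between local and global attention from h_T (Formula 20),
    then combine and normalize. The combination rule is an assumption."""
    a_local, a_global = softmax(W_v @ h_T + b_v)      # W_v has shape (2, l)
    fused = a_local * W_local + a_global * W_global
    return fused / fused.sum()

W_local = np.array([0.7, 0.2, 0.1])
W_global = np.array([0.1, 0.2, 0.7])
h_T = np.array([1.0, 0.0])
W_v = np.array([[10.0, 0.0],
                [0.0, 0.0]])                          # strongly favors local
alpha = fuse_attention(h_T, W_local, W_global, W_v, b_v=np.zeros(2))
```

Under this assumed form the fused weights already sum to one, so the final division changes nothing; the normalization actually used in the method may therefore differ from this sketch.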

Specifically, in this embodiment, steps 4.1 to 4.5 form the local information attention module, which analyzes the feature information of each collected signal. In step 4.1, the features from the preceding causal analysis and preprocessing are used to build an embedded sequence that meets the Transformer's input requirements; in steps 4.2 and 4.3 the time information is also merged into the embedded sequence; steps 4.4 and 4.5 learn the dependencies within the embedded sequence and produce the hidden vectors, from which the local attention scores are obtained. Steps 4.6 to 4.8 form the global time attention module, which analyzes the importance of the time information for the signal as a whole. In step 4.6, the hidden variable $h_T$ that describes the overall condition of the equipment, obtained from the local attention module, is converted into a query vector by the attention mechanism. Step 4.7 emphasizes the time information. The two modules judge the current state of the equipment from different angles and need to be considered together. Steps 4.9 and 4.10 therefore design an attention fusion mechanism that captures the relevant information of the signal representation and the time representation under different conditions, and gives a composite score after fusing the attention scores of the local features with the attention scores of the global time information.

Advantages of step 4 of the invention: the analysis is carried out from two aspects. The features are obtained by numerical analysis of the full-cycle signals of the data set, and the time information comes from the original data set. The main concern of this study is health condition detection of large-scale equipment, so we introduce the time information of the periodic signal, fuse the signal features with the time information, and use the time intervals between the collected information sequences to analyze how the long-period signal changes. At the same time, fusion allows the contribution of each feature to the detection result to be allocated better.

Step 5 comprises the following substeps:

step 5.1, according to the hidden layer data in step 4.4 and the attention score in step 4.10, a failure prediction score of the large-scale equipment can be obtained:

in the formula:

is the fault prediction score of the large-scale equipment;

is the attention score of the embedded sequence;

h_k is the hidden layer data learned from c_k through the Transformer;

step 5.2, for the fault prediction score of the large-scale equipment obtained in the step 5.1, obtaining the probability of the fault prediction of the large-scale equipment by using a Softmax function;

in the formula:

w_d is the initialized weight matrix for the fault prediction probability,

b_d is the initialized bias matrix for the fault prediction probability,

l is the dimension of the hidden layer data h_K;

and judging the possibility that the large equipment is about to have a certain fault type according to the probability of fault prediction of the large equipment.

Specifically, in this embodiment, step 5.1 uses the composite attention score obtained in step 4.10 and the hidden layer data obtained in step 4.4 to compute the fault prediction score. Step 5.2 takes the fault prediction score obtained in step 5.1 and obtains the probability of the corresponding fault through a Softmax function or the like.
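Step 5 can be sketched as follows. The sketch assumes that the fault prediction score is the attention-weighted sum of the hidden layer data h_k and that the probability is Softmax(w_d · score + b_d); these forms are inferred from the symbol definitions, and the function names `softmax` and `predict_fault` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict_fault(hidden, attention, w_d, b_d):
    """Return one probability per fault type.

    hidden    : K hidden vectors h_k, each of the same length
    attention : K attention scores from step 4.10
    w_d       : one weight row per fault type, same length as h_k
    b_d       : one bias per fault type
    """
    dim = len(hidden[0])
    # Assumed step 5.1: attention-weighted sum of the hidden vectors.
    score = [sum(a * h[i] for a, h in zip(attention, hidden)) for i in range(dim)]
    # Assumed step 5.2: linear layer followed by Softmax.
    logits = [sum(w * s for w, s in zip(row, score)) + b for row, b in zip(w_d, b_d)]
    return softmax(logits)


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0]]
    print(predict_fault(h, [0.7, 0.3], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]))
```

The fault type with the largest probability is then taken as the fault that the equipment is most likely to develop.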

In this embodiment, the trained model is evaluated on the test samples to verify its accuracy. Specifically, let the parameters of the model be ψ, and use the cross-entropy loss between the predicted value and the actual value d. The objective is to minimize the average loss function, as shown in equation 25:

in the formula:

is the average loss function to be minimized;

d is the actual value;

is the predicted value;

G is the total number of large-scale equipment units.
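As an illustration of the training objective, the following sketch computes an average cross-entropy loss. It assumes one-hot actual values d and predicted probability vectors, which is the standard multi-class form; the exact form of equation 25 is not recoverable from the text, and the function name `average_cross_entropy` is hypothetical.

```python
import math


def average_cross_entropy(actual, predicted, eps=1e-12):
    """Average cross-entropy over all samples.

    actual    : list of one-hot label vectors d
    predicted : list of predicted probability vectors
    eps       : lower bound on probabilities to avoid log(0)
    """
    total = 0.0
    for d, d_hat in zip(actual, predicted):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(d, d_hat))
    return total / len(actual)


if __name__ == "__main__":
    labels = [[1, 0], [0, 1]]
    probs = [[0.9, 0.1], [0.2, 0.8]]
    print(average_cross_entropy(labels, probs))
```

Minimizing this quantity over the model parameters ψ pushes the predicted probability of the true fault type towards 1.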

The performance analysis of the method of the invention:

The full-life-cycle bearing vibration signal data set produced by Xi'an Jiaotong University and Sumyoung Technology (the XJTU-SY bearing data set) is used to demonstrate the effectiveness of the method of the invention.

The data set covers 3 operating conditions with 5 bearings per condition, giving full-life-cycle signal samples for 15 bearings in total. In the test, the sampling frequency is 25.6 kHz, the sampling interval is 1 min, and each sampling lasts 1.28 s. Each sensor therefore collects 32768 data points per minute (25.6 kHz × 1.28 s).

When mechanical equipment fails, the fault may manifest to different degrees in the time domain, the frequency domain and the time-frequency domain. Taking bearing 1_1 as an example, its outer ring fails at the end of the test; because the load is applied in the horizontal direction, the horizontal vibration signal contains more degradation information. The data collected in the horizontal direction of bearing 1_1 are shown in fig. 1.

The best results would be obtained by using data from the whole life cycle, but in a real scenario the service life of a bearing can reach tens of thousands of hours, and most of the large volume of data collected by the sensors has very little value. The important data are usually concentrated in the second half of the bearing's life cycle. Therefore, a large amount of data should not be collected while the bearing operates normally; instead, a signal sequence composed of sensor information at several time points is selected to represent the normal state of the bearing.

To verify the effectiveness of the algorithm, three sample data sets are selected under the working condition of a rotating speed of 2100 r/min and a radial force of 12 kN, in which the bearing fault is outer-ring cracking. To make model training effective, the signal of each bearing is divided into sequences, and feature values of 2000 data points are obtained by numerical analysis, giving the sequence points of the full-cycle signal. In the late stage of a fault the signal features are already obvious, and detecting the fault at that point is no longer meaningful, so the aim is to detect the fault in time at its early stage. As shown in fig. 2, when the fault has not yet occurred or is only incipient, the signal is periodic and a large amount of data carries very little information, so the signal in this period should be sampled as sparsely as possible; when the signal becomes abnormal in the early or middle stage of the fault, the signal changes in that period should be followed as closely as possible, which matches the usual equipment inspection process. Therefore, when building the data set with real industrial equipment inspection in mind, each signal feature sequence combines 20 sequence points randomly collected from the early stable stage with 40 signal points randomly collected from the middle stage of the fault, and 1000 signal feature sequences are collected for each working condition. The corresponding data set description is shown in table 1.

Table 1 data set description

Bearing	Bearing 1_1	Bearing 1_4	Bearing 1_5	Bearing 2_1	Bearing 2_5
Training set	600	600	600	600	600
Validation set	200	200	200	200	200
Test set	200	200	200	200	200
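The sampling scheme and the split in table 1 can be sketched as follows. The sketch assumes that the sampled points keep their time order within a sequence and that the 1000 sequences are split in generation order into 600/200/200; neither detail is stated in the text. The function names `build_sequence` and `build_dataset` are hypothetical.

```python
import random


def build_sequence(early_points, mid_points, rng, n_early=20, n_mid=40):
    """Combine randomly chosen early-stage and mid-fault points into one sequence.

    Indices are sorted so that each part keeps its time order (an assumption).
    """
    early_idx = sorted(rng.sample(range(len(early_points)), n_early))
    mid_idx = sorted(rng.sample(range(len(mid_points)), n_mid))
    return [early_points[i] for i in early_idx] + [mid_points[i] for i in mid_idx]


def build_dataset(early_points, mid_points, n_sequences=1000,
                  n_train=600, n_val=200, seed=0):
    """Generate feature sequences and split them into train/validation/test."""
    rng = random.Random(seed)
    sequences = [build_sequence(early_points, mid_points, rng)
                 for _ in range(n_sequences)]
    train = sequences[:n_train]
    val = sequences[n_train:n_train + n_val]
    test = sequences[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder feature values standing in for real sequence points.
    early = list(range(500))
    mid = list(range(500, 900))
    train, val, test = build_dataset(early, mid)
    print(len(train), len(val), len(test), len(train[0]))
```

Each generated sequence has 60 points, of which the first 20 come from the stable stage and the remaining 40 from the middle stage of the fault.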

Because some data are missing, the bearing data sets are mixed in pairs to test the detection accuracy for different fault types. As shown in tables 2 and 3, in the bearing 1_1 and 1_4 data sets, 1_1 is the fault to be detected. In the bearing 1_1 and 1_5 data sets, 1_1 is the fault to be detected. In the bearing 2_1 and 2_5 data sets, 2_1 is the fault to be detected.

TABLE 2 hybrid bearing test accuracy results

TABLE 3 comparison of Experimental results for bearing 2_1 and 2_5 data sets

As shown in tables 2 and 3, the mixed data of bearings 1_1 and 1_4 are fully recognized by all algorithms, because the feature information of the two fault types differs greatly. In identifying the other fault types, the proposed algorithm also achieves good results and improves on the benchmark algorithms.

The experiments also verified the average performance of the proposed model and the other baseline models across the data set. Next, it is necessary to analyze how the causal targets affect the model results, as shown in table 4, so that the model learns the features most strongly correlated with the target under test. Dimensionless parameters are insensitive to the load and speed of the bearing, do not require comparison with relative standard values or previous data, and are more sensitive to the early stage of a fault; however, their anti-interference capability is poor, which easily leads to misjudgment. Although parameters such as the peak value, crest factor and kurtosis are sensitive to impact faults, they saturate and lose their diagnostic capability once the fault enters a stage of severe development. Moreover, different fault types produce different trends in different factors, which leads the causal analysis to focus on different features. The attention mechanism can force the model to focus on the signal features that contain significant risk factors while reducing the influence of other features on the detection result. Causal analysis makes the contribution of each feature to the final performance of the model clear, and the approach can be extended to other models.
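The time-domain parameters named above can be computed with a short sketch. It uses the common definitions crest factor = peak / RMS and kurtosis = fourth central moment / variance squared; the text does not state which exact definitions were used, and the function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(signal):
    """Return peak, RMS, crest factor and kurtosis of a vibration segment.

    The signal must not be constant, otherwise the RMS or variance is zero
    and the ratios are undefined.
    """
    n = len(signal)
    mean = sum(signal) / n
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    variance = sum((x - mean) ** 2 for x in signal) / n
    fourth_moment = sum((x - mean) ** 4 for x in signal) / n
    return {
        "peak": peak,
        "rms": rms,
        "crest_factor": peak / rms,              # dimensionless
        "kurtosis": fourth_moment / variance ** 2,  # dimensionless
    }


if __name__ == "__main__":
    print(time_domain_features([0.0, 0.0, 0.0, 4.0]))
```

The crest factor and kurtosis are ratios, so they do not scale with signal amplitude, which is why dimensionless parameters of this kind are insensitive to load and speed.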

TABLE 4 bearing 2_1 and 2_5 data set causal analysis results

Claims

1. A large-scale equipment fault prediction method placing equal emphasis on causality and attention in the industrial internet, characterized by comprising the following steps:

step 5, predicting the fault of the large equipment by using the hidden layer data and the attention score obtained in the step 4;

wherein, step 3 comprises the following substeps:

in the formula:

Δε_{f_ij} is the impact of the feature f_ij on the fault prediction;

f_ij is the jth feature of the ith sensor;

is the fault prediction error without the feature f_ij;

ε_F is the fault prediction error;

step 3.2, according to formula 3, measuring the influence of one feature on the prediction result requires calculating the model error ε_F with the complete features and the model error without the feature f_ij;

using a single-layer Transformer based on the attention mechanism as the model for calculating the errors; based on the sample data sequence N of the large-scale equipment sensors, generating from formula 4 the embedded sequence M of the Transformer and the embedded sequence M\{f_ij} without the feature f_ij, where M = [m₁, m₂, m₃, ..., m_K] and M\{f_ij} = [m′₁, m′₂, m′₃, ..., m′_K];

m_k = w_m n_k + b_m    formula 4;

in the formula:

TF(·) is the Transformer;

is the label of the prediction result;

M is the embedded sequence of the Transformer;

M\{f_ij} is the embedded sequence of the Transformer without the feature f_ij;

m_k is the embedded data of the kth sequence of the embedded sequence M;

n_k is the binary input of the features f_ijk of the kth sequence;

m′_k is the embedded data of the kth sequence of the embedded sequence M\{f_ij};

w_m is the initialized weight matrix for the embedded sequence,

b_m is the initialized bias matrix for the embedded sequence,

v is the dimension of w_m and b_m;

O is the dimension of n_k, O = γ × I × J;

ℝ is the set of real numbers;

deriving the causal impact weights:

in the formula:

is the weight of the jth input feature of the ith sensor;

W_F is the causal impact weight.

2. The large-scale equipment fault prediction method placing equal emphasis on causality and attention in the industrial internet as claimed in claim 1, wherein step 1 comprises the following sub-steps:

step 1.2, sampling in large quantities the labeled fault data of each type of large equipment, recording the time of each sampling point during sampling, and arranging the acquired signal segments according to the labels of the large equipment to which they belong, to obtain the training samples.

3. The large-scale equipment fault prediction method placing equal emphasis on causality and attention in the industrial internet as claimed in claim 1, wherein step 2 comprises the following sub-steps:

step 2.2, standardizing the extracted time-domain feature slices to generate a uniform feature code, wherein the standardization formula is defined as:

in the formula:

s_ijk is the feature code of the kth sequence of the jth feature of the ith sensor;

Max(f_ijk) is the maximum value of the jth feature of the ith sensor;

Min(f_ijk) is the minimum value of the jth feature of the ith sensor;

γ is the coefficient controlling the size of the feature code space;

J is the number of features of the sensor;

i denotes the ith sensor;

j denotes the jth feature of the sensor;

k denotes the kth sequence;

step 2.3, converting the feature code obtained in step 2.2 into a binary input:

n_k = [b_11k, b_12k, b_13k, ..., b_IJk]    formula 2;

obtaining the sample data sequence of the large-scale equipment sensors, N = [n₁, n₂, n₃, ..., n_K];

in the formula:

b_IJk is the binary input of the feature code s_ijk;

N is the sample data sequence of the large-scale equipment sensors;

n_k is the binary input of the features f_ijk of the kth sequence;

n_K is the binary input of the features f_ijk of the Kth sequence.

4. The large-scale equipment fault prediction method placing equal emphasis on causality and attention in the industrial internet as claimed in claim 1, wherein step 4 comprises the following sub-steps:

in the formula:

based on formula 10 and the same operation as in step 2.3, obtaining the sample data sequence combined with the causal influence weights;

in the formula:

N^W is the sample data sequence combined with the causal influence weights;

is the feature representation, combined with the causal influence weights, of the same type of large equipment in the fault state; when predicting a fault, this representation appended after all sample data is the same;

in the formula:

z_k is the time information embedded sequence;

tanh is the hyperbolic tangent function;

T is the time taken by the large-scale equipment from startup to failure;

p_k is the time difference from fault occurrence to slice acquisition, p_k = T - t × k;

w_z is the initialized weight matrix for the time information embedded sequence,

b_z is the initialized bias matrix for the time information embedded sequence,

v is the dimension of w_z and b_z;

step 4.3, generating the combined embedded data from the time information and the sample data combined with the causal influence weights:

from formula 12, the combined embedded sequence C = [c₁, c₂, c₃, ..., c_K, c_T] can be obtained;

in the formula:

c_k is the embedded data of the combined kth sequence;

c_K is the embedded data of the combined Kth sequence;

is the binary input of the features f_ijk of the combined kth sequence;

C is the combined embedded sequence;

c_T is the embedded data, combined with the causal influence weights, of the same type of large equipment in the fault state; when predicting a fault, the c_T appended after all sample data is the same;

[h₁, h₂, h₃, ..., h_K, h_T] = TF([c₁, c₂, c₃, ..., c_K, c_T])    formula 13;

in the formula:

TF(·) is the Transformer;

h_K is the hidden layer data learned from c_K through the Transformer;

in the formula:

u_k is the local score of the kth sequence of the combined embedded sequence;

h_k is the hidden layer data learned through the Transformer;

w_u is the initialized weight matrix for local attention,

b_u is the initialized bias matrix for local attention,

l is the dimension of the hidden layer data h_k;

p is the dimension of w_u and b_u;

after obtaining the local attention score, a local feature attention weight is generated using the Softmax function, namely:

W_local = Softmax([u₁, u₂, u₃, ..., u_K]) = [l₁, l₂, l₃, ..., l_K]    formula 15;

in the formula:

W_local is the local feature attention weight;

u_K is the local feature score of the Kth sequence of the combined embedded sequence;

x = ReLU(W_x h_T + b_x)    formula 16;

in the formula:

x is the query vector in the attention mechanism;

ReLU(·) is the rectified linear unit activation function;

h_T is the hidden layer representation of the fault state;

W_x is the initialized weight matrix for the query vector,

b_x is the initialized bias matrix for the query vector,

l is the dimension of the hidden layer data h_k;

q is the dimension of W_x and b_x;

step 4.7, taking the time difference p_k from fault occurrence to data slice acquisition as the key vector of the attention mechanism, as shown in formula 17:

to obtain E = [e₁, e₂, e₃, ..., e_K];

in the formula:

e_k is the key vector of the kth sequence;

e_K is the key vector of the Kth sequence;

E is the set of time key vectors of the attention mechanism;

w_e is the initialized weight matrix for the time key vector,

b_e is the initialized bias matrix for the time key vector,

q is the dimension of w_e and b_e;

in the formula:

r_k is the global temporal attention score of the kth sequence;

is the transpose of the query vector x;

δ is the dimension of the time key vector;

W_global = Softmax([r₁, r₂, r₃, ..., r_K]) = [g₁, g₂, g₃, ..., g_K]    formula 19;

in the formula:

W_global is the global temporal attention weight;

r_K is the global temporal attention score of the Kth sequence;

g_K is the global temporal attention weight of the Kth sequence;

y = Softmax(W_v h_T + b_v) = [a_local, a_global]    formula 20;

in the formula:

h_T is the hidden layer representation of the fault state;

W_v is the initialized weight matrix for the integrated information allocation,

b_v is the initialized bias matrix for the integrated information allocation,

l is the dimension of the hidden layer data h_K;

in the formula:

is the fused attention weight;

a_local is the weight assigned to W_local by the fault-state representation h_T;

a_global is the weight assigned to W_global by the fault-state representation h_T;

g_K is the global temporal attention weight of the Kth sequence;

As shown in equation 22:

5. The large-scale equipment fault prediction method placing equal emphasis on causality and attention in the industrial internet as claimed in claim 1, wherein step 5 comprises the following sub-steps:

in the formula:

is the fault prediction score of the large-scale equipment;

is the attention score of the embedded sequence;

h_k is the hidden layer data learned from c_k through the Transformer;

in the formula:

w_d is the initialized weight matrix for the fault prediction probability,

b_d is the initialized bias matrix for the fault prediction probability,

l is the dimension of the hidden layer data h_K;