CN113626597B - Intelligent manufacturing equipment fault prediction method based on gated three towers - Google Patents

Intelligent manufacturing equipment fault prediction method based on gated three towers

Info

Publication number
CN113626597B
CN113626597B (application CN202110830568.6A)
Authority
CN
China
Prior art keywords
layer
text
attention
time sequence
tower
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110830568.6A
Other languages
Chinese (zh)
Other versions
CN113626597A (en)
Inventor
Zhang Xin
Chen Jia
Chen Tao
Wang Dongjing
Shi Yunhai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110830568.6A priority Critical patent/CN113626597B/en
Publication of CN113626597A publication Critical patent/CN113626597A/en
Application granted granted Critical
Publication of CN113626597B publication Critical patent/CN113626597B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/35: Information retrieval; clustering and classification of unstructured textual data
    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models
    • G06F18/253: Pattern recognition; fusion techniques applied to extracted features
    • G06N3/08: Neural networks; learning methods
    • G06F2218/08: Pattern recognition for signal processing; feature extraction
    • G06F2218/12: Pattern recognition for signal processing; classification and matching

Abstract

The invention discloses a method for predicting faults of intelligent manufacturing equipment based on a gated three-tower Transformer, comprising: S1, a channel tower encoder; S2, a sliding window tower encoder with a multi-scale aggregation module; S3, a text tower encoder with a cross-tower attention module; S4, a gating module. First, the channel embedding matrix is input to the channel tower encoder to obtain channel features, and the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features. Next, the text representation is input to the text tower encoder to obtain text features, and the text features together with the time-series features are input to the cross-tower attention module to obtain weighted text features. Finally, the gating module fuses the channel features, the aggregated time-series features, and the weighted text features to predict the fault category. By optimizing the parameters with a cross-entropy loss, the model can dynamically and adaptively fuse the three kinds of features across multiple pieces of intelligent manufacturing equipment, improving the accuracy of fault prediction.

Description

Intelligent manufacturing equipment fault prediction method based on gated three towers
Technical Field
The invention relates to the field of predictive maintenance in intelligent manufacturing. It addresses the problem of predicting fault categories from the numerical data collected by sensors during equipment operation together with the text data of operation logs, and provides a fault prediction method for intelligent manufacturing equipment based on a gated three-tower Transformer that combines the channel features and temporal features of the numerical data with the text features of the log data.
Background
In recent years, national policies have continuously encouraged and supported intelligent manufacturing, which has become an important development trend in the manufacturing industry. The rise of intelligent manufacturing has produced a wave of enterprises and industrial parks whose system equipment keeps growing in scale and complexity, raising the requirements for the operation and maintenance of intelligent manufacturing equipment. Such equipment may fail during production; without a handling method or maintenance strategy prepared in advance, product quality and production efficiency may suffer and large economic losses may follow. For this reason, fault prediction techniques from predictive maintenance are introduced: the possible future failure modes of the equipment are predicted from its operating-state data, and a predictive maintenance plan is made in advance. Fault prediction takes as input the numerical data collected by sensors during equipment operation and the text data of operation logs, extracts and analyzes data features, and outputs the predicted fault category. The rapid development of deep learning has greatly advanced feature extraction and analysis methods, which are therefore promising for fault prediction.
At present, scholars at home and abroad have produced many valuable research results in fault prediction. Techniques based on statistical analysis (e.g., grey theory, independent component analysis) predict the future operating state of equipment from statistics over historical operating data, but their linearity assumptions make it difficult to handle the complex nonlinear systems found in practice. Techniques based on signal processing (e.g., wavelet transforms, spectral analysis) struggle to track long operating-data sequences of intelligent manufacturing equipment and easily suffer reduced prediction performance. Techniques based on deep learning (e.g., convolutional neural networks, recurrent neural networks) can effectively extract important feature information from historical operating data for fault prediction and suit uncertain, complex equipment systems. Recently, the Transformer model originating in natural language processing has become popular in deep learning; its multi-head attention mechanism can extract diverse key feature information from operating data.
Existing fault prediction methods still have many shortcomings. First, many methods use only the temporal features of the sensor numerical data without fully exploiting channel features, and convolution-based channel feature extraction requires laboriously designed receptive fields and cannot establish global channel associations. Second, the temporal scale is usually fixed when extracting temporal features, local temporal information goes unused, and stacking convolutional layers easily incurs excessive computational cost. Third, when handling log text data, existing methods usually need manual feature extraction before analyzing and predicting fault categories; an end-to-end training method that effectively fuses numerical and text features is lacking.
Disclosure of Invention
The invention addresses the shortcomings of existing fault prediction techniques in extracting and fusing sensor numerical data and operation-log text data, and provides an intelligent manufacturing equipment fault prediction method based on a gated three-tower Transformer. First, a sliding window mask attention mechanism is designed to extract multi-scale temporal features, and a multi-scale aggregation module aggregates them; applying mask attention inside multiple sliding windows both reduces the computational cost of the model and strengthens its ability to extract and express local temporal information. Then, after text features are extracted, a cross-tower attention mechanism learns text-time attention weights, effectively realizing end-to-end fault prediction for intelligent manufacturing equipment.
The invention adopts a Transformer architecture composed of several encoders. First, the channel embedding matrix is input to the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features that contain multi-scale global and local temporal information. Next, the text representation is input to the text tower encoder to obtain text features, and the text and time-series features are input to the cross-tower attention module to obtain weighted text features, so that the model tends to predict using text features related to fault information. Finally, the gating module fuses the channel features, aggregated time-series features, and weighted text features to predict the fault category, allowing the model to dynamically and adaptively fuse the three kinds of features across multiple pieces of equipment and improving the accuracy of fault prediction.
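For concreteness, the overall forward pass can be sketched in PyTorch as below. This is a minimal sketch under our own naming assumptions (the class names, the two-output window tower, and the argument order are illustrative); the patent publishes no reference code.

```python
# Hypothetical wiring of the gated three-tower architecture; all names
# are illustrative, and each submodule is supplied externally.
import torch.nn as nn

class GatedThreeTowerTransformer(nn.Module):
    def __init__(self, channel_tower, window_tower, text_tower,
                 cross_tower_attn, gating_module):
        super().__init__()
        self.channel_tower = channel_tower        # channel features
        self.window_tower = window_tower          # aggregated + global timing features
        self.text_tower = text_tower              # text features
        self.cross_tower_attn = cross_tower_attn  # text-time weighting
        self.gating_module = gating_module        # adaptive fusion + classifier

    def forward(self, e_channel, e_time, r_text):
        f_c = self.channel_tower(e_channel)             # from channel embedding
        f_agg, f_t_global = self.window_tower(e_time)   # from time-series embedding
        f_x = self.text_tower(r_text)                   # from text representation
        f_x_weighted = self.cross_tower_attn(f_x, f_t_global)
        return self.gating_module(f_c, f_agg, f_x_weighted)  # fault probabilities
```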
The method first obtains, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of N intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ denotes the K kinds of numerical data (e.g., filling temperature, pressure, flow) of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of state classes (e.g., normal, failure of a particular component). $X^s$ denotes all numerical data of the s-th device, and $W_d^s$ denotes all text data of the s-th device on day d.
The specific implementation of the invention comprises the following steps:
S1, data acquisition: acquire, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of multiple intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ is the K kinds of numerical data of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of intelligent manufacturing devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of states; $X^s$ denotes all numerical data of the s-th device and $W_d^s$ all text data of the s-th device on day d;
S2, transform the numerical and text data to obtain the channel embedding matrix, the time-series embedding matrix, and sentence embedding vectors, with the following substeps:
S21, input the transposed numerical data $(X^s)^\top$ of the s-th device to a linear layer to obtain the channel embedding matrix $E_c^s$;
S22, input the original numerical data $X^s$ of the s-th device to a linear layer and apply position coding to obtain the time-series embedding matrix $E_t^s$;
S23, input the text data $W_d^s$ of the s-th device to a BERT model to obtain a sentence embedding vector for each text entry;
S24, pool each day's sentence embedding vectors with minimum, average, and maximum pooling to obtain the text representation $R^s$;
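Steps S23 and S24 can be sketched with a Hugging Face BERT as follows; the checkpoint name, the use of the [CLS] vector as the sentence embedding, and the concatenation of the three pooled vectors are assumptions where the text is silent.

```python
# Sketch of S23-S24: embed one day's log lines and pool them into a
# single daily text representation (checkpoint and pooling layout assumed).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-chinese")

def daily_text_representation(log_lines):
    batch = tokenizer(log_lines, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    sent_vecs = out.last_hidden_state[:, 0]          # [CLS] vector per log line
    return torch.cat([sent_vecs.min(dim=0).values,   # minimum pooling
                      sent_vecs.mean(dim=0),         # average pooling
                      sent_vecs.max(dim=0).values])  # maximum pooling
```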
S3, embedding the channel into the matrix
Figure BDA0003175298900000037
Inputting the data to a channel tower encoder to obtain the channel characteristics of numerical data, wherein the channel tower encoder is composed of an encoder LcIndividual channel coding layer
Figure BDA0003175298900000038
The method specifically comprises the following substeps:
s31. pair
Figure BDA0003175298900000039
Normalization processing is carried out on channel characteristics extracted from layer channel coding layer
Figure BDA00031752989000000310
S32, multi-head attention layer feature extraction is carried out on the normalized features obtained in the step S31, a residual error structure is adopted in the layer, and the calculation formula is as follows:
Figure BDA00031752989000000311
the specific operation of the multi-head attention layer comprises the following sub-steps:
SU1 multiplication of channel characteristics and parameter matrix to obtain the a-th self-attention SAa(. input) embedding matrix mapped query matrix qaKey matrix kaSum matrix va
SU2. calculating by normalized index softmax function to obtain self-attention weight matrix
Figure BDA00031752989000000312
Figure BDA00031752989000000313
Figure BDA00031752989000000314
Dimension size of each vector in the mapping matrix;
SU3. self-attention weight matrix saSum matrix vaMultiplication to obtain
Figure BDA00031752989000000315
And SU4, splicing the features obtained by the attention layers, and multiplying the parameter matrix to obtain the output features of the MSA layer
Figure BDA00031752989000000316
S33, normalizing the characteristics obtained in the step S32
Figure BDA00031752989000000317
S34, inputting the normalized features into a multi-layer perceptron to perform feature extraction, wherein the result is the first
Figure BDA00031752989000000318
Layer channel coding characteristics, calculation formula thereofComprises the following steps:
Figure BDA00031752989000000319
s35, the first
Figure BDA00031752989000000320
Layer channel coding features
Figure BDA00031752989000000321
Is input to the first
Figure BDA00031752989000000322
Layer channel coding layer, repeating steps S31-S34;
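Steps S31-S35 describe a standard pre-norm Transformer block; a compact PyTorch rendering follows, with every hyperparameter value a placeholder.

```python
# One channel coding layer (S31-S35): LN -> multi-head attention ->
# residual, then LN -> MLP -> residual; hyperparameters are placeholders.
import torch.nn as nn

class ChannelCodingLayer(nn.Module):
    def __init__(self, d_model=64, num_heads=4, d_ff=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, f):          # f: (batch, K channels, d_model)
        h = self.norm1(f)                                   # S31
        f = f + self.attn(h, h, h, need_weights=False)[0]   # S32
        f = f + self.mlp(self.norm2(f))                     # S33-S34
        return f                   # passed to the next layer (S35)
```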
S4, the sliding window tower encoder performs multi-layer, multi-scale feature extraction on the time-series embedding matrix $E_t^s$ to obtain the multi-scale time-series features $F_t^{l}$, with the following substeps:
S41, apply layer normalization to the time-series features $F_t^{l-1}$ extracted by the $(l-1)$-th sliding window coding layer;
S42, extract features from the normalized features with a sliding window mask attention layer. When the layer index $l$ is odd, perform the following operations:
SN1, the $l$-th sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a given number of time units;
SN2, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN3, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN4, perform the masked self-attention with fixed sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{RW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
When the layer index $l$ is even, perform the following operations:
SN5, the $l$-th sliding window mask attention layer shifts all sliding windows of the $(l-1)$-th sliding window mask attention layer by a number of time units;
SN6, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN7, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN8, perform the masked self-attention with the shifted sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{SW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
S43, apply layer normalization to the time-series features extracted in step S42;
S44, extract features from the normalized time-series features with a multi-layer perceptron,

$F_t^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_t^{l})) + \hat{F}_t^{l}$

S45, perform multi-scale aggregation on the multi-scale time-series features output by the even-numbered sliding window coding layers and the $L_t$-th layer to obtain and output the aggregated time-series features $F_{agg}^s$, which contain multi-scale global and local temporal information, computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
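The window partitioning of S42 and the aggregation of S45 can be sketched as below. The sketch is single-head, the window size and shift are illustrative, and zeroing the upper triangle after the softmax is a literal reading of SN3/SN7 (a conventional implementation would instead mask with -inf before the softmax).

```python
# Sketch of sliding-window masked self-attention (S42) and multi-scale
# aggregation (S45); window and shift values are illustrative.
import torch
import torch.nn.functional as F

def sliding_window_mask_attention(x, window=4, shift=0):
    """x: (T, d) timing features; masked self-attention inside each window."""
    T, d = x.shape
    if shift:                                     # even layers: shifted windows
        x = torch.roll(x, shifts=-shift, dims=0)
    out = torch.zeros_like(x)
    for start in range(0, T, window):
        w = x[start:start + window]               # edge window may be shorter (SN2/SN6)
        s_a = F.softmax(w @ w.t() / d ** 0.5, dim=-1)
        s_a = torch.tril(s_a)                     # upper triangle set to 0 (SN3/SN7)
        out[start:start + w.shape[0]] = s_a @ w
    if shift:
        out = torch.roll(out, shifts=shift, dims=0)
    return out

def multi_scale_aggregation(layer_outputs):
    """M_t: concatenate the selected layers' timing features (S45)."""
    return torch.cat(layer_outputs, dim=-1)
```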
S5, input the text representation to the text tower encoder to obtain text features, and input the text features and time-series features to the cross-tower attention module to obtain weighted text features, with the following substeps:
S51, apply layer normalization to the text features $F_x^{l-1}$ output by the $(l-1)$-th text coding layer;
S52, extract features from the normalized text features with a multi-head attention layer, computed as

$\hat{F}_x^{l} = \mathrm{MSA}(\mathrm{LN}(F_x^{l-1})) + F_x^{l-1}$

S53, apply layer normalization to the text features computed in step S52;
S54, extract features from the normalized text features with a multi-layer perceptron, computed as

$F_x^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_x^{l})) + \hat{F}_x^{l}$
S6, use the gating module to fuse the channel features, aggregated time-series features, and weighted text features, and compute and output the predicted fault category probability vector, with the following substeps:
S61, input the global time-series features $F_t^s$ to a fully connected layer to obtain their linear mapping $\mathrm{FC}(F_t^s)$;
S62, transpose the mapped global time-series features to obtain the aligned features $\mathrm{FC}(F_t^s)^\top$;
S63, obtain the text-time attention weights $s_{xt}$ with a matrix multiplication and the Softmax(·) function, computed as

$s_{xt} = \mathrm{Softmax}(F_x^s \, \mathrm{FC}(F_t^s)^\top)$

where FC(·) is a fully connected layer;
S64, output the weighted text features computed with the text-time attention weights:

$\hat{F}_x^s = s_{xt} F_x^s$
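A sketch of the cross-tower attention of S61-S64 follows; the feature shapes and the placement of the fully connected layer are assumptions consistent with the formulas above.

```python
# Sketch of the cross-tower attention module (S61-S64): project and
# transpose the global timing features, softmax the text-time similarity,
# and reweight the text features.
import torch
import torch.nn as nn

class CrossTowerAttention(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.fc = nn.Linear(d_model, d_model)      # S61: linear mapping FC(.)

    def forward(self, f_text, f_time):
        # f_text: (T, d) text features; f_time: (T, d) global timing features
        sim = f_text @ self.fc(f_time).transpose(0, 1)  # S62-S63 alignment
        s_xt = torch.softmax(sim, dim=-1)               # text-time weights
        return s_xt @ f_text                            # S64: weighted text features
```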
S65, obtain the prediction result from the weighted fusion of the three features, with the following substeps:
SW1, input the channel features $F_c^s$, aggregated time-series features $F_{agg}^s$, and weighted text features $\hat{F}_x^s$ to the gating module;
SW2, the gating layer G performs weighted fusion of the three features through adaptive weights to obtain the gating features $F_g^s = G(F_c^s, F_{agg}^s, \hat{F}_x^s)$;
SW3, input the gating features to a fully connected layer FC to obtain the predicted fault category probability vector $y^s$ of the s-th intelligent manufacturing device, computed as

$y^s = \mathrm{Softmax}(\mathrm{FC}(F_g^s))$
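The gate parameterization is not spelled out in the text; one natural reading, sketched below, is a softmax gate over the concatenated features followed by the fully connected classification layer of SW3.

```python
# Sketch of the gating module (SW1-SW3); the softmax gate over the
# concatenated features is our assumption for the adaptive weights.
import torch
import torch.nn as nn

class GatingModule(nn.Module):
    def __init__(self, d_model=64, num_states=10):   # num_states = B
        super().__init__()
        self.gate = nn.Linear(3 * d_model, 3)        # adaptive fusion weights
        self.fc = nn.Linear(d_model, num_states)     # SW3 classification layer

    def forward(self, f_c, f_agg, f_xw):             # three pooled feature vectors
        w = torch.softmax(self.gate(torch.cat([f_c, f_agg, f_xw], -1)), -1)
        g = w[..., 0:1] * f_c + w[..., 1:2] * f_agg + w[..., 2:3] * f_xw
        return torch.softmax(self.fc(g), dim=-1)     # predicted state probabilities
```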
S7, compute the cross-entropy loss from the predicted fault category probability vector; this step is used only during training, to guide the model to predict the fault category of the intelligent manufacturing equipment accurately.
Preferably, step S42 achieves efficient multi-scale time-series feature extraction by performing mask attention computation inside multiple non-overlapping adjacent time-series sliding windows, and establishes a mechanism for exchanging temporal information across windows through window shifting. Meanwhile, the invention designs a multi-scale aggregation module as the functional module that aggregates the multi-scale time-series features; the multi-scale feature extraction and aggregation are computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
preferably, step S5 is performed by the cross-tower attention module by computing a global timing feature
Figure BDA0003175298900000058
And text features
Figure BDA0003175298900000059
The attention weights of (a) enable end-to-end learning of the model for text-to-time correlations.
Preferably, in step S6 the gating module fuses the channel features, aggregated time-series features, and weighted text features, and fault prediction is performed on the multi-feature fused vector, improving the model's fault prediction accuracy and robustness.
The gated three-tower Transformer architecture consists of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a cross-tower attention module, and a gating layer. The channel tower, sliding window tower, and text tower encoders effectively extract the channel features of the numerical data, the aggregated time-series features, and the weighted text features of the text data, respectively, while the gating layer fuses the three features with dynamic weights, so that the model adapts its features to the data of multiple intelligent manufacturing devices and the accuracy of fault category prediction is improved.
Drawings
FIG. 1 is a diagram of a gated three tower Transformer architecture;
FIG. 2 is a schematic diagram of channel characteristics and timing characteristics;
FIG. 3 is a diagram of a daily textual representation extraction architecture;
FIG. 4 is a diagram of a structure of a sliding window mask attention layer and a multi-scale aggregation module;
FIG. 5 is a cross-tower attention module block diagram.
Detailed Description
Example 1
The invention provides an intelligent manufacturing equipment fault prediction technique based on a gated three-tower Transformer. As shown in FIG. 1, the overall architecture consists of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a cross-tower attention module, and a gating module. First, the channel embedding matrix is input to the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features containing multi-scale global and local temporal information. Next, the text representation is input to the text tower encoder to obtain text features, and the text and time-series features are input to the cross-tower attention module to obtain weighted text features, so that the model tends to predict using text features related to fault information. The gating module then fuses the channel features, aggregated time-series features, and weighted text features to compute and output the predicted fault category probability vector. Finally, the cross-entropy loss is computed from the predicted probability vector; this step is used only during training, to guide the model to predict the fault category of the equipment accurately.
The method first obtains, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of N intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ denotes the K kinds of numerical data (e.g., filling temperature, pressure, flow) of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of state classes (e.g., normal, failure of a particular component). $X^s$ denotes all numerical data of the s-th device, and $W_d^s$ denotes all text data of the s-th device on day d.
The implementation steps are described in detail below with reference to the accompanying drawings.
Step (1): as shown in FIG. 1, the transposed numerical data $(X^s)^\top$ of the s-th intelligent manufacturing device is input to a linear layer to obtain the channel embedding matrix $E_c^s$, and the original numerical data $X^s$ is input to a linear layer followed by position coding to obtain the time-series embedding matrix $E_t^s$. As shown in FIG. 2, the channel embedding matrix and the time-series embedding matrix are embedded representations of the numerical data along the channel and time dimensions, respectively. As shown in FIG. 3, the log text data $W_d^s$ of the s-th device is input to BERT (Bidirectional Encoder Representations from Transformers) to obtain a sentence embedding vector for each text entry, and minimum, average, and maximum pooling over each day's sentence embedding vectors yields the text representation $R^s$.
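Step (1) can be sketched as follows; the sinusoidal form of the position coding is an assumption (the text says only "position coding"), and all dimensions are placeholders.

```python
# Sketch of step (1): channel embedding from the transposed data,
# time-series embedding plus (assumed sinusoidal) position coding.
import math
import torch
import torch.nn as nn

def sinusoidal_positions(T, d_model):
    pos = torch.arange(T).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(T, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

K, T, d_model = 8, 30, 64                 # placeholder dimensions
x = torch.randn(T, K)                     # K sensor channels over T days
channel_embed = nn.Linear(T, d_model)     # acts on the transposed data
time_embed = nn.Linear(K, d_model)

E_c = channel_embed(x.t())                               # (K, d_model)
E_t = time_embed(x) + sinusoidal_positions(T, d_model)   # (T, d_model)
```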
Step (2): as shown in FIG. 1, the channel embedding matrix $E_c^s$ is input to the channel tower encoder, which consists of $L_c$ channel coding layers and performs multi-layer feature extraction on the channel embedding matrix to obtain the channel features $F_c^s$. Each channel coding layer consists of two sub-layers with residual structure: the first sub-layer is a layer normalization operation followed by a multi-head attention layer, and the second sub-layer is a layer normalization operation followed by a multi-layer perceptron. The channel feature extraction in the $l$-th channel coding layer is computed as

$\hat{F}_c^{l} = \mathrm{MSA}(\mathrm{LN}(F_c^{l-1})) + F_c^{l-1}$
$F_c^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_c^{l})) + \hat{F}_c^{l}$

where LN(·) is layer normalization, MLP(·) is a multi-layer perceptron, and MSA(·) is multi-head attention, computed in the following steps:

$[q_a, k_a, v_a] = E W_a, \quad s_a = \mathrm{Softmax}\!\left(\frac{q_a k_a^\top}{\sqrt{d}}\right), \quad \mathrm{SA}_a = s_a v_a, \quad \mathrm{MSA} = [\mathrm{SA}_1, \dots, \mathrm{SA}_A]\, W_o$

where $W_a$ and $W_o$ are parameter matrices; $q_a$, $k_a$, $v_a$ are the query, key, and value matrices mapped from the embedding matrix input to the a-th self-attention $\mathrm{SA}_a(\cdot)$; $s_a$ is the self-attention weight matrix; $d$ is the dimension of each vector in the mapping matrices; Softmax(·) is the normalized exponential function; A is the total number of self-attention heads; and [·, …, ·] is the concatenation operation.

The channel tower encoder output is the channel features output by the $L_c$-th channel coding layer: $F_c^s = F_c^{L_c}$.
step (3) embedding the timing sequence into the matrix as shown in FIG. 1
Figure BDA00031752989000000717
Input to a polymerization module M with multi-scaletThe sliding window tower encoder is composed of
Figure BDA00031752989000000718
A sliding window coding layer
Figure BDA00031752989000000719
The method comprises the steps of extracting multilayer and multi-scale features of a time sequence embedded matrix to obtain multi-scale time sequence features
Figure BDA00031752989000000720
Performing multi-scale polymerization on the multi-scale time sequence characteristics by using a multi-scale polymerization module to obtain polymerization time sequence characteristics
Figure BDA00031752989000000721
The invention provides a sliding window mask attention layer, which realizes high-efficiency multi-scale time sequence characteristic extraction by performing mask attention calculation in a plurality of non-overlapping adjacent time sequence sliding windows and establishes a time sequence information exchange mechanism in the plurality of sliding windows through the sliding windows; meanwhile, the invention designs a multi-scale aggregation module as a functional module for aggregating multi-scale time sequence characteristics. The calculation formula of the multi-scale time sequence feature extraction and the multi-scale aggregation is as follows:
Figure BDA00031752989000000722
and (3.1) the sliding window coding layer is composed of two sub-layers with residual error structures, the first sub-layer is composed of a layer normalization operation and a sliding window mask attention layer, and the second sub-layer is composed of a layer normalization operation and a multi-layer perceptron. As shown in fig. 4, when
Figure BDA0003175298900000081
When it is odd, the first
Figure BDA0003175298900000082
The window mask attention layer is first
Figure BDA0003175298900000083
The time sequence characteristics are non-overlapped and equally divided by the sliding window with the time unit size
Figure BDA0003175298900000084
Then using no more than
Figure BDA0003175298900000085
A time unit sized sliding window fully encompasses the edge timing feature. When in use
Figure BDA0003175298900000086
When it is even, the first
Figure BDA0003175298900000087
The first window mask attention layer is first
Figure BDA0003175298900000088
The sliding window mask is focused on all sliding windows of the power layer
Figure BDA0003175298900000089
Moving by a time unit size not greater than
Figure BDA00031752989000000810
A time unit sized sliding window fully encompasses the edge timing feature. Each window mask attention layer is on the layerPerforming mask self-attention calculation in the sliding window, and calculating the self-attention weight matrix s in the multi-head attention calculation stepaSetting the upper triangle element to 0 is the mask self-attention calculation step. When in use
Figure BDA00031752989000000811
Is odd, the first
Figure BDA00031752989000000812
Is first and second
Figure BDA00031752989000000813
Individual channel coding layer
Figure BDA00031752989000000814
The calculation formula of the channel feature extraction in (1) is as follows:
Figure BDA00031752989000000815
wherein the content of the first and second substances,
Figure BDA00031752989000000816
RW-MSA (-) performs mask self-attention computation for odd sliding window mask attention layer using fixed sliding window, SW-MSA (-) performs mask self-attention computation for even sliding window mask attention layer using moved sliding window.
Step (3.2) multiscale aggregation Module MtFormed by matrix splicing operations, for even numbers of sums LtMulti-scale timing feature of multiple sliding window coding layer output
Figure BDA00031752989000000817
LtPerforming multi-scale polymerization to obtain polymerization timing characteristics
Figure BDA00031752989000000818
And outputting, wherein the calculation formula is as follows:
Figure BDA00031752989000000819
step (4) representing the text
Figure BDA00031752989000000820
Input to attention module with tower crossing
Figure BDA00031752989000000821
The text tower encoder is composed of
Figure BDA00031752989000000822
A text coding layer
Figure BDA00031752989000000823
The composition is that the text representation is subjected to multi-layer feature extraction to obtain text features
Figure BDA00031752989000000824
Step (4.1) the text coding layer is composed of two sub-layers with residual error structures, the first sub-layer is composed of layer normalization operation and a multi-head attention layer, the second sub-layer is composed of layer normalization operation and a multi-layer perceptron, the first sub-layer is composed of a layer normalization operation and a multi-head attention layer
Figure BDA00031752989000000825
A text coding layer
Figure BDA00031752989000000826
The calculation formula of the text feature extraction in (1) is as follows:
Figure BDA00031752989000000827
wherein
Figure BDA00031752989000000828
Step (4.2) As shown in FIG. 5, the Cross-Tower attention Module
Figure BDA00031752989000000829
Utilizing global timing features
Figure BDA00031752989000000830
And text features
Figure BDA00031752989000000831
Calculating text-time sequence attention weight and weighting text features to obtain weighted text features
Figure BDA00031752989000000832
First global timing characteristics
Figure BDA00031752989000000833
Inputting the global time sequence characteristics to the full connection layer to obtain linear mapping
Figure BDA00031752989000000834
Transposing the same to obtain
Figure BDA00031752989000000835
To align the timing. Text-to-time attention weighting using matrix multiplication operations and Softmax (-) function
Figure BDA00031752989000000836
The calculation formula is as follows:
Figure BDA0003175298900000091
where FC (-) is the full connectivity layer.
Output of a cross-tower attention module
Figure BDA0003175298900000092
For weighted text features computed using text-to-time attention weights:
Figure BDA0003175298900000093
a cross-tower attention module by computing globalTiming characteristics
Figure BDA0003175298900000094
And text features
Figure BDA0003175298900000095
The attention weights of (a) enable end-to-end learning of the model for text-to-time correlations. The text feature extraction and text feature weighting calculation formula is as follows:
Figure BDA0003175298900000096
step (5) characterizing the channel
Figure BDA0003175298900000097
Aggregation timing characterization
Figure BDA0003175298900000098
And weighted text features
Figure BDA0003175298900000099
Input to the gating module. The gating layer G performs weighted fusion on the three characteristics through self-adaptive weight to obtain gating characteristics
Figure BDA00031752989000000910
The calculation formula is as follows:
Figure BDA00031752989000000911
to gate features
Figure BDA00031752989000000912
Inputting the data into a full connection layer to obtain the predicted fault category probability vector of the intelligent manufacturing equipment of the s th station
Figure BDA00031752989000000913
The calculation formula is as follows:
Figure BDA00031752989000000914
Step (6): compute the loss and optimize the gated three-tower Transformer. To improve the consistency between the predicted fault category and the real equipment state, the invention adopts the cross-entropy loss, computed as

$\mathcal{L} = -\frac{1}{N}\sum_{s=1}^{N}\sum_{b=1}^{B} y_{s,b}\,\log \hat{y}_{s,b}$

where $y_{s,b}$ is the true label of the s-th intelligent manufacturing device for the b-th state, equal to 1 if the device's state belongs to class b and 0 otherwise; $\hat{y}_{s,b}$ is the predicted probability of the b-th state; B is the total number of states; and N is the total number of intelligent manufacturing devices.
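The loss of step (6) is the standard multi-class cross entropy; a direct transcription follows, assuming one-hot state labels and the softmax outputs of the gating module.

```python
# Cross-entropy loss of step (6); y_pred and y_true are (N devices, B states).
import torch

def fault_prediction_loss(y_pred, y_true, eps=1e-8):
    return -(y_true * torch.log(y_pred + eps)).sum(dim=1).mean()
```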

Claims (4)

1. A method for predicting faults of intelligent manufacturing equipment based on gated three towers, characterized by comprising the following steps:
S1, data acquisition: acquire, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of multiple intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ is the K kinds of numerical data of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of intelligent manufacturing devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of states; $X^s$ denotes all numerical data of the s-th device and $W_d^s$ all text data of the s-th device on day d;
S2, transform the numerical and text data to obtain the channel embedding matrix, the time-series embedding matrix, and sentence embedding vectors, with the following substeps:
S21, input the transposed numerical data $(X^s)^\top$ of the s-th device to a linear layer to obtain the channel embedding matrix $E_c^s$;
S22, input the original numerical data $X^s$ of the s-th device to a linear layer and apply position coding to obtain the time-series embedding matrix $E_t^s$;
S23, input the text data $W_d^s$ of the s-th device to a BERT model to obtain a sentence embedding vector for each text entry;
S24, pool each day's sentence embedding vectors with minimum, average, and maximum pooling to obtain the text representation $R^s$;
S3, embedding the channel into the matrix
Figure FDA00035117871400000116
Inputting the data to a channel tower encoder to obtain the channel characteristics of numerical data, wherein the channel tower encoder is composed of an encoder LcIndividual channel coding layer
Figure FDA00035117871400000117
The method specifically comprises the following substeps:
s31. pair
Figure FDA00035117871400000118
Normalization processing is carried out on channel characteristics extracted from layer channel coding layer
Figure FDA00035117871400000119
S32, multi-head attention layer feature extraction is carried out on the normalized features obtained in the step S31, a residual error structure is adopted in the layer, and the calculation formula is as follows:
Figure FDA00035117871400000120
the specific operation of the multi-head attention layer comprises the following sub-steps:
SU1 multiplication of channel characteristics and parameter matrix to obtain the a-th self-attention SAa(. input) embedding matrix mapped query matrix qaKey matrix kaSum matrix va
SU2. calculating by normalized index softmax function to obtain self-attention weight matrix
Figure FDA0003511787140000021
Dimension size of each vector in the mapping matrix;
SU3. self-attention weight matrix saSum matrix vaMultiplication to obtain
Figure FDA0003511787140000022
And SU4, splicing the features obtained by the attention layers, and multiplying the parameter matrix to obtain the output features of the MSA layer
Figure FDA0003511787140000023
S33, normalizing the characteristics obtained in the step S32
Figure FDA0003511787140000024
S34, inputting the normalized features into a multi-layer perceptron to perform feature extraction, wherein the result is the first
Figure FDA0003511787140000025
The layer channel coding characteristic has the calculation formula as follows:
Figure FDA0003511787140000026
s35, the first
Figure FDA0003511787140000027
Layer channel coding features
Figure FDA0003511787140000028
Is input to the first
Figure FDA0003511787140000029
Layer channel coding layer, repeating steps S31-S34;
S4, the sliding window tower encoder performs multi-layer, multi-scale feature extraction on the time-series embedding matrix $E_t^s$ to obtain the multi-scale time-series features $F_t^{l}$, with the following substeps:
S41, apply layer normalization to the time-series features $F_t^{l-1}$ extracted by the $(l-1)$-th sliding window coding layer;
S42, extract features from the normalized features with a sliding window mask attention layer. When the layer index $l$ is odd, perform the following operations:
SN1, the $l$-th sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a given number of time units;
SN2, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN3, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN4, perform the masked self-attention with fixed sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{RW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
When the layer index $l$ is even, perform the following operations:
SN5, the $l$-th sliding window mask attention layer shifts all sliding windows of the $(l-1)$-th sliding window mask attention layer by a number of time units;
SN6, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN7, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN8, perform the masked self-attention with the shifted sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{SW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
S43, apply layer normalization to the time-series features extracted in step S42;
S44, extract features from the normalized time-series features with a multi-layer perceptron,

$F_t^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_t^{l})) + \hat{F}_t^{l}$

S45, perform multi-scale aggregation on the multi-scale time-series features output by the even-numbered sliding window coding layers and the $L_t$-th layer to obtain and output the aggregated time-series features $F_{agg}^s$, which contain multi-scale global and local temporal information, computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
S5, input the text representation to the text tower encoder to obtain text features, and input the text features and time-series features to the cross-tower attention module to obtain weighted text features, with the following substeps:
S51, apply layer normalization to the text features $F_x^{l-1}$ output by the $(l-1)$-th text coding layer;
S52, extract features from the normalized text features with a multi-head attention layer, computed as

$\hat{F}_x^{l} = \mathrm{MSA}(\mathrm{LN}(F_x^{l-1})) + F_x^{l-1}$

S53, apply layer normalization to the text features computed in step S52;
S54, extract features from the normalized text features with a multi-layer perceptron, computed as

$F_x^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_x^{l})) + \hat{F}_x^{l}$
S6, use the gating module to fuse the channel features, aggregated time-series features, and weighted text features, and compute and output the predicted fault category probability vector, with the following substeps:
S61, input the global time-series features $F_t^s$ to a fully connected layer to obtain their linear mapping $\mathrm{FC}(F_t^s)$;
S62, transpose the mapped global time-series features to obtain the aligned features $\mathrm{FC}(F_t^s)^\top$;
S63, obtain the text-time attention weights $s_{xt}$ with a matrix multiplication and the Softmax(·) function, computed as

$s_{xt} = \mathrm{Softmax}(F_x^s \, \mathrm{FC}(F_t^s)^\top)$

where FC(·) is a fully connected layer;
S64, output the weighted text features computed with the text-time attention weights:

$\hat{F}_x^s = s_{xt} F_x^s$

S65, obtain the prediction result from the weighted fusion of the three features, with the following substeps:
SW1, input the channel features $F_c^s$, aggregated time-series features $F_{agg}^s$, and weighted text features $\hat{F}_x^s$ to the gating module;
SW2, the gating layer G performs weighted fusion of the three features through adaptive weights to obtain the gating features $F_g^s = G(F_c^s, F_{agg}^s, \hat{F}_x^s)$;
SW3, input the gating features to a fully connected layer FC to obtain the predicted fault category probability vector $y^s$ of the s-th intelligent manufacturing device, computed as

$y^s = \mathrm{Softmax}(\mathrm{FC}(F_g^s))$
S7, compute the cross-entropy loss from the predicted fault category probability vector; this step is used only during training, to guide the model to predict the fault category of the intelligent manufacturing equipment accurately.
2. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that step S42 achieves efficient multi-scale time-series feature extraction by performing mask attention computation in multiple non-overlapping adjacent time-series sliding windows, and establishes a mechanism for exchanging temporal information across the sliding windows through window shifting.
3. The method of claim 1, wherein step S5 computes, through the cross-tower attention module, the attention weights between the global time-series features $F_t^s$ and the text features $F_x^s$, realizing end-to-end learning of the text-time correlation by the model.
4. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that in step S6 the gating module fuses the channel features, the aggregated time-series features, and the weighted text features, and fault prediction is performed on the multi-feature fused vector, improving the model's fault prediction accuracy and robustness.
CN202110830568.6A 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers Active CN113626597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830568.6A CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110830568.6A CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Publications (2)

Publication Number Publication Date
CN113626597A CN113626597A (en) 2021-11-09
CN113626597B true CN113626597B (en) 2022-04-01

Family

ID=78380538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830568.6A Active CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Country Status (1)

Country Link
CN (1) CN113626597B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699700B2 (en) * 2018-07-31 2020-06-30 Tencent Technology (Shenzhen) Company Limited Monaural multi-talker speech recognition with attention mechanism and gated convolutional networks
CN111079532B (en) * 2019-11-13 2021-07-13 杭州电子科技大学 Video content description method based on text self-encoder
CN112489635B (en) * 2020-12-03 2022-11-11 杭州电子科技大学 Multi-mode emotion recognition method based on attention enhancement mechanism
CN112818035B (en) * 2021-01-29 2022-05-17 湖北工业大学 Network fault prediction method, terminal equipment and storage medium
CN112926303B (en) * 2021-02-23 2023-06-27 南京邮电大学 Malicious URL detection method based on BERT-BiGRU

Also Published As

Publication number Publication date
CN113626597A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN111832814B (en) Air pollutant concentration prediction method based on graph attention mechanism
CN113316163B (en) Long-term network traffic prediction method based on deep learning
CN111046583B (en) Point machine fault diagnosis method based on DTW algorithm and ResNet network
CN113255209B (en) Method for predicting residual life of bearing of gearbox
CN113361559B (en) Multi-mode data knowledge information extraction method based on deep-width combined neural network
CN115587454A (en) Traffic flow long-term prediction method and system based on improved Transformer model
CN115758891B (en) Airfoil flow field prediction method based on converter decoder network
CN112560948A (en) Eye fundus map classification method and imaging method under data deviation
CN114819386A (en) Conv-Transformer-based flood forecasting method
CN116151459A (en) Power grid flood prevention risk probability prediction method and system based on improved Transformer
CN115831377A (en) Intra-hospital death risk prediction method based on ICU (intensive care unit) medical record data
CN113626597B (en) Intelligent manufacturing equipment fault prediction method based on gated three towers
CN110335160A (en) A kind of medical treatment migratory behaviour prediction technique and system for improving Bi-GRU based on grouping and attention
CN117154256A (en) Electrochemical repair method for lithium battery
CN113792919B (en) Wind power prediction method based on combination of transfer learning and deep learning
CN115801152A (en) WiFi action identification method based on hierarchical transform model
Yu et al. Time series cross-correlation network for wind power prediction
CN114707829A (en) Target person rescission risk prediction method based on structured data linear expansion
Lin Intelligent Fault Diagnosis of Consumer Electronics Sensor in IoE via Transformer
Dun et al. A novel hybrid model based on spatiotemporal correlation for air quality prediction
CN117725491B (en) SCINet-based power system fault state detection and classification method
CN116361673B (en) Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116579505B (en) Electromechanical equipment cross-domain residual life prediction method and system without full life cycle sample
CN117349610B (en) Fracturing operation multi-time-step pressure prediction method based on time sequence model
US20230350402A1 (en) Multi-task learning based rul predication method under sensor fault condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant