CN113626597B - Intelligent manufacturing equipment fault prediction method based on gated three towers - Google Patents

Intelligent manufacturing equipment fault prediction method based on gated three towers

Info

Publication number
CN113626597B
CN113626597B (application CN202110830568.6A)
Authority
CN
China
Prior art keywords
layer
text
attention
time sequence
tower
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110830568.6A
Other languages
Chinese (zh)
Other versions
CN113626597A (en)
Inventor
Zhang Xin
Chen Jia
Chen Tao
Wang Dongjing
Shi Yunhai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110830568.6A priority Critical patent/CN113626597B/en
Publication of CN113626597A publication Critical patent/CN113626597A/en
Application granted granted Critical
Publication of CN113626597B publication Critical patent/CN113626597B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/35: Information retrieval; clustering and classification of unstructured textual data
    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models
    • G06F18/253: Pattern recognition; fusion techniques applied to extracted features
    • G06N3/08: Neural networks; learning methods
    • G06F2218/08: Pattern recognition for signal processing; feature extraction
    • G06F2218/12: Pattern recognition for signal processing; classification and matching

Abstract

The invention discloses a method for predicting faults of intelligent manufacturing equipment based on a gated three-tower Transformer, comprising: S1, a channel tower encoder; S2, a sliding window tower encoder with a multi-scale aggregation module; S3, a text tower encoder with a cross-tower attention module; S4, a gating module. First, the channel embedding matrix is input to the channel tower encoder to obtain channel features, and the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features. Next, the text representation is input to the text tower encoder to obtain text features, and the text features together with the time-series features are input to the cross-tower attention module to obtain weighted text features. Finally, the gating module fuses the channel features, the aggregated time-series features, and the weighted text features to predict the fault category. By optimizing the parameters with a cross-entropy loss, the model can dynamically and adaptively fuse the three kinds of features across multiple pieces of intelligent manufacturing equipment, improving the accuracy of fault prediction.

Description

Intelligent manufacturing equipment fault prediction method based on gated three towers
Technical Field
The invention relates to the field of predictive maintenance in intelligent manufacturing. It addresses the problem of predicting fault categories from the numerical data collected by sensors during equipment operation together with the text data of operation logs, and provides a fault prediction method for intelligent manufacturing equipment based on a gated three-tower Transformer that combines the channel features and temporal features of the numerical data with the text features of the log data.
Background
In recent years, national policies have continuously encouraged and supported intelligent manufacturing, which has become an important development trend in the manufacturing industry. The rise of intelligent manufacturing has produced a wave of enterprises and industrial parks whose system equipment keeps growing in scale and complexity, raising the requirements for the operation and maintenance of intelligent manufacturing equipment. Such equipment may fail during production; without a handling method or maintenance strategy prepared in advance, product quality and production efficiency may suffer and large economic losses may follow. For this reason, fault prediction techniques from predictive maintenance are introduced: the possible future failure modes of the equipment are predicted from its operating-state data, and a predictive maintenance plan is made in advance. Fault prediction takes as input the numerical data collected by sensors during equipment operation and the text data of operation logs, extracts and analyzes data features, and outputs the predicted fault category. The rapid development of deep learning has greatly advanced feature extraction and analysis methods, which are therefore promising for fault prediction.
At present, scholars at home and abroad have produced many valuable research results in fault prediction. Techniques based on statistical analysis (e.g., grey theory, independent component analysis) predict the future operating state of equipment from statistics over historical operating data, but their linearity assumptions make it difficult to handle the complex nonlinear systems found in practice. Techniques based on signal processing (e.g., wavelet transforms, spectral analysis) struggle to track long operating-data sequences of intelligent manufacturing equipment and easily suffer reduced prediction performance. Techniques based on deep learning (e.g., convolutional neural networks, recurrent neural networks) can effectively extract important feature information from historical operating data for fault prediction and suit uncertain, complex equipment systems. Recently, the Transformer model originating in natural language processing has become popular in deep learning; its multi-head attention mechanism can extract diverse key feature information from operating data.
Existing fault prediction methods still have many shortcomings. First, many methods use only the temporal features of the sensor numerical data without fully exploiting channel features, and convolution-based channel feature extraction requires laboriously designed receptive fields and cannot establish global channel associations. Second, the temporal scale is usually fixed when extracting temporal features, local temporal information goes unused, and stacking convolutional layers easily incurs excessive computational cost. Third, when handling log text data, existing methods usually need manual feature extraction before analyzing and predicting fault categories; an end-to-end training method that effectively fuses numerical and text features is lacking.
Disclosure of Invention
The invention addresses the shortcomings of existing fault prediction techniques in extracting and fusing sensor numerical data and operation-log text data, and provides an intelligent manufacturing equipment fault prediction method based on a gated three-tower Transformer. First, a sliding window mask attention mechanism is designed to extract multi-scale temporal features, and a multi-scale aggregation module aggregates them; applying mask attention inside multiple sliding windows both reduces the computational cost of the model and strengthens its ability to extract and express local temporal information. Then, after text features are extracted, a cross-tower attention mechanism learns text-time attention weights, effectively realizing end-to-end fault prediction for intelligent manufacturing equipment.
The invention adopts a Transformer architecture composed of several encoders. First, the channel embedding matrix is input to the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features that contain multi-scale global and local temporal information. Next, the text representation is input to the text tower encoder to obtain text features, and the text and time-series features are input to the cross-tower attention module to obtain weighted text features, so that the model tends to predict using text features related to fault information. Finally, the gating module fuses the channel features, aggregated time-series features, and weighted text features to predict the fault category, allowing the model to dynamically and adaptively fuse the three kinds of features across multiple pieces of equipment and improving the accuracy of fault prediction.
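For concreteness, the overall forward pass can be sketched in PyTorch as below. This is a minimal sketch under our own naming assumptions (the class names, the two-output window tower, and the argument order are illustrative); the patent publishes no reference code.

```python
# Hypothetical wiring of the gated three-tower architecture; all names
# are illustrative, and each submodule is supplied externally.
import torch.nn as nn

class GatedThreeTowerTransformer(nn.Module):
    def __init__(self, channel_tower, window_tower, text_tower,
                 cross_tower_attn, gating_module):
        super().__init__()
        self.channel_tower = channel_tower        # channel features
        self.window_tower = window_tower          # aggregated + global timing features
        self.text_tower = text_tower              # text features
        self.cross_tower_attn = cross_tower_attn  # text-time weighting
        self.gating_module = gating_module        # adaptive fusion + classifier

    def forward(self, e_channel, e_time, r_text):
        f_c = self.channel_tower(e_channel)             # from channel embedding
        f_agg, f_t_global = self.window_tower(e_time)   # from time-series embedding
        f_x = self.text_tower(r_text)                   # from text representation
        f_x_weighted = self.cross_tower_attn(f_x, f_t_global)
        return self.gating_module(f_c, f_agg, f_x_weighted)  # fault probabilities
```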
The method first obtains, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of N intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ denotes the K kinds of numerical data (e.g., filling temperature, pressure, flow) of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of state classes (e.g., normal, failure of a particular component). $X^s$ denotes all numerical data of the s-th device, and $W_d^s$ denotes all text data of the s-th device on day d.
The specific implementation of the invention comprises the following steps:
S1, data acquisition: acquire, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of multiple intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ is the K kinds of numerical data of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of intelligent manufacturing devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of states; $X^s$ denotes all numerical data of the s-th device and $W_d^s$ all text data of the s-th device on day d;
S2, transform the numerical and text data to obtain the channel embedding matrix, the time-series embedding matrix, and sentence embedding vectors, with the following substeps:
S21, input the transposed numerical data $(X^s)^\top$ of the s-th device to a linear layer to obtain the channel embedding matrix $E_c^s$;
S22, input the original numerical data $X^s$ of the s-th device to a linear layer and apply position coding to obtain the time-series embedding matrix $E_t^s$;
S23, input the text data $W_d^s$ of the s-th device to a BERT model to obtain a sentence embedding vector for each text entry;
S24, pool each day's sentence embedding vectors with minimum, average, and maximum pooling to obtain the text representation $R^s$;
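Steps S23 and S24 can be sketched with a Hugging Face BERT as follows; the checkpoint name, the use of the [CLS] vector as the sentence embedding, and the concatenation of the three pooled vectors are assumptions where the text is silent.

```python
# Sketch of S23-S24: embed one day's log lines and pool them into a
# single daily text representation (checkpoint and pooling layout assumed).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-chinese")

def daily_text_representation(log_lines):
    batch = tokenizer(log_lines, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    sent_vecs = out.last_hidden_state[:, 0]          # [CLS] vector per log line
    return torch.cat([sent_vecs.min(dim=0).values,   # minimum pooling
                      sent_vecs.mean(dim=0),         # average pooling
                      sent_vecs.max(dim=0).values])  # maximum pooling
```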
S3, embedding the channel into the matrix
Figure BDA0003175298900000037
Inputting the data to a channel tower encoder to obtain the channel characteristics of numerical data, wherein the channel tower encoder is composed of an encoder LcIndividual channel coding layer
Figure BDA0003175298900000038
The method specifically comprises the following substeps:
s31. pair
Figure BDA0003175298900000039
Normalization processing is carried out on channel characteristics extracted from layer channel coding layer
Figure BDA00031752989000000310
S32, multi-head attention layer feature extraction is carried out on the normalized features obtained in the step S31, a residual error structure is adopted in the layer, and the calculation formula is as follows:
Figure BDA00031752989000000311
the specific operation of the multi-head attention layer comprises the following sub-steps:
SU1 multiplication of channel characteristics and parameter matrix to obtain the a-th self-attention SAa(. input) embedding matrix mapped query matrix qaKey matrix kaSum matrix va
SU2. calculating by normalized index softmax function to obtain self-attention weight matrix
Figure BDA00031752989000000312
Figure BDA00031752989000000313
Figure BDA00031752989000000314
Dimension size of each vector in the mapping matrix;
SU3. self-attention weight matrix saSum matrix vaMultiplication to obtain
Figure BDA00031752989000000315
And SU4, splicing the features obtained by the attention layers, and multiplying the parameter matrix to obtain the output features of the MSA layer
Figure BDA00031752989000000316
S33, normalizing the characteristics obtained in the step S32
Figure BDA00031752989000000317
S34, inputting the normalized features into a multi-layer perceptron to perform feature extraction, wherein the result is the first
Figure BDA00031752989000000318
Layer channel coding characteristics, calculation formula thereofComprises the following steps:
Figure BDA00031752989000000319
s35, the first
Figure BDA00031752989000000320
Layer channel coding features
Figure BDA00031752989000000321
Is input to the first
Figure BDA00031752989000000322
Layer channel coding layer, repeating steps S31-S34;
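Steps S31-S35 describe a standard pre-norm Transformer block; a compact PyTorch rendering follows, with every hyperparameter value a placeholder.

```python
# One channel coding layer (S31-S35): LN -> multi-head attention ->
# residual, then LN -> MLP -> residual; hyperparameters are placeholders.
import torch.nn as nn

class ChannelCodingLayer(nn.Module):
    def __init__(self, d_model=64, num_heads=4, d_ff=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, f):          # f: (batch, K channels, d_model)
        h = self.norm1(f)                                   # S31
        f = f + self.attn(h, h, h, need_weights=False)[0]   # S32
        f = f + self.mlp(self.norm2(f))                     # S33-S34
        return f                   # passed to the next layer (S35)
```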
S4, the sliding window tower encoder performs multi-layer, multi-scale feature extraction on the time-series embedding matrix $E_t^s$ to obtain the multi-scale time-series features $F_t^{l}$, with the following substeps:
S41, apply layer normalization to the time-series features $F_t^{l-1}$ extracted by the $(l-1)$-th sliding window coding layer;
S42, extract features from the normalized features with a sliding window mask attention layer. When the layer index $l$ is odd, perform the following operations:
SN1, the $l$-th sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a given number of time units;
SN2, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN3, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN4, perform the masked self-attention with fixed sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{RW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
When the layer index $l$ is even, perform the following operations:
SN5, the $l$-th sliding window mask attention layer shifts all sliding windows of the $(l-1)$-th sliding window mask attention layer by a number of time units;
SN6, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN7, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN8, perform the masked self-attention with the shifted sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{SW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
S43, apply layer normalization to the time-series features extracted in step S42;
S44, extract features from the normalized time-series features with a multi-layer perceptron,

$F_t^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_t^{l})) + \hat{F}_t^{l}$

S45, perform multi-scale aggregation on the multi-scale time-series features output by the even-numbered sliding window coding layers and the $L_t$-th layer to obtain and output the aggregated time-series features $F_{agg}^s$, which contain multi-scale global and local temporal information, computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
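The window partitioning of S42 and the aggregation of S45 can be sketched as below. The sketch is single-head, the window size and shift are illustrative, and zeroing the upper triangle after the softmax is a literal reading of SN3/SN7 (a conventional implementation would instead mask with -inf before the softmax).

```python
# Sketch of sliding-window masked self-attention (S42) and multi-scale
# aggregation (S45); window and shift values are illustrative.
import torch
import torch.nn.functional as F

def sliding_window_mask_attention(x, window=4, shift=0):
    """x: (T, d) timing features; masked self-attention inside each window."""
    T, d = x.shape
    if shift:                                     # even layers: shifted windows
        x = torch.roll(x, shifts=-shift, dims=0)
    out = torch.zeros_like(x)
    for start in range(0, T, window):
        w = x[start:start + window]               # edge window may be shorter (SN2/SN6)
        s_a = F.softmax(w @ w.t() / d ** 0.5, dim=-1)
        s_a = torch.tril(s_a)                     # upper triangle set to 0 (SN3/SN7)
        out[start:start + w.shape[0]] = s_a @ w
    if shift:
        out = torch.roll(out, shifts=shift, dims=0)
    return out

def multi_scale_aggregation(layer_outputs):
    """M_t: concatenate the selected layers' timing features (S45)."""
    return torch.cat(layer_outputs, dim=-1)
```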
S5, input the text representation to the text tower encoder to obtain text features, and input the text features and time-series features to the cross-tower attention module to obtain weighted text features, with the following substeps:
S51, apply layer normalization to the text features $F_x^{l-1}$ output by the $(l-1)$-th text coding layer;
S52, extract features from the normalized text features with a multi-head attention layer, computed as

$\hat{F}_x^{l} = \mathrm{MSA}(\mathrm{LN}(F_x^{l-1})) + F_x^{l-1}$

S53, apply layer normalization to the text features computed in step S52;
S54, extract features from the normalized text features with a multi-layer perceptron, computed as

$F_x^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_x^{l})) + \hat{F}_x^{l}$
S6, use the gating module to fuse the channel features, aggregated time-series features, and weighted text features, and compute and output the predicted fault category probability vector, with the following substeps:
S61, input the global time-series features $F_t^s$ to a fully connected layer to obtain their linear mapping $\mathrm{FC}(F_t^s)$;
S62, transpose the mapped global time-series features to obtain the aligned features $\mathrm{FC}(F_t^s)^\top$;
S63, obtain the text-time attention weights $s_{xt}$ with a matrix multiplication and the Softmax(·) function, computed as

$s_{xt} = \mathrm{Softmax}(F_x^s \, \mathrm{FC}(F_t^s)^\top)$

where FC(·) is a fully connected layer;
S64, output the weighted text features computed with the text-time attention weights:

$\hat{F}_x^s = s_{xt} F_x^s$
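A sketch of the cross-tower attention of S61-S64 follows; the feature shapes and the placement of the fully connected layer are assumptions consistent with the formulas above.

```python
# Sketch of the cross-tower attention module (S61-S64): project and
# transpose the global timing features, softmax the text-time similarity,
# and reweight the text features.
import torch
import torch.nn as nn

class CrossTowerAttention(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.fc = nn.Linear(d_model, d_model)      # S61: linear mapping FC(.)

    def forward(self, f_text, f_time):
        # f_text: (T, d) text features; f_time: (T, d) global timing features
        sim = f_text @ self.fc(f_time).transpose(0, 1)  # S62-S63 alignment
        s_xt = torch.softmax(sim, dim=-1)               # text-time weights
        return s_xt @ f_text                            # S64: weighted text features
```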
S65, obtain the prediction result from the weighted fusion of the three features, with the following substeps:
SW1, input the channel features $F_c^s$, aggregated time-series features $F_{agg}^s$, and weighted text features $\hat{F}_x^s$ to the gating module;
SW2, the gating layer G performs weighted fusion of the three features through adaptive weights to obtain the gating features $F_g^s = G(F_c^s, F_{agg}^s, \hat{F}_x^s)$;
SW3, input the gating features to a fully connected layer FC to obtain the predicted fault category probability vector $y^s$ of the s-th intelligent manufacturing device, computed as

$y^s = \mathrm{Softmax}(\mathrm{FC}(F_g^s))$
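The gate parameterization is not spelled out in the text; one natural reading, sketched below, is a softmax gate over the concatenated features followed by the fully connected classification layer of SW3.

```python
# Sketch of the gating module (SW1-SW3); the softmax gate over the
# concatenated features is our assumption for the adaptive weights.
import torch
import torch.nn as nn

class GatingModule(nn.Module):
    def __init__(self, d_model=64, num_states=10):   # num_states = B
        super().__init__()
        self.gate = nn.Linear(3 * d_model, 3)        # adaptive fusion weights
        self.fc = nn.Linear(d_model, num_states)     # SW3 classification layer

    def forward(self, f_c, f_agg, f_xw):             # three pooled feature vectors
        w = torch.softmax(self.gate(torch.cat([f_c, f_agg, f_xw], -1)), -1)
        g = w[..., 0:1] * f_c + w[..., 1:2] * f_agg + w[..., 2:3] * f_xw
        return torch.softmax(self.fc(g), dim=-1)     # predicted state probabilities
```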
S7, compute the cross-entropy loss from the predicted fault category probability vector; this step is used only during training, to guide the model to predict the fault category of the intelligent manufacturing equipment accurately.
Preferably, step S42 achieves efficient multi-scale time-series feature extraction by performing mask attention computation inside multiple non-overlapping adjacent time-series sliding windows, and establishes a mechanism for exchanging temporal information across windows through window shifting. Meanwhile, the invention designs a multi-scale aggregation module as the functional module that aggregates the multi-scale time-series features; the multi-scale feature extraction and aggregation are computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
preferably, step S5 is performed by the cross-tower attention module by computing a global timing feature
Figure BDA0003175298900000058
And text features
Figure BDA0003175298900000059
The attention weights of (a) enable end-to-end learning of the model for text-to-time correlations.
Preferably, in step S6 the gating module fuses the channel features, aggregated time-series features, and weighted text features, and fault prediction is performed on the multi-feature fused vector, improving the model's fault prediction accuracy and robustness.
The gated three-tower Transformer architecture consists of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a cross-tower attention module, and a gating layer. The channel tower, sliding window tower, and text tower encoders effectively extract the channel features of the numerical data, the aggregated time-series features, and the weighted text features of the text data, respectively, while the gating layer fuses the three features with dynamic weights, so that the model adapts its features to the data of multiple intelligent manufacturing devices and the accuracy of fault category prediction is improved.
Drawings
FIG. 1 is a diagram of a gated three tower Transformer architecture;
FIG. 2 is a schematic diagram of channel characteristics and timing characteristics;
FIG. 3 is a diagram of a daily textual representation extraction architecture;
FIG. 4 is a diagram of a structure of a sliding window mask attention layer and a multi-scale aggregation module;
FIG. 5 is a cross-tower attention module block diagram.
Detailed Description
Example 1
The invention provides an intelligent manufacturing equipment fault prediction technique based on a gated three-tower Transformer. As shown in FIG. 1, the overall architecture consists of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a cross-tower attention module, and a gating module. First, the channel embedding matrix is input to the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input to the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features containing multi-scale global and local temporal information. Next, the text representation is input to the text tower encoder to obtain text features, and the text and time-series features are input to the cross-tower attention module to obtain weighted text features, so that the model tends to predict using text features related to fault information. The gating module then fuses the channel features, aggregated time-series features, and weighted text features to compute and output the predicted fault category probability vector. Finally, the cross-entropy loss is computed from the predicted probability vector; this step is used only during training, to guide the model to predict the fault category of the equipment accurately.
The method first obtains, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of N intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ denotes the K kinds of numerical data (e.g., filling temperature, pressure, flow) of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of state classes (e.g., normal, failure of a particular component). $X^s$ denotes all numerical data of the s-th device, and $W_d^s$ denotes all text data of the s-th device on day d.
The implementation steps are described in detail below with reference to the accompanying drawings.
Step (1): as shown in FIG. 1, the transposed numerical data $(X^s)^\top$ of the s-th intelligent manufacturing device is input to a linear layer to obtain the channel embedding matrix $E_c^s$, and the original numerical data $X^s$ is input to a linear layer followed by position coding to obtain the time-series embedding matrix $E_t^s$. As shown in FIG. 2, the channel embedding matrix and the time-series embedding matrix are embedded representations of the numerical data along the channel and time dimensions, respectively. As shown in FIG. 3, the log text data $W_d^s$ of the s-th device is input to BERT (Bidirectional Encoder Representations from Transformers) to obtain a sentence embedding vector for each text entry, and minimum, average, and maximum pooling over each day's sentence embedding vectors yields the text representation $R^s$.
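Step (1) can be sketched as follows; the sinusoidal form of the position coding is an assumption (the text says only "position coding"), and all dimensions are placeholders.

```python
# Sketch of step (1): channel embedding from the transposed data,
# time-series embedding plus (assumed sinusoidal) position coding.
import math
import torch
import torch.nn as nn

def sinusoidal_positions(T, d_model):
    pos = torch.arange(T).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(T, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

K, T, d_model = 8, 30, 64                 # placeholder dimensions
x = torch.randn(T, K)                     # K sensor channels over T days
channel_embed = nn.Linear(T, d_model)     # acts on the transposed data
time_embed = nn.Linear(K, d_model)

E_c = channel_embed(x.t())                               # (K, d_model)
E_t = time_embed(x) + sinusoidal_positions(T, d_model)   # (T, d_model)
```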
Step (2): as shown in FIG. 1, the channel embedding matrix $E_c^s$ is input to the channel tower encoder, which consists of $L_c$ channel coding layers and performs multi-layer feature extraction on the channel embedding matrix to obtain the channel features $F_c^s$. Each channel coding layer consists of two sub-layers with residual structure: the first sub-layer is a layer normalization operation followed by a multi-head attention layer, and the second sub-layer is a layer normalization operation followed by a multi-layer perceptron. The channel feature extraction in the $l$-th channel coding layer is computed as

$\hat{F}_c^{l} = \mathrm{MSA}(\mathrm{LN}(F_c^{l-1})) + F_c^{l-1}$
$F_c^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_c^{l})) + \hat{F}_c^{l}$

where LN(·) is layer normalization, MLP(·) is a multi-layer perceptron, and MSA(·) is multi-head attention, computed in the following steps:

$[q_a, k_a, v_a] = E W_a, \quad s_a = \mathrm{Softmax}\!\left(\frac{q_a k_a^\top}{\sqrt{d}}\right), \quad \mathrm{SA}_a = s_a v_a, \quad \mathrm{MSA} = [\mathrm{SA}_1, \dots, \mathrm{SA}_A]\, W_o$

where $W_a$ and $W_o$ are parameter matrices; $q_a$, $k_a$, $v_a$ are the query, key, and value matrices mapped from the embedding matrix input to the a-th self-attention $\mathrm{SA}_a(\cdot)$; $s_a$ is the self-attention weight matrix; $d$ is the dimension of each vector in the mapping matrices; Softmax(·) is the normalized exponential function; A is the total number of self-attention heads; and [·, …, ·] is the concatenation operation.

The channel tower encoder output is the channel features output by the $L_c$-th channel coding layer: $F_c^s = F_c^{L_c}$.
step (3) embedding the timing sequence into the matrix as shown in FIG. 1
Figure BDA00031752989000000717
Input to a polymerization module M with multi-scaletThe sliding window tower encoder is composed of
Figure BDA00031752989000000718
A sliding window coding layer
Figure BDA00031752989000000719
The method comprises the steps of extracting multilayer and multi-scale features of a time sequence embedded matrix to obtain multi-scale time sequence features
Figure BDA00031752989000000720
Performing multi-scale polymerization on the multi-scale time sequence characteristics by using a multi-scale polymerization module to obtain polymerization time sequence characteristics
Figure BDA00031752989000000721
The invention provides a sliding window mask attention layer, which realizes high-efficiency multi-scale time sequence characteristic extraction by performing mask attention calculation in a plurality of non-overlapping adjacent time sequence sliding windows and establishes a time sequence information exchange mechanism in the plurality of sliding windows through the sliding windows; meanwhile, the invention designs a multi-scale aggregation module as a functional module for aggregating multi-scale time sequence characteristics. The calculation formula of the multi-scale time sequence feature extraction and the multi-scale aggregation is as follows:
Figure BDA00031752989000000722
and (3.1) the sliding window coding layer is composed of two sub-layers with residual error structures, the first sub-layer is composed of a layer normalization operation and a sliding window mask attention layer, and the second sub-layer is composed of a layer normalization operation and a multi-layer perceptron. As shown in fig. 4, when
Figure BDA0003175298900000081
When it is odd, the first
Figure BDA0003175298900000082
The window mask attention layer is first
Figure BDA0003175298900000083
The time sequence characteristics are non-overlapped and equally divided by the sliding window with the time unit size
Figure BDA0003175298900000084
Then using no more than
Figure BDA0003175298900000085
A time unit sized sliding window fully encompasses the edge timing feature. When in use
Figure BDA0003175298900000086
When it is even, the first
Figure BDA0003175298900000087
The first window mask attention layer is first
Figure BDA0003175298900000088
The sliding window mask is focused on all sliding windows of the power layer
Figure BDA0003175298900000089
Moving by a time unit size not greater than
Figure BDA00031752989000000810
A time unit sized sliding window fully encompasses the edge timing feature. Each window mask attention layer is on the layerPerforming mask self-attention calculation in the sliding window, and calculating the self-attention weight matrix s in the multi-head attention calculation stepaSetting the upper triangle element to 0 is the mask self-attention calculation step. When in use
Figure BDA00031752989000000811
Is odd, the first
Figure BDA00031752989000000812
Is first and second
Figure BDA00031752989000000813
Individual channel coding layer
Figure BDA00031752989000000814
The calculation formula of the channel feature extraction in (1) is as follows:
Figure BDA00031752989000000815
wherein the content of the first and second substances,
Figure BDA00031752989000000816
RW-MSA (-) performs mask self-attention computation for odd sliding window mask attention layer using fixed sliding window, SW-MSA (-) performs mask self-attention computation for even sliding window mask attention layer using moved sliding window.
Step (3.2) multiscale aggregation Module MtFormed by matrix splicing operations, for even numbers of sums LtMulti-scale timing feature of multiple sliding window coding layer output
Figure BDA00031752989000000817
LtPerforming multi-scale polymerization to obtain polymerization timing characteristics
Figure BDA00031752989000000818
And outputting, wherein the calculation formula is as follows:
Figure BDA00031752989000000819
step (4) representing the text
Figure BDA00031752989000000820
Input to attention module with tower crossing
Figure BDA00031752989000000821
The text tower encoder is composed of
Figure BDA00031752989000000822
A text coding layer
Figure BDA00031752989000000823
The composition is that the text representation is subjected to multi-layer feature extraction to obtain text features
Figure BDA00031752989000000824
Step (4.1) the text coding layer is composed of two sub-layers with residual error structures, the first sub-layer is composed of layer normalization operation and a multi-head attention layer, the second sub-layer is composed of layer normalization operation and a multi-layer perceptron, the first sub-layer is composed of a layer normalization operation and a multi-head attention layer
Figure BDA00031752989000000825
A text coding layer
Figure BDA00031752989000000826
The calculation formula of the text feature extraction in (1) is as follows:
Figure BDA00031752989000000827
wherein
Figure BDA00031752989000000828
Step (4.2) As shown in FIG. 5, the Cross-Tower attention Module
Figure BDA00031752989000000829
Utilizing global timing features
Figure BDA00031752989000000830
And text features
Figure BDA00031752989000000831
Calculating text-time sequence attention weight and weighting text features to obtain weighted text features
Figure BDA00031752989000000832
First global timing characteristics
Figure BDA00031752989000000833
Inputting the global time sequence characteristics to the full connection layer to obtain linear mapping
Figure BDA00031752989000000834
Transposing the same to obtain
Figure BDA00031752989000000835
To align the timing. Text-to-time attention weighting using matrix multiplication operations and Softmax (-) function
Figure BDA00031752989000000836
The calculation formula is as follows:
Figure BDA0003175298900000091
where FC (-) is the full connectivity layer.
Output of a cross-tower attention module
Figure BDA0003175298900000092
For weighted text features computed using text-to-time attention weights:
Figure BDA0003175298900000093
a cross-tower attention module by computing globalTiming characteristics
Figure BDA0003175298900000094
And text features
Figure BDA0003175298900000095
The attention weights of (a) enable end-to-end learning of the model for text-to-time correlations. The text feature extraction and text feature weighting calculation formula is as follows:
Figure BDA0003175298900000096
step (5) characterizing the channel
Figure BDA0003175298900000097
Aggregation timing characterization
Figure BDA0003175298900000098
And weighted text features
Figure BDA0003175298900000099
Input to the gating module. The gating layer G performs weighted fusion on the three characteristics through self-adaptive weight to obtain gating characteristics
Figure BDA00031752989000000910
The calculation formula is as follows:
Figure BDA00031752989000000911
to gate features
Figure BDA00031752989000000912
Inputting the data into a full connection layer to obtain the predicted fault category probability vector of the intelligent manufacturing equipment of the s th station
Figure BDA00031752989000000913
The calculation formula is as follows:
Figure BDA00031752989000000914
Step (6): compute the loss and optimize the gated three-tower Transformer. To improve the consistency between the predicted fault category and the real equipment state, the invention adopts the cross-entropy loss, computed as

$\mathcal{L} = -\frac{1}{N}\sum_{s=1}^{N}\sum_{b=1}^{B} y_{s,b}\,\log \hat{y}_{s,b}$

where $y_{s,b}$ is the true label of the s-th intelligent manufacturing device for the b-th state, equal to 1 if the device's state belongs to class b and 0 otherwise; $\hat{y}_{s,b}$ is the predicted probability of the b-th state; B is the total number of states; and N is the total number of intelligent manufacturing devices.
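The loss of step (6) is the standard multi-class cross entropy; a direct transcription follows, assuming one-hot state labels and the softmax outputs of the gating module.

```python
# Cross-entropy loss of step (6); y_pred and y_true are (N devices, B states).
import torch

def fault_prediction_loss(y_pred, y_true, eps=1e-8):
    return -(y_true * torch.log(y_pred + eps)).sum(dim=1).mean()
```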

Claims (4)

1. A method for predicting faults of intelligent manufacturing equipment based on gated three towers, characterized by comprising the following steps:
S1, data acquisition: acquire, over a number of days, the related numerical dataset $X=\{x_d^s\}$ collected by the sensors of multiple intelligent manufacturing devices of the same type, the operation-log text dataset $W=\{w_d^s\}$, and the device-state dataset $Y=\{y_D^s\}$, where $x_d^s \in \mathbb{R}^K$ is the K kinds of numerical data of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of intelligent manufacturing devices; $w_d^s$ denotes the several log text entries of the s-th device on day d; $y_D^s \in \{0,1\}^B$ is the true state label of the s-th device on day D, whose b-th component is 1 if the device's state belongs to class b and 0 otherwise, with B the total number of states; $X^s$ denotes all numerical data of the s-th device and $W_d^s$ all text data of the s-th device on day d;
S2, transform the numerical and text data to obtain the channel embedding matrix, the time-series embedding matrix, and sentence embedding vectors, with the following substeps:
S21, input the transposed numerical data $(X^s)^\top$ of the s-th device to a linear layer to obtain the channel embedding matrix $E_c^s$;
S22, input the original numerical data $X^s$ of the s-th device to a linear layer and apply position coding to obtain the time-series embedding matrix $E_t^s$;
S23, input the text data $W_d^s$ of the s-th device to a BERT model to obtain a sentence embedding vector for each text entry;
S24, pool each day's sentence embedding vectors with minimum, average, and maximum pooling to obtain the text representation $R^s$;
S3, embedding the channel into the matrix
Figure FDA00035117871400000116
Inputting the data to a channel tower encoder to obtain the channel characteristics of numerical data, wherein the channel tower encoder is composed of an encoder LcIndividual channel coding layer
Figure FDA00035117871400000117
The method specifically comprises the following substeps:
s31. pair
Figure FDA00035117871400000118
Normalization processing is carried out on channel characteristics extracted from layer channel coding layer
Figure FDA00035117871400000119
S32, multi-head attention layer feature extraction is carried out on the normalized features obtained in the step S31, a residual error structure is adopted in the layer, and the calculation formula is as follows:
Figure FDA00035117871400000120
the specific operation of the multi-head attention layer comprises the following sub-steps:
SU1 multiplication of channel characteristics and parameter matrix to obtain the a-th self-attention SAa(. input) embedding matrix mapped query matrix qaKey matrix kaSum matrix va
SU2. calculating by normalized index softmax function to obtain self-attention weight matrix
Figure FDA0003511787140000021
Dimension size of each vector in the mapping matrix;
SU3. self-attention weight matrix saSum matrix vaMultiplication to obtain
Figure FDA0003511787140000022
And SU4, splicing the features obtained by the attention layers, and multiplying the parameter matrix to obtain the output features of the MSA layer
Figure FDA0003511787140000023
S33, normalizing the characteristics obtained in the step S32
Figure FDA0003511787140000024
S34, inputting the normalized features into a multi-layer perceptron to perform feature extraction, wherein the result is the first
Figure FDA0003511787140000025
The layer channel coding characteristic has the calculation formula as follows:
Figure FDA0003511787140000026
s35, the first
Figure FDA0003511787140000027
Layer channel coding features
Figure FDA0003511787140000028
Is input to the first
Figure FDA0003511787140000029
Layer channel coding layer, repeating steps S31-S34;
S4, the sliding window tower encoder performs multi-layer, multi-scale feature extraction on the time-series embedding matrix $E_t^s$ to obtain the multi-scale time-series features $F_t^{l}$, with the following substeps:
S41, apply layer normalization to the time-series features $F_t^{l-1}$ extracted by the $(l-1)$-th sliding window coding layer;
S42, extract features from the normalized features with a sliding window mask attention layer. When the layer index $l$ is odd, perform the following operations:
SN1, the $l$-th sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a given number of time units;
SN2, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN3, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN4, perform the masked self-attention with fixed sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{RW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
When the layer index $l$ is even, perform the following operations:
SN5, the $l$-th sliding window mask attention layer shifts all sliding windows of the $(l-1)$-th sliding window mask attention layer by a number of time units;
SN6, use a sliding window of no more than that number of time units so that the edge time-series features are fully covered;
SN7, perform masked self-attention inside the windows of this layer: in the multi-head attention computation, set the upper-triangular elements of the self-attention weight matrix $s_a$ to 0;
SN8, perform the masked self-attention with the shifted sliding windows and compute the layer's output time-series features with a residual structure, $\hat{F}_t^{l} = \mathrm{SW\text{-}MSA}(\mathrm{LN}(F_t^{l-1})) + F_t^{l-1}$.
S43, apply layer normalization to the time-series features extracted in step S42;
S44, extract features from the normalized time-series features with a multi-layer perceptron,

$F_t^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_t^{l})) + \hat{F}_t^{l}$

S45, perform multi-scale aggregation on the multi-scale time-series features output by the even-numbered sliding window coding layers and the $L_t$-th layer to obtain and output the aggregated time-series features $F_{agg}^s$, which contain multi-scale global and local temporal information, computed as

$F_{agg}^s = M_t([F_t^{2}, F_t^{4}, \dots, F_t^{L_t}])$
S5, input the text representation to the text tower encoder to obtain text features, and input the text features and time-series features to the cross-tower attention module to obtain weighted text features, with the following substeps:
S51, apply layer normalization to the text features $F_x^{l-1}$ output by the $(l-1)$-th text coding layer;
S52, extract features from the normalized text features with a multi-head attention layer, computed as

$\hat{F}_x^{l} = \mathrm{MSA}(\mathrm{LN}(F_x^{l-1})) + F_x^{l-1}$

S53, apply layer normalization to the text features computed in step S52;
S54, extract features from the normalized text features with a multi-layer perceptron, computed as

$F_x^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{F}_x^{l})) + \hat{F}_x^{l}$
S6, use the gating module to fuse the channel features, aggregated time-series features, and weighted text features, and compute and output the predicted fault category probability vector, with the following substeps:
S61, input the global time-series features $F_t^s$ to a fully connected layer to obtain their linear mapping $\mathrm{FC}(F_t^s)$;
S62, transpose the mapped global time-series features to obtain the aligned features $\mathrm{FC}(F_t^s)^\top$;
S63, obtain the text-time attention weights $s_{xt}$ with a matrix multiplication and the Softmax(·) function, computed as

$s_{xt} = \mathrm{Softmax}(F_x^s \, \mathrm{FC}(F_t^s)^\top)$

where FC(·) is a fully connected layer;
S64, output the weighted text features computed with the text-time attention weights:

$\hat{F}_x^s = s_{xt} F_x^s$

S65, obtain the prediction result from the weighted fusion of the three features, with the following substeps:
SW1, input the channel features $F_c^s$, aggregated time-series features $F_{agg}^s$, and weighted text features $\hat{F}_x^s$ to the gating module;
SW2, the gating layer G performs weighted fusion of the three features through adaptive weights to obtain the gating features $F_g^s = G(F_c^s, F_{agg}^s, \hat{F}_x^s)$;
SW3, input the gating features to a fully connected layer FC to obtain the predicted fault category probability vector $y^s$ of the s-th intelligent manufacturing device, computed as

$y^s = \mathrm{Softmax}(\mathrm{FC}(F_g^s))$
S7, compute the cross-entropy loss from the predicted fault category probability vector; this step is used only during training, to guide the model to predict the fault category of the intelligent manufacturing equipment accurately.
2. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that step S42 achieves efficient multi-scale time-series feature extraction by performing mask attention computation in multiple non-overlapping adjacent time-series sliding windows, and establishes a mechanism for exchanging temporal information across the sliding windows through window shifting.
3. The method of claim 1, wherein step S5 computes, through the cross-tower attention module, the attention weights between the global time-series features $F_t^s$ and the text features $F_x^s$, realizing end-to-end learning of the text-time correlation by the model.
4. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that in step S6 the gating module fuses the channel features, the aggregated time-series features, and the weighted text features, and fault prediction is performed on the multi-feature fused vector, improving the model's fault prediction accuracy and robustness.
CN202110830568.6A 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers Active CN113626597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830568.6A CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110830568.6A CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Publications (2)

Publication Number Publication Date
CN113626597A CN113626597A (en) 2021-11-09
CN113626597B true CN113626597B (en) 2022-04-01

Family

ID=78380538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830568.6A Active CN113626597B (en) 2021-07-22 2021-07-22 Intelligent manufacturing equipment fault prediction method based on gated three towers

Country Status (1)

Country Link
CN (1) CN113626597B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699700B2 (en) * 2018-07-31 2020-06-30 Tencent Technology (Shenzhen) Company Limited Monaural multi-talker speech recognition with attention mechanism and gated convolutional networks
CN111079532B (en) * 2019-11-13 2021-07-13 杭州电子科技大学 Video content description method based on text self-encoder
CN112489635B (en) * 2020-12-03 2022-11-11 杭州电子科技大学 Multi-mode emotion recognition method based on attention enhancement mechanism
CN112818035B (en) * 2021-01-29 2022-05-17 湖北工业大学 Network fault prediction method, terminal equipment and storage medium
CN112926303B (en) * 2021-02-23 2023-06-27 南京邮电大学 Malicious URL detection method based on BERT-BiGRU

Also Published As

Publication number Publication date
CN113626597A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN111832814B (en) Air pollutant concentration prediction method based on graph attention mechanism
CN113316163B (en) Long-term network traffic prediction method based on deep learning
CN111046583B (en) Point machine fault diagnosis method based on DTW algorithm and ResNet network
CN113255209B (en) Method for predicting residual life of bearing of gearbox
CN113361559B (en) Multi-mode data knowledge information extraction method based on deep-width combined neural network
CN115587454A (en) Traffic flow long-term prediction method and system based on improved Transformer model
CN115758891B (en) Airfoil flow field prediction method based on converter decoder network
CN112560948A (en) Eye fundus map classification method and imaging method under data deviation
CN114819386A (en) Conv-Transformer-based flood forecasting method
CN116151459A (en) Power grid flood prevention risk probability prediction method and system based on improved Transformer
CN115831377A (en) Intra-hospital death risk prediction method based on ICU (intensive care unit) medical record data
CN113626597B (en) Intelligent manufacturing equipment fault prediction method based on gated three towers
CN110335160A (en) A kind of medical treatment migratory behaviour prediction technique and system for improving Bi-GRU based on grouping and attention
CN117154256A (en) Electrochemical repair method for lithium battery
CN113792919B (en) Wind power prediction method based on combination of transfer learning and deep learning
CN115801152A (en) WiFi action identification method based on hierarchical transform model
Yu et al. Time series cross-correlation network for wind power prediction
CN114707829A (en) Target person rescission risk prediction method based on structured data linear expansion
Lin Intelligent Fault Diagnosis of Consumer Electronics Sensor in IoE via Transformer
Dun et al. A novel hybrid model based on spatiotemporal correlation for air quality prediction
CN117725491B (en) SCINet-based power system fault state detection and classification method
CN116361673B (en) Quasi-periodic time sequence unsupervised anomaly detection method, system and terminal
CN116579505B (en) Electromechanical equipment cross-domain residual life prediction method and system without full life cycle sample
CN117349610B (en) Fracturing operation multi-time-step pressure prediction method based on time sequence model
US20230350402A1 (en) Multi-task learning based rul predication method under sensor fault condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant