CN113626597B - Intelligent manufacturing equipment fault prediction method based on gated three towers - Google Patents
- Publication number
- CN113626597B CN202110830568.6A
- Authority
- CN
- China
- Prior art keywords
- layer
- text
- attention
- time sequence
- tower
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/35—Information retrieval of unstructured textual data; Clustering; Classification
- G06F18/2415—Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253—Fusion techniques of extracted features
- G06N3/02, G06N3/08—Neural networks; Learning methods
- G06F2218/08—Pattern recognition adapted for signal processing; Feature extraction
- G06F2218/12—Pattern recognition adapted for signal processing; Classification; Matching
Abstract
The invention discloses a method for predicting faults of intelligent manufacturing equipment based on a gated three-tower architecture, comprising: S1, a channel tower encoder; S2, a sliding window tower encoder with a multi-scale aggregation module; S3, a text tower encoder with a cross-tower attention module; S4, a gating module. First, the channel embedding matrix is input into the channel tower encoder to obtain channel features, and the time-series embedding matrix is input into the sliding window tower encoder with the multi-scale aggregation module to obtain aggregated time-series features. Next, the text representation is input into the text tower encoder to obtain text features, and the text features and time-series features are input into the cross-tower attention module to obtain weighted text features. Finally, the gating module fuses the channel features, the aggregated time-series features, and the weighted text features to predict the fault category. By optimizing the parameters with a cross-entropy loss, the model can dynamically and adaptively fuse the three kinds of features across multiple intelligent manufacturing devices, improving the accuracy of fault prediction.
Description
Technical Field
The invention relates to the field of predictive maintenance in intelligent manufacturing and provides a method for predicting faults of intelligent manufacturing equipment based on a gated three-tower Transformer. It addresses the problem of predicting fault categories from numerical data collected by sensors during equipment operation and from operation-log text data, combining the channel features and time-series features of the numerical data with the text features of the log data.
Background
In recent years, governments have continuously issued policies encouraging and supporting intelligent manufacturing, which has become an important development trend in the manufacturing industry. The rise of intelligent manufacturing in various regions has given birth to a batch of intelligent manufacturing enterprises and industrial parks, and the corresponding systems and equipment keep growing in scale and complexity, raising the operation and maintenance requirements of intelligent manufacturing equipment. Such equipment may malfunction during production, and if no handling method or maintenance strategy is established in advance, product quality and production efficiency may suffer, possibly causing huge economic losses. To this end, fault prediction techniques from predictive maintenance are introduced: future possible fault modes of intelligent manufacturing equipment are predicted from its operating state data, and a predictive maintenance plan is made in advance. Fault prediction takes as input the numerical data collected by sensors during equipment operation together with operation-log text data, and outputs the predicted fault category by extracting and analyzing data features. The rapid progress of deep learning in recent years has greatly advanced data feature extraction and analysis methods, which are expected to be applied to fault prediction.
At present, scholars at home and abroad have produced many valuable research results in the field of fault prediction. Fault prediction techniques based on statistical analysis (such as grey theory and independent component analysis) analyze historical operating data to predict the future operating state of intelligent manufacturing equipment, but because of their linearity constraints they struggle to fit the complex nonlinear systems found in practice. Fault prediction techniques based on signal processing (such as wavelet transform and spectral analysis) have difficulty tracking the operating data sequences of intelligent manufacturing equipment over long horizons, which easily degrades prediction performance. Fault prediction techniques based on deep learning (such as convolutional neural networks and recurrent neural networks) can effectively extract important feature information from historical operating data for fault prediction and are well suited to uncertain, complex intelligent manufacturing systems. Recently, the Transformer model, originating in natural language processing, has become popular in deep learning; its multi-head attention mechanism can extract diverse key feature information from operating data.
The existing fault prediction methods still have many shortcomings. First, many methods exploit only the time-series features of the numerical sensor data without fully using the channel features, and convolution-based channel feature extraction requires tedious receptive-field design and cannot establish global inter-channel correlations. Second, the time-series scale is usually fixed when extracting time-series features, local time-series information goes unused, and stacking convolutional layers easily incurs excessive computational cost. In addition, when existing methods process log text data, features must often be extracted manually before fault categories can be analyzed and predicted, and an end-to-end training method that effectively fuses numerical and text data features is lacking.
Disclosure of Invention
Aiming at the shortcomings of existing fault prediction techniques in extracting and fusing features of numerical sensor data and operation-log text data, the invention provides a method for predicting faults of intelligent manufacturing equipment based on a gated three-tower Transformer. First, a sliding window mask attention mechanism is designed to extract multi-scale time-series features, and a multi-scale aggregation module aggregates them; applying mask attention within multiple sliding windows not only reduces the computational cost of the model but also improves its ability to extract and express local time-series feature information. Then, after text features are extracted, a cross-tower attention mechanism learns text-time-series attention weights, effectively realizing end-to-end fault prediction for intelligent manufacturing equipment.
The invention adopts a Transformer architecture composed of several encoders. First, the channel embedding matrix is input into the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input into the sliding window tower encoder with the multi-scale aggregation module to obtain the aggregated time-series features, which contain multi-scale global and local time-series information. Further, the text representation is input into the text tower encoder to obtain text features, and the text features and time-series features are input into the cross-tower attention module to obtain weighted text features, so that the model tends to predict with text features related to fault information. Finally, the gating module fuses the channel features, the aggregated time-series features, and the weighted text features to predict the fault category, so that the model can dynamically and adaptively fuse the three kinds of features across multiple intelligent manufacturing devices, improving fault prediction accuracy.
The method first obtains, for multiple intelligent manufacturing devices of the same type over a certain number of days, the numerical data set collected by sensors, the operation-log text data set, and the equipment status data set. The numerical data comprise K types of values (such as filling temperature, pressure, flow, etc.) of the s-th device on day d, where T is the total number of days of data, D is the fault prediction date, and N is the total number of devices. The log text data set contains several log text entries of the s-th device on day d. The status data set holds the true status label of the s-th device on day D: an entry is 1 if the device's status class belongs to the corresponding state and 0 otherwise, where B is the total number of states (e.g., normal operation, failure of a certain component, etc.). Collectively, these denote all numerical data of the s-th device and all text data of the s-th device on day d.
The specific implementation of the invention comprises the following steps:
s1, data acquisition: obtain the numerical data set collected by sensors of multiple intelligent manufacturing devices of the same type over a certain number of days, the operation-log text data set, and the equipment status data set.
Here the numerical data comprise the K types of values of the s-th device on day d, T is the total number of days of data, D is the fault prediction date, and N is the total number of devices.
The status data set holds the true status label of the s-th device on day D: an entry is 1 if the device's status class belongs to the corresponding state and 0 otherwise, where B is the total number of states. Collectively, these denote all numerical data of the s-th device and all text data of the s-th device on day d;
s2, transform the numerical data and text data to obtain the channel embedding matrix, the time-series embedding matrix, and sentence embedding vectors, comprising the following substeps:
S21. Input the transposed numerical data of the s-th intelligent manufacturing device into a linear layer to obtain the channel embedding matrix;
S22. Input the raw numerical data of the s-th device into a linear layer and apply positional encoding to obtain the time-series embedding matrix;
S23. Input the text data of the s-th device into a BERT model to obtain a sentence embedding vector for each text entry;
S24. Apply minimum, average, and maximum pooling to each day's sentence embedding vectors to obtain the daily text representation;
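The pooling of step S24 can be sketched as follows; this is an illustrative NumPy sketch, and the function name, input shape, and the concatenation of the three pooled vectors are assumptions rather than details fixed by the invention.

```python
import numpy as np

def daily_text_representation(sentence_embeddings):
    """Pool one day's sentence embedding vectors (n_sentences x dim) into a
    single text representation by min, average, and max pooling."""
    e = np.asarray(sentence_embeddings, dtype=float)
    # Concatenating the three pooled vectors keeps all three statistics.
    return np.concatenate([e.min(axis=0), e.mean(axis=0), e.max(axis=0)])
```

With two sentence embeddings of dimension 2, the result is a fixed-size vector of dimension 6 regardless of how many log entries the day contains.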
S3, embedding the channel into the matrixInputting the data to a channel tower encoder to obtain the channel characteristics of numerical data, wherein the channel tower encoder is composed of an encoder LcIndividual channel coding layerThe method specifically comprises the following substeps:
s31. pairNormalization processing is carried out on channel characteristics extracted from layer channel coding layer
S32, multi-head attention layer feature extraction is carried out on the normalized features obtained in the step S31, a residual error structure is adopted in the layer, and the calculation formula is as follows:
the specific operation of the multi-head attention layer comprises the following sub-steps:
SU1. Multiply the channel features by parameter matrices to obtain the query matrix qa, key matrix ka, and value matrix va of the a-th self-attention head SAa(·);
SU2. Compute the self-attention weight matrix sa with the normalized exponential (softmax) function, scaling by the square root of the dimension of each vector in the mapping matrices;
SU3. Multiply sa by the value matrix va to obtain the output features of each attention head;
SU4. Concatenate the features obtained by all attention heads and multiply by a parameter matrix to obtain the output features of the MSA layer.
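Substeps SU1-SU4 follow the standard multi-head self-attention computation; a minimal NumPy sketch, with randomly initialized parameter matrices standing in for the learned ones and all names being illustrative, might look like this:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo):
    """X: (n, dm) input features. Wq/Wk/Wv: per-head projection matrices
    (dm, d) in lists of length A. Wo: (A*d, dm) output projection."""
    heads = []
    for Wq_a, Wk_a, Wv_a in zip(Wq, Wk, Wv):
        q, k, v = X @ Wq_a, X @ Wk_a, X @ Wv_a        # SU1: q_a, k_a, v_a
        s = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # SU2: weight matrix s_a
        heads.append(s @ v)                           # SU3: per-head output
    return np.concatenate(heads, axis=-1) @ Wo        # SU4: concat and project
```

The output has the same leading shape as the input, so the residual addition of step S32 is well defined.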
S34, inputting the normalized features into a multi-layer perceptron to perform feature extraction, wherein the result is the firstLayer channel coding characteristics, calculation formula thereofComprises the following steps:
s35, the firstLayer channel coding featuresIs input to the firstLayer channel coding layer, repeating steps S31-S34;
s4. The sliding window tower encoder performs multi-layer, multi-scale feature extraction on the time-series embedding matrix to obtain multi-scale time-series features, comprising the following substeps:
s41, apply layer normalization to the time-series features extracted by the previous sliding window coding layer;
S42, extract features from the normalized features with a sliding window mask attention layer. When the layer index is odd, perform the following operations:
SN1. The sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a fixed number of time units;
SN2. Use a sliding window of no more than that number of time units to fully cover the edge time-series features;
SN3. Perform masked self-attention within the layer's sliding windows, setting the upper-triangular elements of the self-attention weight matrix sa to 0 in the multi-head attention computation;
SN4. Perform the masked self-attention computation with the fixed sliding windows and compute the layer's output time-series features with a residual structure.
When the layer index is even, perform the following operations instead:
SN5. The sliding window mask attention layer shifts all sliding windows of the preceding odd layer by a fixed number of time units;
SN6. Use a sliding window of no more than that number of time units to fully cover the edge time-series features;
SN7. Perform masked self-attention within the layer's sliding windows, setting the upper-triangular elements of the self-attention weight matrix sa to 0 in the multi-head attention computation;
SN8. Perform the masked self-attention computation with the shifted sliding windows and compute the layer's output time-series features with a residual structure.
S44, extracting the characteristics of the normalized time sequence characteristics by adopting a multilayer perceptron,
s45, multi-scale time sequence characteristics output by sliding window coding layerLtPerforming multi-scale polymerization to obtain polymerization timing characteristicsAnd outputting, wherein the characteristics comprise multi-scale global and local time sequence information, and the calculation formula is as follows:
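The multi-scale aggregation of step S45 is described later as a matrix concatenation over the even-numbered and final layers' outputs; a sketch under that assumption, concatenating along the feature dimension (the selection rule and axis are illustrative):

```python
import numpy as np

def multi_scale_aggregate(layer_outputs):
    """Concatenate the features of the even-numbered sliding window coding
    layers and of the last layer (1-based indices) along the feature axis."""
    selected = [f for i, f in enumerate(layer_outputs, start=1)
                if i % 2 == 0 or i == len(layer_outputs)]
    return np.concatenate(selected, axis=-1)
```

Because each odd/even pair of layers works at one window configuration, keeping the even-numbered outputs retains one feature map per scale while the final layer contributes the most global view.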
s5, input the text representation into the text tower encoder to obtain text features, and input the text features and the time-series features into the cross-tower attention module to obtain weighted text features, comprising the following substeps:
S51. Apply layer normalization to the text features extracted by the previous text coding layer;
S52, extract features from the normalized text features with a multi-head attention layer, using a residual structure;
S53. Apply layer normalization to the output of the multi-head attention sub-layer;
S54, extract features from the normalized text features with a multilayer perceptron, using a residual structure;
s6, fuse the channel features, the aggregated time-series features, and the weighted text features with the gating module to compute and output the predicted fault category probability vector, comprising the following substeps:
s61, input the global time-series features into a fully connected layer to obtain a linear mapping of the global time-series features;
S62. Transpose the mapped features to align the time dimension;
S63, compute the text-time-series attention weights with a matrix multiplication operation and the Softmax(·) function,
where FC(·) is a fully connected layer;
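One plausible reading of substeps S61-S63, sketched in NumPy; the exact shapes of the mapping and the direction of the matrix product are not fixed by the text, so this is an assumption-laden illustration rather than the patented computation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_tower_attention(H_text, g, W_fc):
    """H_text: (n, d) text features; g: (dt,) global time-series feature;
    W_fc: (dt, d) fully connected mapping. Returns the text-time-series
    attention weights and the weighted text features."""
    g_mapped = g @ W_fc                   # FC linear mapping of the global feature
    weights = softmax(H_text @ g_mapped)  # one weight per text feature row
    return weights, weights[:, None] * H_text  # emphasize fault-related text
```

The softmax makes the weights sum to 1, so text entries whose features align with the global time-series state dominate the weighted text features.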
S64. Weight the text features with the text-time-series attention weights to obtain the weighted text features;
s65, obtain the prediction result from the weighted fusion of the three features, comprising the following substeps:
SW1. Input the channel features, the aggregated time-series features, and the weighted text features into the gating module;
SW2. The gating layer G fuses the three features by weighting them with adaptive weights to obtain the gated features;
SW3. Input the gated features into a fully connected layer FC to obtain the predicted fault category probability vector ys of the s-th intelligent manufacturing device.
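Substeps SW1-SW3 can be sketched as follows; a softmax gate over three learned scalars is one simple way to realize "adaptive weights" and is an assumption here, as are all names and shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(features, gate_logits, W_out, b_out):
    """features: three equal-size vectors (channel, aggregated time-series,
    weighted text). gate_logits: learned scalars turned into adaptive
    weights. W_out, b_out: the final fully connected layer. Returns the
    predicted fault category probability vector."""
    weights = softmax(np.asarray(gate_logits, dtype=float))  # adaptive weights
    fused = sum(w * f for w, f in zip(weights, features))    # gated features
    return softmax(fused @ W_out + b_out)                    # probability vector
```

In a full model the gate logits would themselves be computed from the three features, letting the fusion adapt per device rather than being a fixed average.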
and S7, compute the cross-entropy loss from the predicted fault category probability vector. This step is used only during training, to guide the model toward accurately predicting the fault categories of intelligent manufacturing equipment.
Preferably, step S42 achieves efficient multi-scale time-series feature extraction by performing mask attention computations within multiple non-overlapping adjacent time-series sliding windows, and establishes a time-series information exchange mechanism across the windows by shifting them.
The sliding window mask attention layer achieves efficient multi-scale time-series feature extraction by performing mask attention computations within multiple non-overlapping adjacent time-series sliding windows and establishes an information exchange mechanism across the windows; meanwhile, the invention designs a multi-scale aggregation module as the functional module that aggregates the multi-scale time-series features. The multi-scale time-series feature extraction and multi-scale aggregation are computed as follows:
preferably, step S5 is performed by the cross-tower attention module by computing a global timing featureAnd text featuresThe attention weights of (a) enable end-to-end learning of the model for text-to-time correlations.
Preferably, in step S6, a gate control module is used to fuse the channel feature, the aggregation timing feature and the weighted text feature, and a multi-feature fusion vector method is used to perform fault prediction, so as to improve the accuracy of the model for fault prediction and the robustness of the model.
The gated three-tower Transformer architecture is composed of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a tower-crossing attention module and a gated layer. The channel tower encoder, the sliding window tower encoder and the text tower encoder can effectively extract channel characteristics, aggregation time sequence characteristics and weighted text characteristics of text data of numerical data respectively, and the gating layer performs weighted fusion on the three characteristics by using dynamic weights, so that the model can perform characteristic self-adaptation on data of multiple intelligent manufacturing equipment, and accuracy of predicting fault types of the intelligent manufacturing equipment is improved.
Drawings
FIG. 1 is a diagram of a gated three tower Transformer architecture;
FIG. 2 is a schematic diagram of channel characteristics and timing characteristics;
FIG. 3 is a diagram of a daily textual representation extraction architecture;
FIG. 4 is a diagram of a structure of a sliding window mask attention layer and a multi-scale aggregation module;
FIG. 5 is a cross-tower attention module block diagram.
Detailed Description
Example 1
The invention provides a fault prediction method for intelligent manufacturing equipment based on a gated three-tower Transformer. As shown in fig. 1, the overall architecture consists of a channel tower encoder, a sliding window tower encoder with a multi-scale aggregation module, a text tower encoder with a cross-tower attention module, and a gating module. First, the channel embedding matrix is input into the channel tower encoder to obtain the channel features of the numerical data. Then, the time-series embedding matrix is input into the sliding window tower encoder with the multi-scale aggregation module to obtain the aggregated time-series features, which contain multi-scale global and local time-series information. Further, the text representation is input into the text tower encoder to obtain text features, and the text features and time-series features are input into the cross-tower attention module to obtain weighted text features, so that the model tends to predict with text features related to fault information. Next, the gating module fuses the channel features, the aggregated time-series features, and the weighted text features to compute and output the predicted fault category probability vector. Finally, the cross-entropy loss is computed from the predicted fault category probability vector; this step is used only during training, to guide the model toward accurately predicting the fault categories of intelligent manufacturing equipment.
The method first obtains, for multiple intelligent manufacturing devices of the same type over a certain number of days, the numerical data set collected by sensors, the operation-log text data set, and the equipment status data set. The numerical data comprise K types of values (such as filling temperature, pressure, flow, etc.) of the s-th device on day d, where T is the total number of days of data, D is the fault prediction date, and N is the total number of devices. The log text data set contains several log text entries of the s-th device on day d. The status data set holds the true status label of the s-th device on day D: an entry is 1 if the device's status class belongs to the corresponding state and 0 otherwise, where B is the total number of states (e.g., normal operation, failure of a certain component, etc.). Collectively, these denote all numerical data of the s-th device and all text data of the s-th device on day d.
The implementation steps are described in detail below with reference to the accompanying drawings.
Step (1): As shown in FIG. 1, input the transposed numerical data of the s-th intelligent manufacturing device into a linear layer to obtain the channel embedding matrix, and input the raw numerical data of the s-th device into a linear layer with positional encoding to obtain the time-series embedding matrix. As shown in fig. 2, the channel embedding matrix and the time-series embedding matrix are embedded representations of the numerical data along the channel and time dimensions, respectively. As shown in fig. 3, input the log text data of the s-th device into BERT (Bidirectional Encoder Representations from Transformers) to obtain a sentence embedding vector for each text entry, then apply minimum, average, and maximum pooling to each day's sentence embedding vectors to obtain the daily text representation.
Step (2): As shown in FIG. 1, input the channel embedding matrix into the channel tower encoder, which consists of Lc channel coding layers, and perform multi-layer feature extraction on the channel embedding matrix to obtain the channel features. Each channel coding layer consists of two sub-layers with residual structures: the first sub-layer comprises a layer normalization operation and a multi-head attention layer, and the second comprises a layer normalization operation and a multilayer perceptron. The channel feature extraction in each channel coding layer is computed as follows:
where LN(·) is the layer normalization operation, MLP(·) is the multilayer perceptron, and MSA(·) is multi-head attention, whose computation steps are:
where qa, ka, and va are the query, key, and value matrices mapped from the embedding matrix by parameter matrices for the a-th self-attention SAa(·), sa is the self-attention weight matrix, the scaling factor is the dimension of each vector in the mapping matrices, Softmax(·) is the normalized exponential function, A is the total number of self-attention heads, and [·, …, ·] is the concatenation operation.
The channel tower encoder's output is the channel features produced by the Lc-th channel coding layer:
Step (3): As shown in FIG. 1, input the time-series embedding matrix into the sliding window tower encoder with multi-scale aggregation module Mt, which consists of Lt sliding window coding layers. Perform multi-layer, multi-scale feature extraction on the time-series embedding matrix to obtain the multi-scale time-series features, then aggregate them with the multi-scale aggregation module to obtain the aggregated time-series features.
The invention provides a sliding window mask attention layer that achieves efficient multi-scale time-series feature extraction by performing mask attention computations within multiple non-overlapping adjacent time-series sliding windows, and establishes an information exchange mechanism across the windows by shifting them; meanwhile, the invention designs a multi-scale aggregation module as the functional module that aggregates the multi-scale time-series features. The multi-scale time-series feature extraction and multi-scale aggregation are computed as follows:
Step (3.1): Each sliding window coding layer consists of two sub-layers with residual structures: the first sub-layer comprises a layer normalization operation and a sliding window mask attention layer, and the second comprises a layer normalization operation and a multilayer perceptron. As shown in fig. 4, when the layer index is odd, the sliding window mask attention layer partitions the time-series features into non-overlapping, equal-sized sliding windows of a fixed number of time units, then uses a sliding window of no more than that number of time units to fully cover the edge time-series features. When the layer index is even, the sliding window mask attention layer shifts all sliding windows of the preceding odd layer by a fixed number of time units, again using a sliding window of no more than that number of time units to fully cover the edge time-series features. Each sliding window mask attention layer performs masked self-attention within its sliding windows; setting the upper-triangular elements of the self-attention weight matrix sa to 0 in the multi-head attention computation constitutes the masked self-attention step. The feature extraction in each sliding window coding layer is computed as follows:
where RW-MSA(·) is the masked self-attention computation of an odd sliding window mask attention layer using fixed sliding windows, and SW-MSA(·) is the masked self-attention computation of an even sliding window mask attention layer using shifted sliding windows.
Step (3.2): The multi-scale aggregation module Mt consists of matrix concatenation operations; it aggregates the multi-scale time-series features output by the even-numbered and the Lt-th sliding window coding layers to obtain and output the aggregated time-series features, computed as follows:
step (4) representing the textInput to attention module with tower crossingThe text tower encoder is composed ofA text coding layerThe composition is that the text representation is subjected to multi-layer feature extraction to obtain text features
Step (4.1) the text coding layer is composed of two sub-layers with residual error structures, the first sub-layer is composed of layer normalization operation and a multi-head attention layer, the second sub-layer is composed of layer normalization operation and a multi-layer perceptron, the first sub-layer is composed of a layer normalization operation and a multi-head attention layerA text coding layerThe calculation formula of the text feature extraction in (1) is as follows:
Step (4.2): as shown in FIG. 5, the cross-tower attention module uses the global timing features and the text features to compute text-timing attention weights, which weight the text features to obtain the weighted text features. First, the global timing feature is input to a fully connected layer to obtain a linear mapping, which is then transposed to align the time dimension. The text-timing attention weights are obtained with a matrix multiplication operation and the Softmax(·) function; the calculation formula is as follows:
where FC(·) is the fully connected layer.
The output of the cross-tower attention module is the weighted text features computed with the text-timing attention weights:
By computing attention weights between the global timing features and the text features, the cross-tower attention module enables end-to-end learning of text-timing correlations. The text feature extraction and text feature weighting are computed as follows:
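The cross-tower weighting can be read as cross-attention with the text features as queries and the linearly mapped timing features as keys and values. This reading, and the parameter shapes `W` and `b` standing in for FC(·), are assumptions, since the patent's exact formulas are image-only. A sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_tower_attention(timing, text, W, b):
    """timing: (t, d) global timing features; text: (m, d) text features.
    FC(.) is modelled as the affine map (W, b). Returns the weighted text
    features (m, d) and the text-timing attention weights (m, t)."""
    mapped = timing @ W + b                    # FC(timing)
    attn = softmax(text @ mapped.T, axis=-1)   # text-timing attention weights
    weighted_text = attn @ mapped              # weight text rows over time steps
    return weighted_text, attn
```

Each row of `attn` is a distribution over time steps, so every text feature is re-expressed as a convex combination of mapped timing features.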
Step (5): the channel features, aggregated timing features, and weighted text features are input to the gating module. The gating layer G fuses the three features with adaptive weights to obtain the gating features; the calculation formula is as follows:
The gating features are input to a fully connected layer to obtain the predicted fault category probability vector of the s-th intelligent manufacturing equipment; the calculation formula is as follows:
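As a sketch of the gating and classification steps: the gate below derives three adaptive weights from a softmax over logits computed from the concatenated features. The concrete gate parameterisation (`Wg`, `Wo`, `bo`) is an assumption, as the patent's formula is image-only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_prediction(f_chan, f_time, f_text, Wg, Wo, bo):
    """Adaptive weighted fusion of the channel, aggregated timing and
    weighted text features (each of dim d), then a fully connected
    classifier over B fault classes. Wg: (3d, 3), Wo: (d, B), bo: (B,)."""
    z = np.concatenate([f_chan, f_time, f_text])   # joint view, (3d,)
    g = softmax(z @ Wg)                            # three adaptive weights
    fused = g[0] * f_chan + g[1] * f_time + g[2] * f_text
    return softmax(fused @ Wo + bo)                # fault-class probabilities
```

The softmax over the gate logits guarantees the three weights are positive and sum to 1, so the fused feature is a convex combination of the tower outputs.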
Step (6): the loss is calculated and the gated three-tower Transformer is optimized. To improve the consistency between the predicted fault category and the real equipment state, the invention adopts the cross entropy loss, computed as follows:
where y_{s,b} is the real label indicating whether the s-th intelligent manufacturing equipment belongs to state b: it is 1 if the state category of the s-th equipment is b and 0 otherwise; B is the total number of states, and N is the total number of intelligent manufacturing equipment.
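The cross entropy loss above can be written directly from the definitions of y_{s,b}, B, and N. Averaging over the N machines is an assumption (the image formula fixes the exact normalisation):

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """y_true: (N, B) one-hot real state labels y_{s,b};
    y_pred: (N, B) predicted fault-class probabilities.
    Returns the mean cross entropy over the N machines; eps guards log(0)."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))
```

A perfectly confident correct prediction drives the loss to 0, while probability mass placed on wrong states increases it.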
Claims (4)
1. A method for predicting faults of intelligent manufacturing equipment based on gated three towers, characterized by comprising the following steps:
S1. Data acquisition: acquire, over a certain number of days, the numerical data set collected by the sensors of multiple intelligent manufacturing equipment of the same type, the running log text data set, and the equipment status data set;
wherein the K types of numerical data of the s-th intelligent manufacturing equipment on the d-th day are so denoted, T is the total number of days of data, D is the failure prediction date, and N is the total number of intelligent manufacturing equipment;
the real label of the s-th intelligent manufacturing equipment for state b on day D is 1 if the equipment's state category is b and 0 otherwise, B being the total number of states; all numerical data of the s-th intelligent manufacturing equipment, and all text data of the s-th equipment on day d, are denoted accordingly;
S2. The numerical data and the text data are transformed to obtain a channel embedding matrix, a timing embedding matrix, and sentence embedding vectors, specifically comprising the following substeps:
S21. The transposed numerical data of the s-th intelligent manufacturing equipment are input to a linear layer to obtain the channel embedding matrix;
S22. The original numerical data of the s-th intelligent manufacturing equipment are input to a linear layer, and position coding is applied to obtain the timing embedding matrix;
S23. The text data of the s-th intelligent manufacturing equipment are input to a BERT model to obtain a sentence embedding vector for each text item;
S24. Minimum, average, and maximum pooling operations are applied to each day's sentence embedding vectors to obtain the text representation;
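Step S24 can be sketched as pooling each day's stack of sentence embeddings three ways; concatenating the three pooled vectors is an assumption about how the results are combined:

```python
import numpy as np

def daily_text_representation(sentence_embeddings):
    """sentence_embeddings: (n_sentences, dim) sentence vectors for one day,
    e.g. from BERT. Returns the min-, mean- and max-pooled vectors
    concatenated into a (3 * dim,) text representation."""
    e = np.asarray(sentence_embeddings, dtype=float)
    return np.concatenate([e.min(axis=0), e.mean(axis=0), e.max(axis=0)])
```

The three pooling operators capture complementary statistics: extremes that flag anomalous log lines, and the mean that summarises the day's typical content.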
S3. The channel embedding matrix is input to a channel tower encoder to obtain the channel features of the numerical data; the channel tower encoder consists of L_c channel coding layers, and this step specifically comprises the following substeps:
S31. Layer normalization is applied to the channel features extracted by the previous channel coding layer;
S32. Multi-head attention feature extraction is performed on the normalized features obtained in step S31; this layer adopts a residual structure, and the calculation formula is as follows:
The specific operation of the multi-head attention layer comprises the following sub-steps:
SU1. The channel features are multiplied by parameter matrices so that, for the a-th self-attention computation SA_a(·), the input embedding matrix is mapped to the query matrix q_a, the key matrix k_a, and the value matrix v_a;
SU2. The self-attention weight matrix is computed with the normalized exponential softmax function, scaled by the dimension of the vectors in the mapping matrices;
SU4. The features produced by the attention heads are concatenated and multiplied by a parameter matrix to obtain the output features of the MSA layer;
S34. The normalized features are input to a multi-layer perceptron for feature extraction, giving the channel coding features of this layer; the calculation formula is as follows:
S35. The channel coding features of this layer are input to the next channel coding layer, and steps S31-S34 are repeated;
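The multi-head attention operation of substeps SU1-SU4 can be sketched compactly. The sketch below uses shared projection matrices split across heads; the parameter shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (t, d). SU1: project x into per-head queries/keys/values;
    SU2: softmax self-attention weight matrix s_a per head, scaled by the
    per-head dimension; SU4: concatenate heads and apply the output
    projection Wo. All W matrices are (d, d)."""
    t, d = x.shape
    dh = d // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    heads = []
    for a in range(n_heads):
        sl = slice(a * dh, (a + 1) * dh)
        s_a = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh))  # weight matrix s_a
        heads.append(s_a @ v[:, sl])                        # per-head output
    return np.concatenate(heads, axis=1) @ Wo               # SU4: splice + project
```

Each head attends over the full sequence with its own slice of the projections, so different heads can specialise in different channel interactions.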
S4. A sliding window tower encoder performs multi-layer, multi-scale feature extraction on the timing embedding matrix to obtain the multi-scale timing features, specifically comprising the following substeps:
S41. Layer normalization is applied to the timing features extracted by the previous sliding window coding layer;
S42. Feature extraction is performed on the normalized features with a sliding window mask attention layer; when the layer index is odd, the following operations are performed:
SN1. The sliding window mask attention layer partitions the timing features into non-overlapping, equal-size sliding windows of a fixed number of time units;
SN2. A sliding window of no larger size is used to fully cover the edge timing features;
SN3. Mask self-attention is computed inside the sliding windows of this layer; in the multi-head attention computation, the upper-triangular elements of the self-attention weight matrix s_a are set to 0;
SN4. Mask self-attention is computed with the fixed sliding windows, and the output timing features of this layer are computed with a residual structure;
SN5. When the layer index is even, the window mask attention layer first shifts all sliding windows of the previous layer by a fixed number of time units;
SN6. A sliding window of no larger size is used to fully cover the edge timing features;
SN7. Mask self-attention is computed inside the sliding windows of this layer; in the multi-head attention computation, the upper-triangular elements of the self-attention weight matrix s_a are set to 0;
SN8. Mask self-attention is computed with the shifted sliding windows, and the output timing features are computed with a residual structure;
S44. Feature extraction is performed on the normalized timing features with a multi-layer perceptron;
S45. The multi-scale timing features output by the L_t sliding window coding layers are aggregated to obtain the aggregated timing features, which contain multi-scale global and local timing information and are output; the calculation formula is as follows:
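Since the multi-scale aggregation module is described as a matrix concatenation, S45 reduces to stacking the per-layer outputs; concatenating along the feature axis is an assumption (the formula is image-only):

```python
import numpy as np

def aggregate_multiscale(layer_outputs):
    """layer_outputs: list of L_t arrays, one per sliding window coding
    layer, each of shape (t, d). Returns the (t, L_t * d) aggregated
    timing features combining all scales."""
    return np.concatenate(layer_outputs, axis=-1)
```

Because each layer sees a different effective window size, the concatenated feature carries both local (small-window) and global (large-window) timing information side by side.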
S5. The text representation is input to a text tower encoder to obtain the text features, and the text features and the timing features are input to a cross-tower attention module to obtain the weighted text features, specifically comprising the following substeps:
S52. Feature extraction is performed on the normalized text features with a multi-head attention layer; the calculation formula is as follows:
S54. Multi-layer perceptron feature extraction is performed on the normalized text features; the calculation formula is as follows:
S6. A gating module is adopted to fuse the channel features, aggregated timing features, and weighted text features, and the predicted fault category probability vector is computed and output, specifically comprising the following substeps:
S61. The global timing features are input to a fully connected layer to obtain a linear mapping;
S63. The text-timing attention weights are obtained with a matrix multiplication operation and the Softmax(·) function; the calculation formula is as follows:
where FC(·) is a fully connected layer;
S65. The prediction result is obtained by weighted fusion of the three features, specifically comprising the following substeps:
SW1. The channel features, aggregated timing features, and weighted text features are input to the gating module;
SW2. The gating layer G fuses the three features with adaptive weights to obtain the gating features;
SW3. The gating features are input to a fully connected layer FC to obtain the predicted fault category probability vector y_s of the s-th intelligent manufacturing equipment; the calculation formula is as follows:
S7. The cross entropy loss is calculated from the predicted fault category probability vector; this step is used only during training, to guide the model to accurately predict the fault category of the intelligent manufacturing equipment.
2. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that step S42 achieves efficient multi-scale timing feature extraction by performing mask attention computation inside multiple non-overlapping adjacent timing sliding windows, and establishes a timing information exchange mechanism across the multiple sliding windows through the shifted windows.
4. The method for predicting faults of intelligent manufacturing equipment based on gated three towers according to claim 1, characterized in that in step S6 the gating module is adopted to fuse the channel features, the aggregated timing features, and the weighted text features, and fault prediction is performed with the multi-feature fusion vector, thereby improving the fault prediction accuracy and the robustness of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110830568.6A CN113626597B (en) | 2021-07-22 | 2021-07-22 | Intelligent manufacturing equipment fault prediction method based on gated three towers |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113626597A CN113626597A (en) | 2021-11-09 |
CN113626597B true CN113626597B (en) | 2022-04-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||