CN116028891B - Industrial anomaly detection model training method and device based on multi-model fusion - Google Patents
- Publication number
- CN116028891B (application CN202310123067.3A)
- Authority
- CN
- China
- Legal status: Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an industrial anomaly detection model training method and device based on multi-model fusion, wherein the method comprises the following steps: step one, acquiring sensor data and preprocessing it; step two, inputting the sensor feature tensor obtained by preprocessing into a plurality of teacher models and a student model respectively, and obtaining the features output by each network layer in the models; step three, mapping the intermediate layer tensors in the features into public space tensors; step four, weighting and averaging the public space tensors of all teacher models to obtain the teacher weighted tensors corresponding to the student public space tensors, and transversely splicing the task layer vectors of all teacher models into a teacher task layer splicing vector; step five, obtaining the distillation loss, task loss and prediction loss of the model, and obtaining the total loss by weighted summation; and step six, repeating the above steps, minimizing the total loss, and updating the neural network parameters of the student model until convergence, finally fixing the neural network parameters of the student model to obtain the target model and finish training.
Description
Technical Field
The invention relates to the field of industrial equipment anomaly detection, in particular to an industrial anomaly detection model training method and device based on multi-model fusion.
Background
In the industrial field, correctly identifying the type of an equipment anomaly helps operation and maintenance personnel locate the problem more quickly, so that corresponding measures can be taken in time. With the widespread use of industrial sensors, large amounts of monitoring data for critical equipment can be collected. Data-driven anomaly detection methods have accordingly been developed: by monitoring sensor data in real time, it is possible to dynamically identify whether an anomaly occurs in the equipment, and to identify the type of the anomaly.
Industrial anomaly detection methods based on deep learning neural networks are gaining attention, and they have the following advantages: 1. they depend little on feature engineering and can be trained end to end; 2. the model structure is flexible and the fitting capability strong, so complex patterns in the data can be extracted. However, deep learning methods place high demands on labeled datasets, and a large amount of labeled data is often required to achieve a good prediction effect.
In the field of industrial anomaly detection, data annotation is difficult and labeled data is generally hard to obtain; in addition, industrial data involves data security and business confidentiality, equipment operation data of different factories and departments cannot be shared, and the original data is difficult to obtain; furthermore, industrial equipment structures and operating environments are complex, and it is difficult to enumerate all anomaly types from the start; there is therefore a need to iterate the model to take newly discovered and defined anomaly types into account.
Typically, multiple models are trained for the same model of equipment, for different factories, or for the same factory over different historical periods; the prediction effect can be improved by exploiting these existing models, for example by integrating several sub-models into an ensemble through traditional ensemble learning; however, ensemble learning has the following problems: 1. all sub-models must participate in the computation, and when the number of sub-models is large the computational load increases significantly; 2. all sub-models are generally required to classify the same set of categories, whereas in the field of industrial anomaly detection new anomaly types frequently appear, so the anomaly categories supported by models from different periods differ.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides an industrial anomaly detection model training method and device based on multi-model fusion, and the specific technical scheme is as follows:
an industrial anomaly detection model training method based on multi-model fusion comprises the following steps:
step one, preprocessing after acquiring sensor data;
step two, respectively inputting the sensor characteristic tensor obtained by preprocessing into a plurality of teacher models and student models, and obtaining the characteristics output by each network layer in the models, wherein the characteristics comprise middle layer tensors and task layer vectors;
step three, mapping the middle layer tensor of the teacher model and the middle layer tensor of the student model into a teacher public space tensor and a student public space tensor respectively;
step four, weighting and averaging all teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to the student public space tensor, and transversely splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector;
step five, comparing the public space tensor of the students with the corresponding teacher weighting tensor to obtain distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain task loss; comparing the label marked by the data set with the task layer vector of the student model to obtain a prediction loss; obtaining a total loss based on the distillation loss, the mission loss and the predicted loss;
and step six, repeating the step one to the step five, minimizing the total loss, updating the neural network parameters of the student model until the neural network parameters of the student model are converged and fixed, obtaining a target model, and finishing training.
Further, the first step specifically comprises: converting the sensor data into a sensor feature tensor F ∈ R^{w×d} using a single-layer LSTM network, where w is the time window size of the sensor data, i.e. the data length, and d is the hidden layer dimension of the sensor feature tensor.
Further, the second step specifically includes the following substeps:
S21, inputting the sensor feature tensor F into n pre-trained teacher models respectively; the i-th teacher model has m_i intermediate layers; for the j-th intermediate layer of the i-th model, the output intermediate layer tensor is T_{i,j}; in total Σ_{i=1}^{n} m_i intermediate layer tensors of the teacher models are calculated;
s22, tensor of sensor characteristicsInputting a student model, for the +.>Intermediate layer, calculated as->Intermediate layer tensor of student model of layer->;
S23, for the final layer of the i-th teacher model, the teacher task vector y_i is calculated; for the final layer of the student model, the student task vector y_s is calculated; the dimension of the student task vector y_s is equal to the sum of the dimensions of all teacher model task vectors plus the number of categories newly present in the dataset.
Further, the third step specifically includes the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the j-th layer of the i-th teacher model, the corresponding teacher public space tensor is C_{i,j} = φ_{i,j}(T_{i,j}), where φ_{i,j} represents a nonlinear transformation implemented by a convolutional neural network layer;
S32, if the parameters θ of the nonlinear transformations φ are fixed, in total Σ_{i=1}^{n} m_i teacher public space tensors C_{i,j} are calculated; otherwise, the parameters θ of the nonlinear transformations φ are updated through steps S33 and S34;
S33, for the teacher public space tensor C_{i,j} corresponding to the j-th layer of the i-th teacher model, mapping it through a nonlinear reconstruction transformation ψ_{i,j} to a teacher intermediate layer reconstruction tensor R_{i,j} = ψ_{i,j}(C_{i,j}) with the same dimension as the intermediate layer tensor T_{i,j};
S34, comparing the intermediate layer tensor of the teacher modelReconstructing tensor with teacher interlayer>Calculate reconstruction error +.>:
Wherein,,reconstructing a loss function; minimizing +.>Update nonlinear transformation->Parameter of->Until reconstruction error->Less than threshold->Or the iteration step number is satisfied, the parameter is fixed +.>;
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; for the k-th layer, C^s_k = φ^s_k(S_k), where φ^s_k is a nonlinear transformation implemented by a neural network layer with parameters θ^s; the dimensions of the student public space tensors are the same as those of the teacher public space tensors in step S31.
Further, the step four specifically includes the following substeps:
S41, based on the k-th layer student public space tensor C^s_k and the k-th layer teacher public space tensor C_{i,k} of the i-th teacher model, obtaining through the attention mechanism the attention coefficient α_{i,k} of each teacher public space tensor, the expression being:
α_{i,k} = exp(⟨C^s_k, C_{i,k}⟩) / Σ_{i'=1}^{n} exp(⟨C^s_k, C_{i',k}⟩)
S42, weighting and averaging all teacher public space tensors according to the attention coefficients α_{i,k} to obtain the teacher weighted tensor W_k corresponding to the k-th layer, the expression being:
W_k = Σ_{i=1}^{n} α_{i,k} C_{i,k}
S43, splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector y_t = concat(y_1, …, y_n); wherein if c new anomaly categories appear in the annotated dataset, a zero vector of length c is spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector, the expression being:
ŷ_t = concat(y_t, 0_c)
where concat(·) is the vector concatenation operation, 0_c is an all-zero vector of length c, and c is the number of anomaly categories newly present in the dataset.
Further, the fifth step specifically includes the following substeps:
S51, comparing the student public space tensor C^s_k of the k-th layer of the student model with the corresponding k-th teacher weighted tensor W_k to obtain the distillation loss L_dist, the expression being:
L_dist = (1/K) Σ_{k=1}^{K} MSE(C^s_k, W_k)
where K is the number of intermediate layers of the student model and MSE is the mean square error loss function, whose expression is:
MSE(A, B) = (1/|A|) Σ_u (A_u − B_u)²
with the sum taken over all |A| elements of the tensors;
S52, comparing the task layer vector of the student model, i.e. the output vector y_s, with the corresponding teacher task layer splicing vector ŷ_t to obtain the soft target loss function L_soft, the expression being:
L_soft = −Σ_j ŷ_{t,j} log y_{s,j}
S53, for the small labeled dataset, comparing the vector y_s output by the student model with the one-hot representation p of the labeled correct category to obtain the prediction loss L_pred, the expression being:
L_pred = −Σ_{j=1}^{C} p_j log q_j
where p_j and q_j are respectively the value of the j-th bit of the one-hot representation of the correct category and the predicted probability value of the j-th bit of the student model task vector y_s, and C is the total number of categories, whose value is equal to the length of the student model task vector y_s;
S54, performing a weighted summation of the distillation loss L_dist and the losses L_soft, L_pred to obtain the final total loss L, i.e. the total loss function expression is:
L = λ₁L_dist + λ₂L_soft + λ₃L_pred
where λ₁, λ₂, λ₃ are the weighting coefficients.
Further, the sixth step specifically includes the following substeps:
S61, repeating steps one to five, minimizing the loss function L with a gradient descent algorithm, and updating the neural network parameters W of the student model and the parameters θ^s of the nonlinear transformations;
S62, when the model loss function L falls below a threshold δ or the iterations reach the preset number, training ends; the student model parameters W are fixed, and the student model is the target model.
An industrial anomaly detection model training device based on multi-model fusion comprises one or more processors, and is used for realizing the industrial anomaly detection model training method based on multi-model fusion.
A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the method for training an industrial anomaly detection model based on multi-model fusion.
Compared with other methods, the method has the following advantages:
1. by adopting a multi-model fusion method, information of a plurality of teacher models is fused, and the existing models can be reused to obtain models capable of identifying more abnormal categories;
2. training can be completed with a large amount of unlabeled data without access to the training data of the teacher models, reducing the dependence on labeled industrial datasets when training an industrial anomaly detection model;
3. compared with traditional ensemble learning, not all teacher models need to participate in the computation at prediction time; once training is completed, only the single student model is used for prediction, which reduces the consumption of computing resources;
4. the target model can recognize and predict new anomaly types from only a small amount of labeled training data for the new types.
Drawings
FIG. 1 is a schematic flow chart of an industrial anomaly detection model training method based on multi-model fusion;
fig. 2 is a schematic structural diagram of an industrial anomaly detection model training device based on multi-model fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention will be further described in detail with reference to the drawings and examples of the specification.
In this embodiment, detecting industrial abnormal signals requires correctly identifying the type of equipment fault from the sensor signals. In this scenario, the first teacher model can identify 6 different types of anomalies plus a "normal state" (7 categories in total); the second teacher model can identify 7 different types of anomalies plus "normal state" (8 categories in total); the current dataset contains anomaly types that appear in neither model, and the problems of this scenario can be abstracted into a multi-classification problem; the datasets of the earlier periods are no longer available; the data currently available are a large amount of recently collected unlabeled sensor data and a small amount of manually labeled data.
Based on the above examples, as shown in fig. 1, the method for training the industrial anomaly detection model based on multi-model fusion provided by the invention comprises the following steps:
step one, preprocessing is performed after sensor data are acquired.
Wherein, the sensor data is specifically: assuming there are N sensors and a time window of length w is selected, the sensor data within the time window is a 2D time series matrix X ∈ R^{N×w};
wherein each column x_t = (x_{1,t}, …, x_{N,t})ᵀ is the data at one time step t;
and each row of the matrix is the data acquired by a single sensor within the time window, where x_{i,t} is the reading of the i-th sensor at time t; likewise, for sensor i, the time series within the selected time window is x_i = (x_{i,1}, …, x_{i,w}).
One embodiment of the invention uses a single-layer LSTM network as the sensor data processing module to preprocess the sensor data, namely: the sensor data X is input into the LSTM network, and the sensor feature tensor F ∈ R^{w×d} is calculated, where d is the hidden layer dimension of the tensor output by the LSTM network layer.
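The shape bookkeeping of this preprocessing step can be sketched as follows; a minimal recurrent cell stands in for the single-layer LSTM, and the sizes N = 4, w = 10, d = 8, the weight scales, and the cell itself are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

N, w, d = 4, 10, 8           # sensors, time-window length, hidden dim
X = rng.normal(size=(N, w))  # 2D time-series matrix, rows = sensors

# Minimal recurrent cell standing in for the single-layer LSTM:
# at each time step t the column X[:, t] (all sensor readings at that
# moment) updates a hidden state h, and the stacked states form F.
W_x = rng.normal(size=(d, N)) * 0.1
W_h = rng.normal(size=(d, d)) * 0.1

h = np.zeros(d)
states = []
for t in range(w):
    h = np.tanh(W_x @ X[:, t] + W_h @ h)
    states.append(h)
F = np.stack(states)         # sensor feature tensor, shape (w, d)
print(F.shape)               # (10, 8)
```

The output F plays the role of the sensor feature tensor F ∈ R^{w×d} that is fed to the teacher and student models in step two.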
And step two, respectively inputting the sensor characteristic tensor obtained by preprocessing into a plurality of teacher models and student models, and obtaining the characteristics output by each network layer in the models, wherein the characteristics comprise middle layer tensors and task layer vectors.
Specifically, the sensor feature tensor F is input into the plurality of teacher models and the student model respectively, and the intermediate layer tensors and task layer vectors of the teacher models and the student model are calculated; the teacher models are pre-trained models whose neural network parameters are fixed; the input forms of the teacher models and the student model are consistent, where the input form means the features, format and dimensions of the input data used by the models; the output categories of the different teacher models, and of the teacher models and the student model, may not be identical, but shared categories exist; the parameters of the student model are determined iteratively through the subsequent steps; the intermediate layer tensors are the output results of all neural network layers of a model except the last layer; the task layer vector is the category probability vector output by the last layer of a model.
In the embodiment of the invention, the plurality of teacher models are two models with different structures, comprising teacher model 1 and teacher model 2: teacher model 1 consists of four convolutional neural network (CNN) layers and a fully connected layer, and its output is a probability distribution over 7 categories; teacher model 2 consists of two stacked long short-term memory (LSTM) layers and a fully connected layer, and its output is a probability distribution over 8 categories; the student model consists of three Self-attention layers and a fully connected layer; the new dataset contains a new anomaly type; the output of the student model is a probability distribution over 16 categories, where 16 is the number of categories of the two teacher models without de-duplication plus the number of new categories in the dataset.
The second step specifically comprises the following substeps:
S21, inputting the sensor feature tensor F into n pre-trained teacher models respectively; the i-th teacher model has m_i intermediate layers; for the j-th intermediate layer of the i-th model, the output intermediate layer tensor is T_{i,j}; in total Σ_{i=1}^{n} m_i intermediate layer tensors of the teacher models are calculated;
In the embodiment of the invention, for the four CNN layers of teacher model 1, the input sensor feature tensor F is used to calculate the corresponding intermediate layer tensors of the teacher model, respectively:
first layer: T_{1,1} = f_{1,1}(F)
second layer: T_{1,2} = f_{1,2}(T_{1,1})
third layer: T_{1,3} = f_{1,3}(T_{1,2})
fourth layer: T_{1,4} = f_{1,4}(T_{1,3})
where f_{1,1}, f_{1,2}, f_{1,3}, f_{1,4} are the nonlinear transformations corresponding to the four CNN layers in teacher model 1; T_{1,j} ∈ R^{w_{1,j}×h_{1,j}×c_{1,j}}, where w_{1,j}, h_{1,j}, c_{1,j} are the dimensions of the j-th intermediate layer tensor of teacher model 1 in the width, height, and depth directions, j = (1,2,3,4);
For the two LSTM layers of teacher model 2, the sensor feature tensor F is used to calculate the corresponding intermediate layer tensors of the teacher model:
first layer: T_{2,1} = f_{2,1}(F)
second layer: T_{2,2} = f_{2,2}(T_{2,1})
where f_{2,1}, f_{2,2} are the nonlinear transformations corresponding to the two LSTM layers in teacher model 2; T_{2,j} ∈ R^{w_{2,j}×h_{2,j}}, where w_{2,j}, h_{2,j} are the dimensions of the j-th intermediate layer tensor of teacher model 2 in the width and height directions.
S22, inputting the sensor feature tensor F into the student model; for the k-th intermediate layer of the student model, the intermediate layer tensor S_k of the k-th layer of the student model is calculated;
In this embodiment, for the two Self-attention layers of the student model, the sensor feature tensor F is used to calculate the intermediate layer tensors of the corresponding student model, respectively:
first layer: S_1 = f^s_1(F)
second layer: S_2 = f^s_2(S_1)
where f^s_1, f^s_2 are the nonlinear transformations corresponding to the two Self-attention layers in the student model; S_k ∈ R^{w^s_k×h^s_k}, where w^s_k, h^s_k are the dimensions of the k-th intermediate layer tensor of the student model in the width and height directions.
S23, for the final layer of the i-th teacher model, the teacher task vector y_i is calculated; for the final layer of the student model, the student task vector y_s is calculated; the dimension of the student task vector y_s is equal to the sum of the dimensions of all teacher model task vectors plus the number of categories newly appearing in the dataset;
in this embodiment, for teacher model 1, teacher model 2, and the student model, the corresponding task layer vectors are y_1 ∈ R^7, y_2 ∈ R^8, and y_s ∈ R^16 respectively;
the dimension of the task layer vector of the student model is the sum of the dimensions of the task layer vectors of all teacher models, i.e. 15 dimensions, plus the number of categories newly appearing in the dataset, i.e. 1 category, giving 16.
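The dimension arithmetic of this embodiment can be checked with a short sketch (the helper function name is illustrative; the category counts 7, 8, and 1 are taken from the embodiment):

```python
# Student task-vector dimension: sum of all teacher task-vector
# dimensions plus the number of newly appearing categories.
def student_task_dim(teacher_dims, new_categories):
    return sum(teacher_dims) + new_categories

# Embodiment: teacher model 1 outputs 7 categories, teacher model 2
# outputs 8, and the labeled dataset introduces 1 new anomaly type.
dim = student_task_dim([7, 8], 1)
print(dim)  # 16
```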
Step three, mapping the middle layer tensor of the teacher model and the middle layer tensor of the student model into a teacher public space tensor and a student public space tensor respectively, wherein the dimensions of the teacher public space tensor and the student public space tensor are the same, and the method comprises the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the j-th layer of the i-th teacher model, the corresponding teacher public space tensor is C_{i,j} = φ_{i,j}(T_{i,j}), where φ_{i,j} is a nonlinear transformation implemented by a convolutional neural network layer with a 1×1 convolution kernel applied to the output of the corresponding convolutional neural network layer of the teacher model;
taking the second-layer intermediate tensor of teacher model 1 as an example, the expression of the corresponding teacher public space tensor is:
C_{1,2} = φ_{1,2}(T_{1,2})
where φ_{1,2} is the 1×1-kernel convolution transformation applied to the second-layer tensor of teacher model 1; C_{1,2} ∈ R^{w_c×h_c}, where w_c and h_c are the dimensions of the public space tensor in the width and height directions;
S32, if the parameters θ of the nonlinear transformations φ are fixed, in total Σ_{i=1}^{n} m_i teacher public space tensors C_{i,j} are calculated; in this embodiment, all teacher public space tensors are calculated as C_{1,1}, C_{1,2}, C_{1,3}, C_{1,4}, C_{2,1}, C_{2,2};
wherein, for the transformation of the teacher intermediate layer tensors into teacher public space tensors, if the neural network parameters θ are not fixed, they are obtained by training through the following steps:
S33, for the teacher public space tensor C_{i,j} corresponding to the j-th layer of the i-th teacher model, mapping it through the nonlinear reconstruction transformation ψ_{i,j} to the teacher intermediate layer reconstruction tensor R_{i,j} with the same dimension as the intermediate layer tensor T_{i,j}, the expression being:
R_{i,j} = ψ²_{i,j}(ψ¹_{i,j}(C_{i,j}))
where the nonlinear reconstruction transformation ψ_{i,j} is composed of two convolutional neural network layers, and ψ¹_{i,j}, ψ²_{i,j} are the two convolution transforms in the nonlinear reconstruction transform;
S34, comparing the intermediate layer tensor T_{i,j} of the teacher model with the teacher intermediate layer reconstruction tensor R_{i,j} to calculate the reconstruction error L_rec, the expression being:
L_rec = ℓ(T_{i,j}, R_{i,j})
where ℓ is the reconstruction loss function; L_rec is minimized by the gradient descent method to update the parameters θ of the nonlinear transformations φ until the reconstruction error L_rec is less than the threshold ε or the number of iteration steps is reached, after which the parameters θ are fixed.
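The reconstruction-error training of the common-space mapping can be sketched with a linear stand-in: the convolutional maps are replaced by plain matrices (phi projects a flattened teacher tensor into the common space, psi reconstructs it), only the reconstruction map is trained by gradient descent, and all sizes, the learning rate, and the threshold are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Flattened teacher intermediate-layer tensor T; phi stands in for the
# common-space map, psi for the two-layer reconstruction transform.
dim_t, dim_c = 12, 6
T = rng.normal(size=dim_t)
phi = rng.normal(size=(dim_c, dim_t)) * 0.5   # fixed common-space map
psi = np.zeros((dim_t, dim_c))                # trained reconstruction map

C = phi @ T                                   # common-space tensor
lr, eps, max_steps = 0.05, 1e-3, 5000
loss = float("inf")
for step in range(max_steps):
    R = psi @ C                               # reconstruction of T
    err = R - T
    loss = np.mean(err ** 2)                  # reconstruction error
    if loss < eps:                            # stop at threshold
        break
    # gradient of the mean-squared error w.r.t. psi
    psi -= lr * (2 / dim_t) * np.outer(err, C)
print(loss < eps)
```

Training only psi keeps the sketch a convex least-squares problem, so the stopping rule (error below a threshold or an iteration cap) is reached quickly; the patent additionally updates the forward map's parameters.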
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; in the embodiment, the intermediate layer tensors S_1, S_2 of the student model are converted into student public space tensors, the expressions being:
C^s_1 = φ^s_1(S_1), C^s_2 = φ^s_2(S_2)
the dimensions of the student public space tensors obtained by transforming the intermediate layer tensors S_1, S_2 of the student model are consistent with those of the teacher public space tensors;
for the k-th layer, C^s_k = φ^s_k(S_k), where φ^s_k is a nonlinear transformation implemented by a neural network layer; taking the second-layer intermediate tensor of the student model as an example:
C^s_2 = φ^s_2(S_2)
where φ^s_2 is the 1×1-kernel convolution transformation applied to the second-layer tensor of the student model; C^s_2 ∈ R^{w_c×h_c}, consistent with the dimensions of the teacher public space tensors.
Step four, weighting and averaging all teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to the student public space tensor, and transversely splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector, specifically comprising the following substeps:
S41, based on the k-th layer student public space tensor C^s_k and the k-th layer teacher public space tensor C_{i,k} of the i-th teacher model, obtaining through the attention mechanism the attention coefficient α_{i,k} of each teacher public space tensor, the expression being:
α_{i,k} = exp(⟨C^s_k, C_{i,k}⟩) / Σ_{i'=1}^{n} exp(⟨C^s_k, C_{i',k}⟩)
S42, weighting and averaging all teacher public space tensors according to the attention coefficients α_{i,k} to obtain the teacher weighted tensor W_k corresponding to the k-th layer, the expression being:
W_k = Σ_{i=1}^{n} α_{i,k} C_{i,k}
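A minimal sketch of the attention weighting, assuming inner-product similarity followed by a softmax over teachers as the attention mechanism, with flattened tensors and illustrative sizes (n = 2 teachers, 6-dimensional common space):

```python
import numpy as np

rng = np.random.default_rng(2)

# Student common-space tensor for layer k, and the layer-k
# common-space tensors of the two teachers, flattened to vectors.
Cs = rng.normal(size=6)
Ct = rng.normal(size=(2, 6))

# Attention coefficients: softmax over teachers of the inner-product
# similarity between the student tensor and each teacher tensor.
scores = Ct @ Cs
alpha = np.exp(scores - scores.max())  # max-shift for stability
alpha /= alpha.sum()

# Teacher-weighted tensor for layer k: attention-weighted average.
Wk = alpha @ Ct
print(alpha.sum(), Wk.shape)  # weights sum to 1; Wk has shape (6,)
```

Teachers whose layer-k representation is closer to the student's receive larger weight, so the distillation target adapts per layer and per input.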
S43, splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector y_t = concat(y_1, …, y_n); wherein if c new anomaly categories appear in the annotated dataset, a zero vector of length c is spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector, the expression being:
ŷ_t = concat(y_t, 0_c)
where concat(·) is the vector concatenation operation, 0_c is an all-zero vector of length c, and here c is the number of anomaly categories newly present in the dataset.
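The splicing with zero padding can be sketched directly; the uniform probability values below are placeholders for real teacher outputs, and only the shapes (7, 8, and 1 new category, from this embodiment) matter:

```python
import numpy as np

# Teacher task-layer vectors from the embodiment: 7 and 8 categories.
y1 = np.full(7, 1 / 7)
y2 = np.full(8, 1 / 8)

c_new = 1  # anomaly categories newly present in the dataset
# Teacher task-layer splicing vector: transverse concatenation of all
# teacher vectors, padded with a zero vector for the new categories.
y_teacher = np.concatenate([y1, y2, np.zeros(c_new)])
print(y_teacher.shape)  # (16,) -- matches the student task vector
```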
Step five, comparing the student public space tensors with the corresponding teacher weighted tensors to obtain the distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain the task loss; for the small labeled dataset, comparing the labels of the dataset with the task layer vector of the student model to obtain the prediction loss; and obtaining the total loss based on the distillation loss, the task loss and the prediction loss, specifically comprising the following substeps:
S51, comparing the student public space tensor C^s_k of the k-th layer of the student model with the corresponding k-th teacher weighted tensor W_k to obtain the distillation loss L_dist, the expression being:
L_dist = (1/K) Σ_{k=1}^{K} MSE(C^s_k, W_k)
where K is the number of intermediate layers of the student model and MSE is the mean square error loss function, whose expression is:
MSE(A, B) = (1/|A|) Σ_u (A_u − B_u)²
with the sum taken over all |A| elements of the tensors;
S52, comparing the task layer vector of the student model, i.e. the output vector y_s, with the corresponding teacher task layer splicing vector ŷ_t to obtain the soft target loss function L_soft, the expression being:
L_soft = −Σ_j ŷ_{t,j} log y_{s,j}
S53, for the small labeled dataset, comparing the vector y_s output by the student model with the one-hot representation p of the labeled correct category to obtain the prediction loss L_pred, the expression being:
L_pred = −Σ_{j=1}^{C} p_j log q_j
where p_j and q_j are respectively the value of the j-th bit of the one-hot representation of the correct category and the predicted probability value of the j-th bit of the student model task vector y_s;
S54, performing a weighted summation of the distillation loss L_dist and the losses L_soft, L_pred to obtain the final total loss L, i.e. the total loss function expression is:
L = λ₁L_dist + λ₂L_soft + λ₃L_pred
where λ₁, λ₂, λ₃ are the weighting coefficients.
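The combination of the three losses of step five can be sketched as follows; the cross-entropy forms for the soft-target and prediction losses, the toy tensors, and the weights lam are assumptions for illustration, not values from the patent:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

rng = np.random.default_rng(3)
Cs = [rng.normal(size=6) for _ in range(2)]        # student common tensors
Wk = [rng.normal(size=6) for _ in range(2)]        # teacher-weighted tensors
y_s = np.full(16, 1 / 16)                          # student task vector
y_t = np.concatenate([np.full(15, 1 / 15), [0.0]]) # teacher splice + zero pad
one_hot = np.eye(16)[3]                            # labeled correct category

L_dist = np.mean([mse(c, w) for c, w in zip(Cs, Wk)])  # distillation loss
L_soft = cross_entropy(y_t, y_s)                       # soft-target loss
L_pred = cross_entropy(one_hot, y_s)                   # prediction loss
lam = (1.0, 0.5, 1.0)                                  # assumed weights
L_total = lam[0] * L_dist + lam[1] * L_soft + lam[2] * L_pred
print(L_total > 0)  # True
```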
Step six, repeating step one to step five, minimizing the total loss and updating the neural network parameters of the student model until convergence; the neural network parameters of the student model are then fixed to obtain the target model, completing the training, specifically comprising the following substeps:
S61, repeating step one to step five, and minimizing the total loss function $L$ by a gradient descent algorithm, updating the neural network parameters $\theta_S$ of the student model and the parameters $\theta_f$ of the nonlinear transformation neural network;
S62, when the model loss function $L$ falls below a threshold $\varepsilon_L$ or the number of iterations reaches a preset count (10000 steps), the training ends; the student model parameters $\theta_S$ are fixed, and the student model is then the target model.
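The stopping logic of steps S61-S62 can be sketched generically as a gradient-descent loop (a minimal sketch with assumed names; the actual optimizer and parameters are not specified beyond gradient descent):

```python
def gradient_descent(grad_fn, loss_fn, params, lr=0.1, threshold=1e-6, max_steps=10000):
    """Minimize a loss by gradient descent, stopping when the loss drops
    below the threshold or the iteration budget (e.g. 10000 steps) is spent;
    the returned parameters are then fixed as the target model."""
    for _ in range(max_steps):
        if loss_fn(params) < threshold:
            break
        params = params - lr * grad_fn(params)
    return params
```

For example, minimizing f(x) = x^2 from x = 1 converges toward 0 well within the iteration budget.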
Corresponding to the embodiment of the industrial anomaly detection model training method based on multi-model fusion, the invention also provides an embodiment of the industrial anomaly detection model training device based on multi-model fusion.
Referring to fig. 2, an industrial anomaly detection model training device based on multi-model fusion according to an embodiment of the present invention includes one or more processors configured to implement the industrial anomaly detection model training method based on multi-model fusion in the above embodiment.
The embodiment of the industrial anomaly detection model training method based on multi-model fusion can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading corresponding computer program instructions from a nonvolatile memory into memory for execution. In terms of hardware, fig. 2 shows a hardware structure diagram of the device with data processing capability in which the industrial anomaly detection model training device based on multi-model fusion is located; in addition to the processor, memory, network interface and nonvolatile memory shown in fig. 2, the device in the embodiment may also include other hardware according to its actual function, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically described in the implementation process of the corresponding steps in the above method, and will not be repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored on the computer readable storage medium, and when the program is executed by a processor, the industrial anomaly detection model training method based on multi-model fusion in the embodiment is realized.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of the device with data processing capability described in any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a Flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both the internal storage unit and the external storage device of the device with data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device with data processing capability, and may also be used for temporarily storing data that has been output or is to be output.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the invention has been described in detail above, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (5)
1. The industrial anomaly detection model training method based on multi-model fusion is characterized by comprising the following steps of:
step one, acquiring sensor data and preprocessing it;
step two, respectively inputting the sensor feature tensor obtained by preprocessing into a plurality of teacher models and a student model, and obtaining the features output by each network layer in the models, including intermediate layer tensors and task layer vectors;
step three, mapping the intermediate layer tensors of the teacher models and the intermediate layer tensors of the student model into teacher public space tensors and student public space tensors respectively;
step four, weighted-averaging all the teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to each student public space tensor, and transversely splicing all the teacher model task layer vectors into a one-dimensional teacher task layer splicing vector;
step five, comparing the student public space tensors with the corresponding teacher weighted tensors to obtain a distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain a task loss; comparing the labels of the annotated dataset with the task layer vector of the student model to obtain a prediction loss; and obtaining a total loss based on the distillation loss, the task loss and the prediction loss;
step six, repeating the step one to the step five, minimizing the total loss, updating the neural network parameters of the student model until the neural network parameters of the student model are converged and fixed, obtaining a target model, and finishing training;
step one is specifically: converting the sensor data into a sensor feature tensor $X \in \mathbb{R}^{w \times d}$ using a single-layer LSTM network, where $w$ is the time window size of the sensor data, i.e. the data length, and $d$ is the hidden layer dimension of the sensor feature tensor;
the second step specifically comprises the following substeps:
S21, inputting the sensor feature tensor $X$ into the $m$ pre-trained teacher models respectively, where there are $m$ teacher models, each having $K$ intermediate layers; for the $k$-th intermediate layer of the $j$-th model, the output intermediate layer tensor is $H_k^{T_j}$; in total, $m \times K$ intermediate layer tensors of the teacher models are calculated;
S22, inputting the sensor feature tensor $X$ into the student model, and for the $k$-th intermediate layer, calculating the intermediate layer tensor $H_k^S$ of the student model;
S23, for the final layer of the $j$-th teacher model, calculating the teacher task vector $z^{T_j}$; for the final layer of the student model, calculating the student task vector $z^S$; the dimension of the student task vector $z^S$ is equal to the sum of the dimensions of all the teacher model task vectors plus the number of categories newly appearing in the dataset;
the third step specifically comprises the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the $k$-th layer of the $j$-th teacher model, the corresponding teacher public space tensor is $\tilde{H}_k^{T_j} = g(H_k^{T_j}; \theta_g)$, where $g$ represents a nonlinear transformation implemented by a convolutional neural network layer with parameters $\theta_g$;
S32, if the parameters $\theta_g$ of the nonlinear transformation $g$ are fixed, calculating all $m \times K$ teacher public space tensors $\tilde{H}_k^{T_j}$; otherwise, updating the parameters $\theta_g$ of the nonlinear transformation $g$ through steps S33 and S34;
S33, for the teacher public space tensor $\tilde{H}_k^{T_j}$ corresponding to the $k$-th layer of the $j$-th teacher model, mapping it by a nonlinear transformation $g'$ to a teacher intermediate layer reconstruction tensor $\hat{H}_k^{T_j}$ of the same dimension as the intermediate layer tensor $H_k^{T_j}$;
S34, comparing the intermediate layer tensor $H_k^{T_j}$ of the teacher model with the teacher intermediate layer reconstruction tensor $\hat{H}_k^{T_j}$ and calculating the reconstruction error $L_{rec}$:

$L_{rec} = L_{MSE}(H_k^{T_j}, \hat{H}_k^{T_j})$

where $L_{MSE}$ is the reconstruction loss function; minimizing $L_{rec}$ to update the parameters $\theta_g$ of the nonlinear transformation $g$ until the reconstruction error $L_{rec}$ is less than a threshold $\varepsilon$ or the number of iteration steps is reached, and then fixing the parameters $\theta_g$;
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; for the $k$-th layer, $\tilde{H}_k^S = f(H_k^S; \theta_f)$, where $f$ is a nonlinear transformation implemented by a neural network layer with parameters $\theta_f$; the dimension of the student public space tensor is consistent with that of the teacher public space tensor in step S31;
the fourth step comprises the following substeps:
S41, based on the student public space tensor $\tilde{H}_k^S$ of the $k$-th layer and the teacher public space tensor $\tilde{H}_k^{T_j}$ of the $k$-th layer of the $j$-th teacher model, obtaining the attention coefficient $\alpha_k^j$ of each teacher public space tensor through an attention mechanism;
S42, weighted-averaging all the teacher public space tensors according to the attention coefficients $\alpha_k^j$ to obtain the teacher weighted tensor $\bar{H}_k^T$ corresponding to the $k$-th layer, with the expression:

$\bar{H}_k^T = \sum_{j=1}^{m} \alpha_k^j \tilde{H}_k^{T_j}$
S43, splicing the task layer vectors of all teacher models into a one-dimensional teacher task layer splicing vector $z^T$; if $c$ new abnormal categories appear in the annotated dataset, an all-zero vector of length $c$ is further spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector:

$z^T \leftarrow \mathrm{concat}(z^T, \mathbf{0}_c)$

where $\mathrm{concat}(\cdot)$ is the vector splicing (concatenation) operation, $\mathbf{0}_c$ is the all-zero vector of length $c$, and $c$ is the number of abnormal categories newly appearing in the dataset.
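The exact attention expression of steps S41-S42 is not recoverable from this page; a common instantiation, assumed here for illustration only, is a softmax over inner-product similarities between the student and teacher public-space tensors:

```python
import numpy as np

def teacher_weighted_tensor(student_tensor, teacher_tensors):
    """Attention over m teacher public-space tensors for one student layer:
    softmax of inner-product similarities, then the weighted average."""
    sims = np.array([np.sum(student_tensor * t) for t in teacher_tensors])
    e = np.exp(sims - sims.max())
    alphas = e / e.sum()                   # attention coefficients
    weighted = sum(a * t for a, t in zip(alphas, teacher_tensors))
    return alphas, weighted
```

With two identical teacher tensors, each receives an attention coefficient of 0.5 and the weighted tensor equals either teacher tensor.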
2. The industrial anomaly detection model training method based on multi-model fusion according to claim 1, wherein the fifth step specifically comprises the following sub-steps:
S51, comparing the student public space tensor $\tilde{H}_k^S$ of the $k$-th layer of the student model with the corresponding $k$-th teacher weighted tensor $\bar{H}_k^T$ to obtain the distillation loss $L_{distill}$, with the expression:

$L_{distill} = \frac{1}{K}\sum_{k=1}^{K} L_{MSE}(\tilde{H}_k^S, \bar{H}_k^T)$

where $K$ is the number of intermediate layers of the student model and $L_{MSE}$ is the mean square error loss function, with the expression:

$L_{MSE}(a, b) = \frac{1}{n}\sum_{i=1}^{n}(a_i - b_i)^2$
S52, comparing the task layer vector of the student model, i.e. the output vector $z^S$, with the corresponding teacher task layer splicing vector $z^T$ to obtain the soft target loss function $L_{soft}$;
S53, for the small amount of annotated data, comparing the vector output by the student model with the one-hot representation $y$ of the annotated correct category to obtain the prediction loss $L_{pred}$, with the expression:

$L_{pred} = -\sum_{i=1}^{C} y_i \log p_i$

where $y_i$ and $p_i$ are, respectively, the value of the $i$-th bit of the one-hot representation of the correct category and the predicted probability of the $i$-th bit of the student model task vector, and $C$ is the total number of categories, equal to the length of the student model task vector $z^S$;
S54, performing weighted summation of the distillation loss $L_{distill}$ and the losses $L_{soft}$, $L_{pred}$ to obtain the final total loss $L$, i.e. the total loss function expression is:

$L = \alpha L_{distill} + \beta L_{soft} + \gamma L_{pred}$

where $\alpha$, $\beta$ and $\gamma$ are the weighting coefficients.
3. The industrial anomaly detection model training method based on multi-model fusion according to claim 2, wherein the step six specifically comprises the following sub-steps:
S61, repeating step one to step five, and minimizing the total loss function $L$ by a gradient descent algorithm, updating the neural network parameters $\theta_S$ of the student model and the parameters $\theta_f$ of the nonlinear transformation neural network.
4. An industrial anomaly detection model training device based on multi-model fusion, comprising one or more processors configured to implement the industrial anomaly detection model training method based on multi-model fusion of any one of claims 1 to 3.
5. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements an industrial anomaly detection model training method based on multi-model fusion as claimed in any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310123067.3A CN116028891B (en) | 2023-02-16 | 2023-02-16 | Industrial anomaly detection model training method and device based on multi-model fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116028891A CN116028891A (en) | 2023-04-28 |
CN116028891B true CN116028891B (en) | 2023-07-14 |
Family
ID=86091403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310123067.3A Active CN116028891B (en) | 2023-02-16 | 2023-02-16 | Industrial anomaly detection model training method and device based on multi-model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116028891B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117668622B (en) * | 2024-02-01 | 2024-05-10 | 山东能源数智云科技有限公司 | Training method of equipment fault diagnosis model, fault diagnosis method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408570A (en) * | 2021-05-08 | 2021-09-17 | 浙江智慧视频安防创新中心有限公司 | Image category identification method and device based on model distillation, storage medium and terminal |
CN114170478A (en) * | 2021-12-09 | 2022-03-11 | 中山大学 | Defect detection and positioning method and system based on cross-image local feature alignment |
CN114240892A (en) * | 2021-12-17 | 2022-03-25 | 华中科技大学 | Unsupervised industrial image anomaly detection method and system based on knowledge distillation |
CN115346207A (en) * | 2022-08-03 | 2022-11-15 | 北京交通大学 | Method for detecting three-dimensional target in two-dimensional image based on example structure correlation |
CN115471645A (en) * | 2022-11-15 | 2022-12-13 | 南京信息工程大学 | Knowledge distillation anomaly detection method based on U-shaped student network |
CN115526332A (en) * | 2022-08-17 | 2022-12-27 | 阿里巴巴(中国)有限公司 | Student model training method and text classification system based on pre-training language model |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162799B (en) * | 2018-11-28 | 2023-08-04 | 腾讯科技(深圳)有限公司 | Model training method, machine translation method, and related devices and equipment |
US11487944B1 (en) * | 2019-12-09 | 2022-11-01 | Asapp, Inc. | System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation |
CN111160409A (en) * | 2019-12-11 | 2020-05-15 | 浙江大学 | Heterogeneous neural network knowledge reorganization method based on common feature learning |
CN113052768B (en) * | 2019-12-27 | 2024-03-19 | 武汉Tcl集团工业研究院有限公司 | Method, terminal and computer readable storage medium for processing image |
CN111611377B (en) * | 2020-04-22 | 2021-10-29 | 淮阴工学院 | Knowledge distillation-based multi-layer neural network language model training method and device |
US20220076136A1 (en) * | 2020-09-09 | 2022-03-10 | Peyman PASSBAN | Method and system for training a neural network model using knowledge distillation |
CN112116030B (en) * | 2020-10-13 | 2022-08-30 | 浙江大学 | Image classification method based on vector standardization and knowledge distillation |
CN112418343B (en) * | 2020-12-08 | 2024-01-05 | 中山大学 | Multi-teacher self-adaptive combined student model training method |
CN112801209B (en) * | 2021-02-26 | 2022-10-25 | 同济大学 | Image classification method based on dual-length teacher model knowledge fusion and storage medium |
CN114241282B (en) * | 2021-11-04 | 2024-01-26 | 河南工业大学 | Knowledge distillation-based edge equipment scene recognition method and device |
CN114067819B (en) * | 2021-11-22 | 2024-06-21 | 南京工程学院 | Speech enhancement method based on cross-layer similarity knowledge distillation |
CN114936605A (en) * | 2022-06-09 | 2022-08-23 | 五邑大学 | Knowledge distillation-based neural network training method, device and storage medium |
CN115481316A (en) * | 2022-09-01 | 2022-12-16 | 贵州大学 | Multi-model fusion knowledge distillation recommendation model |
CN115690708A (en) * | 2022-10-21 | 2023-02-03 | 苏州轻棹科技有限公司 | Method and device for training three-dimensional target detection model based on cross-modal knowledge distillation |
- 2023-02-16: CN application CN202310123067.3A granted as patent CN116028891B (status: Active)
Non-Patent Citations (1)
Title |
---|
Research on a Teacher Evaluation Model Based on Combined Neural Networks; Liu Caihong; Tang Wanmei; Journal of Chongqing Normal University (Natural Science Edition), No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN116028891A (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tkachenko et al. | Model and principles for the implementation of neural-like structures based on geometric data transformations | |
Oh et al. | A tutorial on quantum convolutional neural networks (QCNN) | |
Jia et al. | Quantum neural network states: A brief review of methods and applications | |
US9361586B2 (en) | Method and system for invariant pattern recognition | |
Furukawa | SOM of SOMs | |
Yuan et al. | Quantum image edge detection algorithm | |
CN116028891B (en) | Industrial anomaly detection model training method and device based on multi-model fusion | |
CN113821668A (en) | Data classification identification method, device, equipment and readable storage medium | |
CN116206158A (en) | Scene image classification method and system based on double hypergraph neural network | |
Lee et al. | Application of domain-adaptive convolutional variational autoencoder for stress-state prediction | |
Egbo et al. | Forecasting students’ enrollment using neural networks and ordinary least squares regression models | |
Rai | Advanced deep learning with R: Become an expert at designing, building, and improving advanced neural network models using R | |
Chen et al. | Total variation based tensor decomposition for multi‐dimensional data with time dimension | |
CN116128575A (en) | Item recommendation method, device, computer apparatus, storage medium, and program product | |
Zhang et al. | The Role of Knowledge Creation‐Oriented Convolutional Neural Network in Learning Interaction | |
Christiansen et al. | Optimization of neural networks for time-domain simulation of mooring lines | |
JP7118882B2 (en) | Variable transformation device, latent parameter learning device, latent parameter generation device, methods and programs thereof | |
CN116992937A (en) | Neural network model restoration method and related equipment | |
CN113496119B (en) | Method, electronic device and computer readable medium for extracting metadata in table | |
CN114511092A (en) | Graph attention mechanism implementation method based on quantum circuit | |
JP7047665B2 (en) | Learning equipment, learning methods and learning programs | |
CN112232261A (en) | Method and device for fusing image sequences | |
CN114998990B (en) | Method and device for identifying safety behaviors of personnel on construction site | |
Sundararaghavan et al. | Methodology for estimation of intrinsic dimensions and state variables of microstructures | |
Zhang et al. | Calibrated multivariate regression networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||