CN116028891B - Industrial anomaly detection model training method and device based on multi-model fusion - Google Patents
- Publication number
- CN116028891B (application CN202310123067.3A)
- Authority
- CN
- China
- Legal status: Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an industrial anomaly detection model training method and device based on multi-model fusion, wherein the method comprises the following steps: step one, acquiring sensor data and preprocessing it; step two, inputting the sensor feature tensor obtained by preprocessing into a plurality of teacher models and a student model respectively, and obtaining the features output by each network layer in the models; step three, mapping the intermediate layer tensors in the features into public space tensors; step four, weighting and averaging the public space tensors of all teacher models to obtain the teacher weighted tensors corresponding to the student public space tensors, and transversely splicing the task layer vectors of all teacher models into a teacher task layer splicing vector; step five, obtaining the distillation loss, task loss and prediction loss of the model, and obtaining the total loss by weighted summation; and step six, repeating the above steps, minimizing the total loss, and updating the neural network parameters of the student model until convergence, finally fixing the neural network parameters of the student model to obtain the target model and finish training.
Description
Technical Field
The invention relates to the field of industrial equipment anomaly detection, in particular to an industrial anomaly detection model training method and device based on multi-model fusion.
Background
In the industrial field, correctly identifying the type of an equipment anomaly helps operation and maintenance personnel locate the problem more quickly, so that corresponding measures can be taken in time. With the widespread use of industrial sensors, large amounts of monitoring data for critical equipment can be collected. Data-driven anomaly detection methods have accordingly been developed: by monitoring sensor data in real time, it is possible to dynamically identify whether an anomaly occurs in the equipment, and to identify the type of the anomaly.
Industrial anomaly detection methods based on deep learning neural networks are gaining attention, and they have the following advantages: 1. they depend little on feature engineering and can be trained end to end; 2. the model structure is flexible and the fitting capability strong, so complex patterns in the data can be extracted. However, deep learning methods place high demands on labeled datasets, and a large amount of labeled data is often required to achieve a good prediction effect.
In the field of industrial anomaly detection, data annotation is difficult and labeled data is generally hard to obtain; in addition, industrial data involves data security and business confidentiality, equipment operation data of different factories and departments cannot be shared, and the original data is difficult to obtain; furthermore, industrial equipment structures and operating environments are complex, and it is difficult to enumerate all anomaly types from the start; there is therefore a need to iterate the model to take newly discovered and defined anomaly types into account.
Typically, multiple models are trained for the same model of equipment, for different factories, or for the same factory over different historical periods; the prediction effect can be improved by exploiting these existing models, for example by integrating several sub-models into an ensemble through traditional ensemble learning; however, ensemble learning has the following problems: 1. all sub-models must participate in the computation, and when the number of sub-models is large the computational load increases significantly; 2. all sub-models are generally required to classify the same set of categories, whereas in the field of industrial anomaly detection new anomaly types frequently appear, so the anomaly categories supported by models from different periods differ.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides an industrial anomaly detection model training method and device based on multi-model fusion, and the specific technical scheme is as follows:
an industrial anomaly detection model training method based on multi-model fusion comprises the following steps:
step one, preprocessing after acquiring sensor data;
step two, respectively inputting the sensor characteristic tensor obtained by preprocessing into a plurality of teacher models and student models, and obtaining the characteristics output by each network layer in the models, wherein the characteristics comprise middle layer tensors and task layer vectors;
step three, mapping the middle layer tensor of the teacher model and the middle layer tensor of the student model into a teacher public space tensor and a student public space tensor respectively;
step four, weighting and averaging all teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to the student public space tensor, and transversely splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector;
step five, comparing the public space tensor of the students with the corresponding teacher weighting tensor to obtain distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain task loss; comparing the label marked by the data set with the task layer vector of the student model to obtain a prediction loss; obtaining a total loss based on the distillation loss, the mission loss and the predicted loss;
and step six, repeating the step one to the step five, minimizing the total loss, updating the neural network parameters of the student model until the neural network parameters of the student model are converged and fixed, obtaining a target model, and finishing training.
Further, the first step specifically comprises: converting the sensor data into a sensor feature tensor F ∈ R^{w×d} using a single-layer LSTM network, where w is the time window size of the sensor data, i.e. the data length, and d is the hidden layer dimension of the sensor feature tensor.
Further, the second step specifically includes the following substeps:
S21, inputting the sensor feature tensor F into n pre-trained teacher models respectively; the i-th teacher model has m_i intermediate layers; for the j-th intermediate layer of the i-th model, the output intermediate layer tensor is T_{i,j}; in total Σ_{i=1}^{n} m_i intermediate layer tensors of the teacher models are calculated;
s22, tensor of sensor characteristicsInputting a student model, for the +.>Intermediate layer, calculated as->Intermediate layer tensor of student model of layer->;
S23, for the final layer of the i-th teacher model, the teacher task vector y_i is calculated; for the final layer of the student model, the student task vector y_s is calculated; the dimension of the student task vector y_s is equal to the sum of the dimensions of all teacher model task vectors plus the number of categories newly present in the dataset.
Further, the third step specifically includes the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the j-th layer of the i-th teacher model, the corresponding teacher public space tensor is C_{i,j} = φ_{i,j}(T_{i,j}), where φ_{i,j} represents a nonlinear transformation implemented by a convolutional neural network layer;
S32, if the parameters θ of the nonlinear transformations φ are fixed, in total Σ_{i=1}^{n} m_i teacher public space tensors C_{i,j} are calculated; otherwise, the parameters θ of the nonlinear transformations φ are updated through steps S33 and S34;
S33, for the teacher public space tensor C_{i,j} corresponding to the j-th layer of the i-th teacher model, mapping it through a nonlinear reconstruction transformation ψ_{i,j} to a teacher intermediate layer reconstruction tensor R_{i,j} = ψ_{i,j}(C_{i,j}) with the same dimension as the intermediate layer tensor T_{i,j};
S34, comparing the intermediate layer tensor of the teacher modelReconstructing tensor with teacher interlayer>Calculate reconstruction error +.>:
Wherein,,reconstructing a loss function; minimizing +.>Update nonlinear transformation->Parameter of->Until reconstruction error->Less than threshold->Or the iteration step number is satisfied, the parameter is fixed +.>;
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; for the k-th layer, C^s_k = φ^s_k(S_k), where φ^s_k is a nonlinear transformation implemented by a neural network layer with parameters θ^s; the dimensions of the student public space tensors are the same as those of the teacher public space tensors in step S31.
Further, the step four specifically includes the following substeps:
S41, based on the k-th layer student public space tensor C^s_k and the k-th layer teacher public space tensor C_{i,k} of the i-th teacher model, obtaining through the attention mechanism the attention coefficient α_{i,k} of each teacher public space tensor, the expression being:
α_{i,k} = exp(⟨C^s_k, C_{i,k}⟩) / Σ_{i'=1}^{n} exp(⟨C^s_k, C_{i',k}⟩)
S42, weighting and averaging all teacher public space tensors according to the attention coefficients α_{i,k} to obtain the teacher weighted tensor W_k corresponding to the k-th layer, the expression being:
W_k = Σ_{i=1}^{n} α_{i,k} C_{i,k}
S43, splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector y_t = concat(y_1, …, y_n); wherein if c new anomaly categories appear in the annotated dataset, a zero vector of length c is spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector, the expression being:
ŷ_t = concat(y_t, 0_c)
where concat(·) is the vector concatenation operation, 0_c is an all-zero vector of length c, and c is the number of anomaly categories newly present in the dataset.
Further, the fifth step specifically includes the following substeps:
S51, comparing the student public space tensor C^s_k of the k-th layer of the student model with the corresponding k-th teacher weighted tensor W_k to obtain the distillation loss L_dist, the expression being:
L_dist = (1/K) Σ_{k=1}^{K} MSE(C^s_k, W_k)
where K is the number of intermediate layers of the student model and MSE is the mean square error loss function, whose expression is:
MSE(A, B) = (1/|A|) Σ_u (A_u − B_u)²
with the sum taken over all |A| elements of the tensors;
S52, comparing the task layer vector of the student model, i.e. the output vector y_s, with the corresponding teacher task layer splicing vector ŷ_t to obtain the soft target loss function L_soft, the expression being:
L_soft = −Σ_j ŷ_{t,j} log y_{s,j}
S53, for the small labeled dataset, comparing the vector y_s output by the student model with the one-hot representation p of the labeled correct category to obtain the prediction loss L_pred, the expression being:
L_pred = −Σ_{j=1}^{C} p_j log q_j
where p_j and q_j are respectively the value of the j-th bit of the one-hot representation of the correct category and the predicted probability value of the j-th bit of the student model task vector y_s, and C is the total number of categories, whose value is equal to the length of the student model task vector y_s;
S54, performing a weighted summation of the distillation loss L_dist and the losses L_soft, L_pred to obtain the final total loss L, i.e. the total loss function expression is:
L = λ₁L_dist + λ₂L_soft + λ₃L_pred
where λ₁, λ₂, λ₃ are the weighting coefficients.
Further, the sixth step specifically includes the following substeps:
S61, repeating steps one to five, minimizing the loss function L with a gradient descent algorithm, and updating the neural network parameters W of the student model and the parameters θ^s of the nonlinear transformations;
S62, when the model loss function L falls below a threshold δ or the iterations reach the preset number, training ends; the student model parameters W are fixed, and the student model is the target model.
An industrial anomaly detection model training device based on multi-model fusion comprises one or more processors, and is used for realizing the industrial anomaly detection model training method based on multi-model fusion.
A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the method for training an industrial anomaly detection model based on multi-model fusion.
Compared with other methods, the method has the following advantages:
1. by adopting a multi-model fusion method, information of a plurality of teacher models is fused, and the existing models can be reused to obtain models capable of identifying more abnormal categories;
2. training can be completed with a large amount of unlabeled data without access to the training data of the teacher models, reducing the dependence on labeled industrial datasets when training an industrial anomaly detection model;
3. compared with traditional ensemble learning, not all teacher models need to participate in the computation at prediction time; once training is completed, only the single student model is used for prediction, which reduces the consumption of computing resources;
4. the target model can recognize and predict new anomaly types from only a small amount of labeled training data for the new types.
Drawings
FIG. 1 is a schematic flow chart of an industrial anomaly detection model training method based on multi-model fusion;
fig. 2 is a schematic structural diagram of an industrial anomaly detection model training device based on multi-model fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention will be further described in detail with reference to the drawings and examples of the specification.
In this embodiment, detecting industrial abnormal signals requires correctly identifying the type of equipment fault from the sensor signals. In this scenario, the first teacher model can identify 6 different types of anomalies plus a "normal state" (7 categories in total); the second teacher model can identify 7 different types of anomalies plus "normal state" (8 categories in total); the current dataset contains anomaly types that appear in neither model, and the problems of this scenario can be abstracted into a multi-classification problem; the datasets of the earlier periods are no longer available; the data currently available are a large amount of recently collected unlabeled sensor data and a small amount of manually labeled data.
Based on the above examples, as shown in fig. 1, the method for training the industrial anomaly detection model based on multi-model fusion provided by the invention comprises the following steps:
step one, preprocessing is performed after sensor data are acquired.
Wherein, the sensor data is specifically: assuming there are N sensors and a time window of length w is selected, the sensor data within the time window is a 2D time series matrix X ∈ R^{N×w};
wherein each column x_t = (x_{1,t}, …, x_{N,t})ᵀ is the data at one time step t;
and each row of the matrix is the data acquired by a single sensor within the time window, where x_{i,t} is the reading of the i-th sensor at time t; likewise, for sensor i, the time series within the selected time window is x_i = (x_{i,1}, …, x_{i,w}).
One embodiment of the invention uses a single-layer LSTM network as the sensor data processing module to preprocess the sensor data, namely: the sensor data X is input into the LSTM network, and the sensor feature tensor F ∈ R^{w×d} is calculated, where d is the hidden layer dimension of the tensor output by the LSTM network layer.
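The shape bookkeeping of this preprocessing step can be sketched as follows; a minimal recurrent cell stands in for the single-layer LSTM, and the sizes N = 4, w = 10, d = 8, the weight scales, and the cell itself are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

N, w, d = 4, 10, 8           # sensors, time-window length, hidden dim
X = rng.normal(size=(N, w))  # 2D time-series matrix, rows = sensors

# Minimal recurrent cell standing in for the single-layer LSTM:
# at each time step t the column X[:, t] (all sensor readings at that
# moment) updates a hidden state h, and the stacked states form F.
W_x = rng.normal(size=(d, N)) * 0.1
W_h = rng.normal(size=(d, d)) * 0.1

h = np.zeros(d)
states = []
for t in range(w):
    h = np.tanh(W_x @ X[:, t] + W_h @ h)
    states.append(h)
F = np.stack(states)         # sensor feature tensor, shape (w, d)
print(F.shape)               # (10, 8)
```

The output F plays the role of the sensor feature tensor F ∈ R^{w×d} that is fed to the teacher and student models in step two.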
And step two, respectively inputting the sensor characteristic tensor obtained by preprocessing into a plurality of teacher models and student models, and obtaining the characteristics output by each network layer in the models, wherein the characteristics comprise middle layer tensors and task layer vectors.
Specifically, the sensor feature tensor F is input into the plurality of teacher models and the student model respectively, and the intermediate layer tensors and task layer vectors of the teacher models and the student model are calculated; the teacher models are pre-trained models whose neural network parameters are fixed; the input forms of the teacher models and the student model are consistent, where the input form means the features, format and dimensions of the input data used by the models; the output categories of the different teacher models, and of the teacher models and the student model, may not be identical, but shared categories exist; the parameters of the student model are determined iteratively through the subsequent steps; the intermediate layer tensors are the output results of all neural network layers of a model except the last layer; the task layer vector is the category probability vector output by the last layer of a model.
In the embodiment of the invention, the plurality of teacher models are two models with different structures, comprising teacher model 1 and teacher model 2: teacher model 1 consists of four convolutional neural network (CNN) layers and a fully connected layer, and its output is a probability distribution over 7 categories; teacher model 2 consists of two stacked long short-term memory (LSTM) layers and a fully connected layer, and its output is a probability distribution over 8 categories; the student model consists of three Self-attention layers and a fully connected layer; the new dataset contains a new anomaly type; the output of the student model is a probability distribution over 16 categories, where 16 is the number of categories of the two teacher models without de-duplication plus the number of new categories in the dataset.
The second step specifically comprises the following substeps:
S21, inputting the sensor feature tensor F into n pre-trained teacher models respectively; the i-th teacher model has m_i intermediate layers; for the j-th intermediate layer of the i-th model, the output intermediate layer tensor is T_{i,j}; in total Σ_{i=1}^{n} m_i intermediate layer tensors of the teacher models are calculated;
In the embodiment of the invention, for the four CNN layers of teacher model 1, the input sensor feature tensor F is used to calculate the corresponding intermediate layer tensors of the teacher model, respectively:
first layer: T_{1,1} = f_{1,1}(F)
second layer: T_{1,2} = f_{1,2}(T_{1,1})
third layer: T_{1,3} = f_{1,3}(T_{1,2})
fourth layer: T_{1,4} = f_{1,4}(T_{1,3})
where f_{1,1}, f_{1,2}, f_{1,3}, f_{1,4} are the nonlinear transformations corresponding to the four CNN layers in teacher model 1; T_{1,j} ∈ R^{w_{1,j}×h_{1,j}×c_{1,j}}, where w_{1,j}, h_{1,j}, c_{1,j} are the dimensions of the j-th intermediate layer tensor of teacher model 1 in the width, height, and depth directions, j = (1,2,3,4);
For the two LSTM layers of teacher model 2, the sensor feature tensor F is used to calculate the corresponding intermediate layer tensors of the teacher model:
first layer: T_{2,1} = f_{2,1}(F)
second layer: T_{2,2} = f_{2,2}(T_{2,1})
where f_{2,1}, f_{2,2} are the nonlinear transformations corresponding to the two LSTM layers in teacher model 2; T_{2,j} ∈ R^{w_{2,j}×h_{2,j}}, where w_{2,j}, h_{2,j} are the dimensions of the j-th intermediate layer tensor of teacher model 2 in the width and height directions.
S22, inputting the sensor feature tensor F into the student model; for the k-th intermediate layer of the student model, the intermediate layer tensor S_k of the k-th layer of the student model is calculated;
In this embodiment, for the two Self-attention layers of the student model, the sensor feature tensor F is used to calculate the intermediate layer tensors of the corresponding student model, respectively:
first layer: S_1 = f^s_1(F)
second layer: S_2 = f^s_2(S_1)
where f^s_1, f^s_2 are the nonlinear transformations corresponding to the two Self-attention layers in the student model; S_k ∈ R^{w^s_k×h^s_k}, where w^s_k, h^s_k are the dimensions of the k-th intermediate layer tensor of the student model in the width and height directions.
S23, for the final layer of the i-th teacher model, the teacher task vector y_i is calculated; for the final layer of the student model, the student task vector y_s is calculated; the dimension of the student task vector y_s is equal to the sum of the dimensions of all teacher model task vectors plus the number of categories newly appearing in the dataset;
in this embodiment, for teacher model 1, teacher model 2, and the student model, the corresponding task layer vectors are y_1 ∈ R^7, y_2 ∈ R^8, and y_s ∈ R^16 respectively;
the dimension of the task layer vector of the student model is the sum of the dimensions of the task layer vectors of all teacher models, i.e. 15 dimensions, plus the number of categories newly appearing in the dataset, i.e. 1 category, giving 16.
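The dimension arithmetic of this embodiment can be checked with a short sketch (the helper function name is illustrative; the category counts 7, 8, and 1 are taken from the embodiment):

```python
# Student task-vector dimension: sum of all teacher task-vector
# dimensions plus the number of newly appearing categories.
def student_task_dim(teacher_dims, new_categories):
    return sum(teacher_dims) + new_categories

# Embodiment: teacher model 1 outputs 7 categories, teacher model 2
# outputs 8, and the labeled dataset introduces 1 new anomaly type.
dim = student_task_dim([7, 8], 1)
print(dim)  # 16
```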
Step three, mapping the middle layer tensor of the teacher model and the middle layer tensor of the student model into a teacher public space tensor and a student public space tensor respectively, wherein the dimensions of the teacher public space tensor and the student public space tensor are the same, and the method comprises the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the j-th layer of the i-th teacher model, the corresponding teacher public space tensor is C_{i,j} = φ_{i,j}(T_{i,j}), where φ_{i,j} is a nonlinear transformation implemented by a convolutional neural network layer with a 1×1 convolution kernel applied to the output of the corresponding convolutional neural network layer of the teacher model;
taking the second-layer intermediate tensor of teacher model 1 as an example, the expression of the corresponding teacher public space tensor is:
C_{1,2} = φ_{1,2}(T_{1,2})
where φ_{1,2} is the 1×1-kernel convolution transformation applied to the second-layer tensor of teacher model 1; C_{1,2} ∈ R^{w_c×h_c}, where w_c and h_c are the dimensions of the public space tensor in the width and height directions;
S32, if the parameters θ of the nonlinear transformations φ are fixed, in total Σ_{i=1}^{n} m_i teacher public space tensors C_{i,j} are calculated; in this embodiment, all teacher public space tensors are calculated as C_{1,1}, C_{1,2}, C_{1,3}, C_{1,4}, C_{2,1}, C_{2,2};
wherein, for the transformation of the teacher intermediate layer tensors into teacher public space tensors, if the neural network parameters θ are not fixed, they are obtained by training through the following steps:
S33, for the teacher public space tensor C_{i,j} corresponding to the j-th layer of the i-th teacher model, mapping it through the nonlinear reconstruction transformation ψ_{i,j} to the teacher intermediate layer reconstruction tensor R_{i,j} with the same dimension as the intermediate layer tensor T_{i,j}, the expression being:
R_{i,j} = ψ²_{i,j}(ψ¹_{i,j}(C_{i,j}))
where the nonlinear reconstruction transformation ψ_{i,j} is composed of two convolutional neural network layers, and ψ¹_{i,j}, ψ²_{i,j} are the two convolution transforms in the nonlinear reconstruction transform;
S34, comparing the intermediate layer tensor T_{i,j} of the teacher model with the teacher intermediate layer reconstruction tensor R_{i,j} to calculate the reconstruction error L_rec, the expression being:
L_rec = ℓ(T_{i,j}, R_{i,j})
where ℓ is the reconstruction loss function; L_rec is minimized by the gradient descent method to update the parameters θ of the nonlinear transformations φ until the reconstruction error L_rec is less than the threshold ε or the number of iteration steps is reached, after which the parameters θ are fixed.
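The reconstruction-error training of the common-space mapping can be sketched with a linear stand-in: the convolutional maps are replaced by plain matrices (phi projects a flattened teacher tensor into the common space, psi reconstructs it), only the reconstruction map is trained by gradient descent, and all sizes, the learning rate, and the threshold are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Flattened teacher intermediate-layer tensor T; phi stands in for the
# common-space map, psi for the two-layer reconstruction transform.
dim_t, dim_c = 12, 6
T = rng.normal(size=dim_t)
phi = rng.normal(size=(dim_c, dim_t)) * 0.5   # fixed common-space map
psi = np.zeros((dim_t, dim_c))                # trained reconstruction map

C = phi @ T                                   # common-space tensor
lr, eps, max_steps = 0.05, 1e-3, 5000
loss = float("inf")
for step in range(max_steps):
    R = psi @ C                               # reconstruction of T
    err = R - T
    loss = np.mean(err ** 2)                  # reconstruction error
    if loss < eps:                            # stop at threshold
        break
    # gradient of the mean-squared error w.r.t. psi
    psi -= lr * (2 / dim_t) * np.outer(err, C)
print(loss < eps)
```

Training only psi keeps the sketch a convex least-squares problem, so the stopping rule (error below a threshold or an iteration cap) is reached quickly; the patent additionally updates the forward map's parameters.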
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; in the embodiment, the intermediate layer tensors S_1, S_2 of the student model are converted into student public space tensors, the expressions being:
C^s_1 = φ^s_1(S_1), C^s_2 = φ^s_2(S_2)
the dimensions of the student public space tensors obtained by transforming the intermediate layer tensors S_1, S_2 of the student model are consistent with those of the teacher public space tensors;
for the k-th layer, C^s_k = φ^s_k(S_k), where φ^s_k is a nonlinear transformation implemented by a neural network layer; taking the second-layer intermediate tensor of the student model as an example:
C^s_2 = φ^s_2(S_2)
where φ^s_2 is the 1×1-kernel convolution transformation applied to the second-layer tensor of the student model; C^s_2 ∈ R^{w_c×h_c}, consistent with the dimensions of the teacher public space tensors.
Step four, weighting and averaging all teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to the student public space tensor, and transversely splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector, specifically comprising the following substeps:
S41, based on the k-th layer student public space tensor C^s_k and the k-th layer teacher public space tensor C_{i,k} of the i-th teacher model, obtaining through the attention mechanism the attention coefficient α_{i,k} of each teacher public space tensor, the expression being:
α_{i,k} = exp(⟨C^s_k, C_{i,k}⟩) / Σ_{i'=1}^{n} exp(⟨C^s_k, C_{i',k}⟩)
S42, weighting and averaging all teacher public space tensors according to the attention coefficients α_{i,k} to obtain the teacher weighted tensor W_k corresponding to the k-th layer, the expression being:
W_k = Σ_{i=1}^{n} α_{i,k} C_{i,k}
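A minimal sketch of the attention weighting, assuming inner-product similarity followed by a softmax over teachers as the attention mechanism, with flattened tensors and illustrative sizes (n = 2 teachers, 6-dimensional common space):

```python
import numpy as np

rng = np.random.default_rng(2)

# Student common-space tensor for layer k, and the layer-k
# common-space tensors of the two teachers, flattened to vectors.
Cs = rng.normal(size=6)
Ct = rng.normal(size=(2, 6))

# Attention coefficients: softmax over teachers of the inner-product
# similarity between the student tensor and each teacher tensor.
scores = Ct @ Cs
alpha = np.exp(scores - scores.max())  # max-shift for stability
alpha /= alpha.sum()

# Teacher-weighted tensor for layer k: attention-weighted average.
Wk = alpha @ Ct
print(alpha.sum(), Wk.shape)  # weights sum to 1; Wk has shape (6,)
```

Teachers whose layer-k representation is closer to the student's receive larger weight, so the distillation target adapts per layer and per input.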
S43, splicing all teacher model task layer vectors into a one-dimensional teacher task layer splicing vector y_t = concat(y_1, …, y_n); wherein if c new anomaly categories appear in the annotated dataset, a zero vector of length c is spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector, the expression being:
ŷ_t = concat(y_t, 0_c)
where concat(·) is the vector concatenation operation, 0_c is an all-zero vector of length c, and here c is the number of anomaly categories newly present in the dataset.
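The splicing with zero padding can be sketched directly; the uniform probability values below are placeholders for real teacher outputs, and only the shapes (7, 8, and 1 new category, from this embodiment) matter:

```python
import numpy as np

# Teacher task-layer vectors from the embodiment: 7 and 8 categories.
y1 = np.full(7, 1 / 7)
y2 = np.full(8, 1 / 8)

c_new = 1  # anomaly categories newly present in the dataset
# Teacher task-layer splicing vector: transverse concatenation of all
# teacher vectors, padded with a zero vector for the new categories.
y_teacher = np.concatenate([y1, y2, np.zeros(c_new)])
print(y_teacher.shape)  # (16,) -- matches the student task vector
```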
Step five, comparing the student public space tensors with the corresponding teacher weighted tensors to obtain the distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain the task loss; for the small labeled dataset, comparing the labels of the dataset with the task layer vector of the student model to obtain the prediction loss; and obtaining the total loss based on the distillation loss, the task loss and the prediction loss, specifically comprising the following substeps:
S51, comparing the student public space tensor C^s_k of the k-th layer of the student model with the corresponding k-th teacher weighted tensor W_k to obtain the distillation loss L_dist, the expression being:
L_dist = (1/K) Σ_{k=1}^{K} MSE(C^s_k, W_k)
where K is the number of intermediate layers of the student model and MSE is the mean square error loss function, whose expression is:
MSE(A, B) = (1/|A|) Σ_u (A_u − B_u)²
with the sum taken over all |A| elements of the tensors;
S52, comparing the task layer vector of the student model, i.e. the output vector y_s, with the corresponding teacher task layer splicing vector ŷ_t to obtain the soft target loss function L_soft, the expression being:
L_soft = −Σ_j ŷ_{t,j} log y_{s,j}
S53, for the small labeled dataset, comparing the vector y_s output by the student model with the one-hot representation p of the labeled correct category to obtain the prediction loss L_pred, the expression being:
L_pred = −Σ_{j=1}^{C} p_j log q_j
where p_j and q_j are respectively the value of the j-th bit of the one-hot representation of the correct category and the predicted probability value of the j-th bit of the student model task vector y_s;
S54, performing a weighted summation of the distillation loss L_dist and the losses L_soft, L_pred to obtain the final total loss L, i.e. the total loss function expression is:
L = λ₁L_dist + λ₂L_soft + λ₃L_pred
where λ₁, λ₂, λ₃ are the weighting coefficients.
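The combination of the three losses of step five can be sketched as follows; the cross-entropy forms for the soft-target and prediction losses, the toy tensors, and the weights lam are assumptions for illustration, not values from the patent:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

rng = np.random.default_rng(3)
Cs = [rng.normal(size=6) for _ in range(2)]        # student common tensors
Wk = [rng.normal(size=6) for _ in range(2)]        # teacher-weighted tensors
y_s = np.full(16, 1 / 16)                          # student task vector
y_t = np.concatenate([np.full(15, 1 / 15), [0.0]]) # teacher splice + zero pad
one_hot = np.eye(16)[3]                            # labeled correct category

L_dist = np.mean([mse(c, w) for c, w in zip(Cs, Wk)])  # distillation loss
L_soft = cross_entropy(y_t, y_s)                       # soft-target loss
L_pred = cross_entropy(one_hot, y_s)                   # prediction loss
lam = (1.0, 0.5, 1.0)                                  # assumed weights
L_total = lam[0] * L_dist + lam[1] * L_soft + lam[2] * L_pred
print(L_total > 0)  # True
```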
Step six, repeating step one to step five, minimizing the total loss and updating the neural network parameters of the student model until convergence; the neural network parameters of the student model are then fixed to obtain the target model, completing the training, specifically comprising the following substeps:
S61, repeating step one to step five, and minimizing the total loss function $L$ by a gradient descent algorithm, updating the neural network parameters $\theta_S$ of the student model and the parameters $\theta_f$ of the nonlinear transformation neural network;
S62, when the model loss function $L$ falls below a threshold $\varepsilon_L$ or the number of iterations reaches a preset count (10000 steps), the training ends; the student model parameters $\theta_S$ are fixed, and the student model is then the target model.
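The stopping logic of steps S61-S62 can be sketched generically as a gradient-descent loop (a minimal sketch with assumed names; the actual optimizer and parameters are not specified beyond gradient descent):

```python
def gradient_descent(grad_fn, loss_fn, params, lr=0.1, threshold=1e-6, max_steps=10000):
    """Minimize a loss by gradient descent, stopping when the loss drops
    below the threshold or the iteration budget (e.g. 10000 steps) is spent;
    the returned parameters are then fixed as the target model."""
    for _ in range(max_steps):
        if loss_fn(params) < threshold:
            break
        params = params - lr * grad_fn(params)
    return params
```

For example, minimizing f(x) = x^2 from x = 1 converges toward 0 well within the iteration budget.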
Corresponding to the embodiment of the industrial anomaly detection model training method based on multi-model fusion, the invention also provides an embodiment of the industrial anomaly detection model training device based on multi-model fusion.
Referring to fig. 2, an industrial anomaly detection model training device based on multi-model fusion according to an embodiment of the present invention includes one or more processors configured to implement the industrial anomaly detection model training method based on multi-model fusion in the above embodiment.
The embodiment of the industrial anomaly detection model training method based on multi-model fusion can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading corresponding computer program instructions from a nonvolatile memory into memory for execution. In terms of hardware, fig. 2 shows a hardware structure diagram of the device with data processing capability in which the industrial anomaly detection model training device based on multi-model fusion is located; in addition to the processor, memory, network interface and nonvolatile memory shown in fig. 2, the device in the embodiment may also include other hardware according to its actual function, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically described in the implementation process of the corresponding steps in the above method, and will not be repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored on the computer readable storage medium, and when the program is executed by a processor, the industrial anomaly detection model training method based on multi-model fusion in the embodiment is realized.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of the device with data processing capability described in any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a Flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both the internal storage unit and the external storage device of the device with data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device with data processing capability, and may also be used for temporarily storing data that has been output or is to be output.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the invention has been described in detail above, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (5)
1. The industrial anomaly detection model training method based on multi-model fusion is characterized by comprising the following steps of:
step one, acquiring sensor data and preprocessing it;
step two, respectively inputting the sensor feature tensor obtained by preprocessing into a plurality of teacher models and a student model, and obtaining the features output by each network layer in the models, including intermediate layer tensors and task layer vectors;
step three, mapping the intermediate layer tensors of the teacher models and the intermediate layer tensors of the student model into teacher public space tensors and student public space tensors respectively;
step four, weighted-averaging all the teacher public space tensors according to the attention coefficient of each teacher public space tensor to obtain the teacher weighted tensor corresponding to each student public space tensor, and transversely splicing all the teacher model task layer vectors into a one-dimensional teacher task layer splicing vector;
step five, comparing the student public space tensors with the corresponding teacher weighted tensors to obtain a distillation loss; comparing the task layer vector of the student model with the teacher task layer splicing vector to obtain a task loss; comparing the labels of the annotated dataset with the task layer vector of the student model to obtain a prediction loss; and obtaining a total loss based on the distillation loss, the task loss and the prediction loss;
step six, repeating the step one to the step five, minimizing the total loss, updating the neural network parameters of the student model until the neural network parameters of the student model are converged and fixed, obtaining a target model, and finishing training;
step one is specifically: converting the sensor data into a sensor feature tensor $X \in \mathbb{R}^{w \times d}$ using a single-layer LSTM network, where $w$ is the time window size of the sensor data, i.e. the data length, and $d$ is the hidden layer dimension of the sensor feature tensor;
the second step specifically comprises the following substeps:
S21, inputting the sensor feature tensor $X$ into the $m$ pre-trained teacher models respectively, where there are $m$ teacher models, each having $K$ intermediate layers; for the $k$-th intermediate layer of the $j$-th model, the output intermediate layer tensor is $H_k^{T_j}$; in total, $m \times K$ intermediate layer tensors of the teacher models are calculated;
S22, inputting the sensor feature tensor $X$ into the student model, and for the $k$-th intermediate layer, calculating the intermediate layer tensor $H_k^S$ of the student model;
S23, for the final layer of the $j$-th teacher model, calculating the teacher task vector $z^{T_j}$; for the final layer of the student model, calculating the student task vector $z^S$; the dimension of the student task vector $z^S$ is equal to the sum of the dimensions of all the teacher model task vectors plus the number of categories newly appearing in the dataset;
the third step specifically comprises the following substeps:
S31, converting the intermediate layer tensors of the teacher models into teacher public space tensors of the same dimension; for the $k$-th layer of the $j$-th teacher model, the corresponding teacher public space tensor is $\tilde{H}_k^{T_j} = g(H_k^{T_j}; \theta_g)$, where $g$ represents a nonlinear transformation implemented by a convolutional neural network layer with parameters $\theta_g$;
S32, if the parameters $\theta_g$ of the nonlinear transformation $g$ are fixed, calculating all $m \times K$ teacher public space tensors $\tilde{H}_k^{T_j}$; otherwise, updating the parameters $\theta_g$ of the nonlinear transformation $g$ through steps S33 and S34;
S33, for the teacher public space tensor $\tilde{H}_k^{T_j}$ corresponding to the $k$-th layer of the $j$-th teacher model, mapping it by a nonlinear transformation $g'$ to a teacher intermediate layer reconstruction tensor $\hat{H}_k^{T_j}$ of the same dimension as the intermediate layer tensor $H_k^{T_j}$;
S34, comparing the intermediate layer tensor $H_k^{T_j}$ of the teacher model with the teacher intermediate layer reconstruction tensor $\hat{H}_k^{T_j}$ and calculating the reconstruction error $L_{rec}$:

$L_{rec} = L_{MSE}(H_k^{T_j}, \hat{H}_k^{T_j})$

where $L_{MSE}$ is the reconstruction loss function; minimizing $L_{rec}$ to update the parameters $\theta_g$ of the nonlinear transformation $g$ until the reconstruction error $L_{rec}$ is less than a threshold $\varepsilon$ or the number of iteration steps is reached, and then fixing the parameters $\theta_g$;
S35, converting the intermediate layer tensors of the student model into student public space tensors of the same dimension; for the $k$-th layer, $\tilde{H}_k^S = f(H_k^S; \theta_f)$, where $f$ is a nonlinear transformation implemented by a neural network layer with parameters $\theta_f$; the dimension of the student public space tensor is consistent with that of the teacher public space tensor in step S31;
the fourth step comprises the following substeps:
S41, based on the student public space tensor $\tilde{H}_k^S$ of the $k$-th layer and the teacher public space tensor $\tilde{H}_k^{T_j}$ of the $k$-th layer of the $j$-th teacher model, obtaining the attention coefficient $\alpha_k^j$ of each teacher public space tensor through an attention mechanism;
S42, weighted-averaging all the teacher public space tensors according to the attention coefficients $\alpha_k^j$ to obtain the teacher weighted tensor $\bar{H}_k^T$ corresponding to the $k$-th layer, with the expression:

$\bar{H}_k^T = \sum_{j=1}^{m} \alpha_k^j \tilde{H}_k^{T_j}$
S43, splicing the task layer vectors of all teacher models into a one-dimensional teacher task layer splicing vector $z^T$; if $c$ new abnormal categories appear in the annotated dataset, an all-zero vector of length $c$ is further spliced onto the teacher task layer splicing vector to obtain the new teacher task layer splicing vector:

$z^T \leftarrow \mathrm{concat}(z^T, \mathbf{0}_c)$

where $\mathrm{concat}(\cdot)$ is the vector splicing (concatenation) operation, $\mathbf{0}_c$ is the all-zero vector of length $c$, and $c$ is the number of abnormal categories newly appearing in the dataset.
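The exact attention expression of steps S41-S42 is not recoverable from this page; a common instantiation, assumed here for illustration only, is a softmax over inner-product similarities between the student and teacher public-space tensors:

```python
import numpy as np

def teacher_weighted_tensor(student_tensor, teacher_tensors):
    """Attention over m teacher public-space tensors for one student layer:
    softmax of inner-product similarities, then the weighted average."""
    sims = np.array([np.sum(student_tensor * t) for t in teacher_tensors])
    e = np.exp(sims - sims.max())
    alphas = e / e.sum()                   # attention coefficients
    weighted = sum(a * t for a, t in zip(alphas, teacher_tensors))
    return alphas, weighted
```

With two identical teacher tensors, each receives an attention coefficient of 0.5 and the weighted tensor equals either teacher tensor.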
2. The industrial anomaly detection model training method based on multi-model fusion according to claim 1, wherein the fifth step specifically comprises the following sub-steps:
S51, comparing the student public space tensor $\tilde{H}_k^S$ of the $k$-th layer of the student model with the corresponding $k$-th teacher weighted tensor $\bar{H}_k^T$ to obtain the distillation loss $L_{distill}$, with the expression:

$L_{distill} = \frac{1}{K}\sum_{k=1}^{K} L_{MSE}(\tilde{H}_k^S, \bar{H}_k^T)$

where $K$ is the number of intermediate layers of the student model and $L_{MSE}$ is the mean square error loss function, with the expression:

$L_{MSE}(a, b) = \frac{1}{n}\sum_{i=1}^{n}(a_i - b_i)^2$
S52, comparing the task layer vector of the student model, i.e. the output vector $z^S$, with the corresponding teacher task layer splicing vector $z^T$ to obtain the soft target loss function $L_{soft}$;
S53, for the small amount of annotated data, comparing the vector output by the student model with the one-hot representation $y$ of the annotated correct category to obtain the prediction loss $L_{pred}$, with the expression:

$L_{pred} = -\sum_{i=1}^{C} y_i \log p_i$

where $y_i$ and $p_i$ are, respectively, the value of the $i$-th bit of the one-hot representation of the correct category and the predicted probability of the $i$-th bit of the student model task vector, and $C$ is the total number of categories, equal to the length of the student model task vector $z^S$;
S54, performing weighted summation of the distillation loss $L_{distill}$ and the losses $L_{soft}$, $L_{pred}$ to obtain the final total loss $L$, i.e. the total loss function expression is:

$L = \alpha L_{distill} + \beta L_{soft} + \gamma L_{pred}$

where $\alpha$, $\beta$ and $\gamma$ are the weighting coefficients.
3. The industrial anomaly detection model training method based on multi-model fusion according to claim 2, wherein the step six specifically comprises the following sub-steps:
S61, repeating step one to step five, and minimizing the total loss function $L$ by a gradient descent algorithm, updating the neural network parameters $\theta_S$ of the student model and the parameters $\theta_f$ of the nonlinear transformation neural network.
4. An industrial anomaly detection model training device based on multi-model fusion, comprising one or more processors configured to implement the industrial anomaly detection model training method based on multi-model fusion of any one of claims 1 to 3.
5. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements an industrial anomaly detection model training method based on multi-model fusion as claimed in any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310123067.3A CN116028891B (en) | 2023-02-16 | 2023-02-16 | Industrial anomaly detection model training method and device based on multi-model fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116028891A CN116028891A (en) | 2023-04-28 |
CN116028891B true CN116028891B (en) | 2023-07-14 |
Family
ID=86091403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310123067.3A Active CN116028891B (en) | 2023-02-16 | 2023-02-16 | Industrial anomaly detection model training method and device based on multi-model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116028891B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117668622B (en) * | 2024-02-01 | 2024-05-10 | 山东能源数智云科技有限公司 | Training method of equipment fault diagnosis model, fault diagnosis method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408570A (en) * | 2021-05-08 | 2021-09-17 | 浙江智慧视频安防创新中心有限公司 | Image category identification method and device based on model distillation, storage medium and terminal |
CN114170478A (en) * | 2021-12-09 | 2022-03-11 | 中山大学 | Defect detection and positioning method and system based on cross-image local feature alignment |
CN114240892A (en) * | 2021-12-17 | 2022-03-25 | 华中科技大学 | Unsupervised industrial image anomaly detection method and system based on knowledge distillation |
CN115346207A (en) * | 2022-08-03 | 2022-11-15 | 北京交通大学 | Method for detecting three-dimensional target in two-dimensional image based on example structure correlation |
CN115471645A (en) * | 2022-11-15 | 2022-12-13 | 南京信息工程大学 | Knowledge distillation anomaly detection method based on U-shaped student network |
CN115526332A (en) * | 2022-08-17 | 2022-12-27 | 阿里巴巴(中国)有限公司 | Student model training method and text classification system based on pre-training language model |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162799B (en) * | 2018-11-28 | 2023-08-04 | 腾讯科技(深圳)有限公司 | Model training method, machine translation method, and related devices and equipment |
US11487944B1 (en) * | 2019-12-09 | 2022-11-01 | Asapp, Inc. | System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation |
CN111160409A (en) * | 2019-12-11 | 2020-05-15 | 浙江大学 | Heterogeneous neural network knowledge reorganization method based on common feature learning |
CN113052768B (en) * | 2019-12-27 | 2024-03-19 | 武汉Tcl集团工业研究院有限公司 | Method, terminal and computer readable storage medium for processing image |
CN111611377B (en) * | 2020-04-22 | 2021-10-29 | 淮阴工学院 | Knowledge distillation-based multi-layer neural network language model training method and device |
US20220076136A1 (en) * | 2020-09-09 | 2022-03-10 | Peyman PASSBAN | Method and system for training a neural network model using knowledge distillation |
CN112116030B (en) * | 2020-10-13 | 2022-08-30 | 浙江大学 | Image classification method based on vector standardization and knowledge distillation |
CN112418343B (en) * | 2020-12-08 | 2024-01-05 | 中山大学 | Multi-teacher self-adaptive combined student model training method |
CN112801209B (en) * | 2021-02-26 | 2022-10-25 | 同济大学 | Image classification method based on dual-length teacher model knowledge fusion and storage medium |
CN114241282B (en) * | 2021-11-04 | 2024-01-26 | 河南工业大学 | Knowledge distillation-based edge equipment scene recognition method and device |
CN114067819B (en) * | 2021-11-22 | 2024-06-21 | 南京工程学院 | Speech enhancement method based on cross-layer similarity knowledge distillation |
CN114936605A (en) * | 2022-06-09 | 2022-08-23 | 五邑大学 | Knowledge distillation-based neural network training method, device and storage medium |
CN115481316A (en) * | 2022-09-01 | 2022-12-16 | 贵州大学 | Multi-model fusion knowledge distillation recommendation model |
CN115690708A (en) * | 2022-10-21 | 2023-02-03 | 苏州轻棹科技有限公司 | Method and device for training three-dimensional target detection model based on cross-modal knowledge distillation |
- 2023-02-16: CN application CN202310123067.3A granted as patent CN116028891B (status: Active)
Non-Patent Citations (1)
Title |
---|
Research on a Teacher Evaluation Model Based on Combined Neural Networks; Liu Caihong; Tang Wanmei; Journal of Chongqing Normal University (Natural Science Edition), No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN116028891A (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tkachenko et al. | Model and principles for the implementation of neural-like structures based on geometric data transformations | |
Oh et al. | A tutorial on quantum convolutional neural networks (QCNN) | |
Jia et al. | Quantum neural network states: A brief review of methods and applications | |
US9361586B2 (en) | Method and system for invariant pattern recognition | |
Furukawa | SOM of SOMs | |
Yuan et al. | Quantum image edge detection algorithm | |
CN116028891B (en) | Industrial anomaly detection model training method and device based on multi-model fusion | |
CN113821668A (en) | Data classification identification method, device, equipment and readable storage medium | |
CN116206158A (en) | Scene image classification method and system based on double hypergraph neural network | |
Lee et al. | Application of domain-adaptive convolutional variational autoencoder for stress-state prediction | |
Egbo et al. | Forecasting students’ enrollment using neural networks and ordinary least squares regression models | |
Rai | Advanced deep learning with R: Become an expert at designing, building, and improving advanced neural network models using R | |
Chen et al. | Total variation based tensor decomposition for multi‐dimensional data with time dimension | |
CN116128575A (en) | Item recommendation method, device, computer apparatus, storage medium, and program product | |
Zhang et al. | The Role of Knowledge Creation‐Oriented Convolutional Neural Network in Learning Interaction | |
Christiansen et al. | Optimization of neural networks for time-domain simulation of mooring lines | |
JP7118882B2 (en) | Variable transformation device, latent parameter learning device, latent parameter generation device, methods and programs thereof | |
CN116992937A (en) | Neural network model restoration method and related equipment | |
CN113496119B (en) | Method, electronic device and computer readable medium for extracting metadata in table | |
CN114511092A (en) | Graph attention mechanism implementation method based on quantum circuit | |
JP7047665B2 (en) | Learning equipment, learning methods and learning programs | |
CN112232261A (en) | Method and device for fusing image sequences | |
CN114998990B (en) | Method and device for identifying safety behaviors of personnel on construction site | |
Sundararaghavan et al. | Methodology for estimation of intrinsic dimensions and state variables of microstructures | |
Zhang et al. | Calibrated multivariate regression networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||