CN116861250A - Fault diagnosis model training method and device

Fault diagnosis model training method and device

Publication number: CN116861250A
Authority: CN (China)
Prior art keywords: training, module, loss function, samples, data set
Legal status: Pending
Application number: CN202310912536.XA
Other languages: Chinese (zh)
Inventors: 王明君, 金泽中, 孙东, 郑华丽, 叶春明
Current Assignee: China Tobacco Zhejiang Industrial Co Ltd
Original Assignee: China Tobacco Zhejiang Industrial Co Ltd
Application filed by China Tobacco Zhejiang Industrial Co Ltd
Priority to CN202310912536.XA
Publication of CN116861250A

Classifications

    • G06F18/214 Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G01M13/045 Testing of machine parts; Bearings; Acoustic or vibration analysis
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Neural networks; Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a fault diagnosis model training method and device. The method comprises the following steps: receiving an input first training data set, wherein all training samples in the first training data set have labels, some of the training samples having correct labels and the others having incorrect labels; and cyclically executing the following steps until the current model training round reaches the maximum model training round: converting all training samples in the first training data set into corresponding first feature coding vectors; performing label classification on the first feature coding vectors to obtain the probability that each training sample belongs to each class of health state; acquiring the attention weight of each training sample according to its first feature coding vector; calculating a noise attention loss function for the current round of training according to the probabilities, the attention weights and the labels of all training samples; and updating the model parameters according to the noise attention loss function. The application improves the feature expression capability and the diagnosis accuracy of the model.

Description

Fault diagnosis model training method and device
Technical Field
The application relates to the technical field of fault diagnosis, in particular to a fault diagnosis model training method and device.
Background
Bearings are key parts of mechanical transmissions and are widely applied in all kinds of mechanical equipment; their health condition has an important influence on the safety and stability of the equipment. However, when equipment works for a long time under severe conditions such as high speed and heavy load, the bearing inevitably degrades and develops cracks, abrasion and the like. Once a fault occurs, the normal operation of the whole equipment is directly affected; a minor fault causes economic losses for enterprises, while a severe fault can cause accidents and threaten personal safety. Therefore, monitoring the health condition of bearings and eliminating potential safety hazards in time, so as to ensure the normal operation of mechanical equipment, is of great engineering significance.
At present, fault diagnosis methods for rolling bearings mainly fall into two categories: analytical-model-based and data-driven. Analytical-model-based methods require an analytical formulation of the fault diagnosis problem; for systems of higher complexity the modeling difficulty is high, the established model generalizes poorly to other systems, and there are certain limitations in practical use. Data-driven fault diagnosis methods, on the other hand, often have insufficient feature extraction capability and struggle to mine and extract the deeper, subtle features in fault data, which limits the improvement of diagnosis accuracy.
With the rapid rise and popularization of the Internet, the Internet of Things and the like, data is now growing faster than in any previous period. Big data provides sufficient training "raw material" for deep neural networks and offers new opportunities for the in-depth study and application of data-driven intelligent fault diagnosis; deep-learning-based fault diagnosis methods, which can effectively represent fault information, are therefore widely applied in the field of fault diagnosis. In actual industrial activities, however, workers lacking expertise are prone to assigning incorrect class labels to failure modes. Thus, in real industrial data sets, the problem of labeling errors (i.e., label noise) is unavoidable.
However, most current data-driven fault diagnosis methods depend too heavily on a complete data set; when label noise exists, the model overfits the noisy label data, so that its feature expression capability is insufficient and the diagnosis accuracy is affected.
Disclosure of Invention
The application provides a fault diagnosis model training method and device, which realize dynamic sample division by introducing an attention weight based on the feature coding vector representation so as to regularize the loss function. This gives higher diagnostic robustness to label-noise samples, reduces the gradient contribution caused by label-noise samples, avoids overfitting the model to noisy label data, and improves the feature expression capability and diagnosis accuracy of the model.
The application provides a fault diagnosis model training method, which comprises the following steps:
receiving an input first training data set, wherein all training samples in the first training data set have labels, some of the training samples having correct labels and the others having incorrect labels;
the following steps are circularly executed until the current model training round reaches the maximum model training round:
converting all training samples in the first training data set into corresponding first feature coding vectors;
performing label classification on the first feature coding vector to obtain the probability that the training sample belongs to each type of health state;
acquiring the attention weight of the training sample according to the first feature coding vector;
calculating a noise attention loss function of the current round training according to the probability, the attention weight and the labels of all training samples;
and updating the model parameters according to the noise attention loss function.
Preferably, the fault diagnosis model training method further includes: performing contrast learning by using the first training data set to obtain a contrast loss function;
obtaining a comprehensive loss function of current round training according to the noise attention loss function and the contrast loss function;
and,
and updating the model parameters according to the comprehensive loss function.
Preferably, the first training data set is used for contrast learning to obtain a contrast loss function, which specifically includes:
performing two different data enhancement on training samples in the first training data set to form a data enhancement sample set;
and performing contrast learning on sample pairs consisting of any two data enhancement samples in the data enhancement sample set to obtain a contrast loss function.
Preferably, the comparison learning is performed on a sample pair consisting of any two data enhancement samples in the data enhancement sample set, so as to obtain a comparison loss function, which specifically includes:
for any sample pair, firstly, converting two data enhancement samples in the sample pair into a first feature coding vector and a second feature coding vector respectively; then, mapping the first feature code vector and the second feature code vector into space representation vectors respectively; finally, calculating the similarity between the two space representation vectors;
and calculating the mutual information among the samples by using the similarity of all the sample pairs, and calculating a contrast loss function according to the mutual information among all the samples.
Preferably, before calculating the noise attention loss function, further comprising:
judging whether the current model training round reaches a label correction starting round, wherein the label correction starting round is smaller than the maximum model training round;
if yes, carrying out label correction on the labels to form corrected labels, and forming a second training data set by all training samples with the corrected labels;
calculating a noise attention loss function of the current round training according to the probability, the attention weight and the corrected labels corresponding to the training samples;
and updating the first training data set into a second training data set, and training for subsequent rounds by using the second training data set.
The application also provides a fault diagnosis model training device, which comprises a training data receiving module, a first conversion module, a classification module, a weight obtaining module, a first loss function calculating module and a parameter updating module;
the training data receiving module is used for receiving an input first training data set, wherein all training samples in the first training data set have labels, some of the training samples having correct labels and the others having incorrect labels;
the first conversion module is used for converting all training samples in the first training data set into corresponding first feature coding vectors;
the classification module is used for carrying out label classification on the first feature coding vector to obtain the probability that the training sample belongs to each type of health state;
the weight obtaining module is used for obtaining the attention weight of the training sample according to the first feature coding vector;
the first loss function calculation module is used for calculating a noise attention loss function of the current round training according to the probability of all training samples, the attention weight and the labels of the training samples;
the parameter updating module is used for updating the model parameters according to the noise attention loss function.
Preferably, the fault diagnosis model training device further comprises a contrast learning module and a second loss function calculation module;
the contrast learning module is used for carrying out contrast learning by utilizing the first training data set to obtain a contrast loss function;
the second loss function calculation module is used for obtaining a comprehensive loss function of the current round training according to the noise attention loss function and the contrast loss function;
and the parameter updating module is used for updating the model parameters according to the comprehensive loss function.
Preferably, the contrast learning module comprises a data enhancement module and a sample pair contrast learning module;
the data enhancement module is used for carrying out two different data enhancement on all training samples in the first training data set to form a data enhancement sample set;
the sample pair contrast learning module is used for carrying out contrast learning on any two data enhancement samples in the data enhancement sample set so as to obtain a contrast loss function.
Preferably, the sample pair contrast learning module comprises a second conversion module, a mapping module, a similarity calculation module and a contrast loss function calculation module;
the second conversion module is used for respectively converting two data enhancement samples in the sample pair into a first feature coding vector and a second feature coding vector;
the mapping module is used for mapping the first feature coding vector and the second feature coding vector into space representation vectors respectively;
the similarity calculation module is used for calculating the similarity between the two space representation vectors;
the contrast loss function calculation module is used for calculating the mutual information among the samples by using the similarity of all the sample pairs and calculating the contrast loss function according to the mutual information among all the samples.
Preferably, the fault diagnosis model training device further comprises a judging module, a label correcting module and a data set updating module;
the judging module is used for judging whether the current model training round reaches the label correction starting round or not;
the label correction module is used for correcting the label to form a corrected label when the current model training round reaches the label correction starting round, and all training samples with the corrected label form a second training data set;
the data set updating module is used for updating the first training data set into a second training data set, and training of subsequent rounds is carried out by utilizing the second training data set;
the first loss function calculation module is used for calculating a noise attention loss function of the current round training according to the probability and the attention weight of all training samples and corrected labels corresponding to the training samples.
Other features of the present application and its advantages will become apparent from the following detailed description of exemplary embodiments of the application, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart of a preferred embodiment of the fault diagnosis model training method provided by the present application;
FIG. 2 is a block diagram of one embodiment of a feature encoding module provided by the present application;
FIG. 3 is a schematic flow chart of contrast learning provided by the application;
FIG. 4 is a block diagram of one embodiment of a fault diagnosis model training system provided by the present application;
FIG. 5 is a graph comparing the noise ratios of the labels before and after label correction provided by the present application;
FIG. 6 is a graph showing the classification effect of the model training method of the present application compared with other model training methods;
fig. 7 is a block diagram of the fault diagnosis model training apparatus provided by the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
The application provides a fault diagnosis model training method and device, which realize dynamic weight assignment and dynamic sample division by introducing attention weights based on the feature coding vector representation so as to regularize the loss function, reduce the gradient contribution caused by label-noise samples and prevent the model from fitting to noisy label data, thereby effectively solving the problem of fault diagnosis with mislabeled data and improving the feature expression capability and diagnosis accuracy of the model. In addition, the application optimizes the feature space by contrast learning, pulling the mapped features of positive sample pairs closer and pushing the features of negative samples apart, to further enhance the feature expression capability of the model and reduce the negative influence of label noise. Moreover, in the later training stage, the method integrates model predictions with the original noisy labels to perform label correction, building a training data set with a lower noise rate, which further improves model generalization capability and yields better feature representations.
As shown in Fig. 1, the fault diagnosis model training method provided by the application includes:
S110: An input first training data set is received; all training samples (fault signal samples) in the first training data set have labels, some of the training samples having correct labels and the others having incorrect labels. As an example, the first training data set consists of vibration signals of a rolling bearing in different health states under the same working condition, with some labeling errors.
The first training data set is expressed as $D = \{(x^{[i]}, \tilde{y}^{[i]})\}$, where $x^{[i]}$ is a training sample and $\tilde{y}^{[i]}$ is the label of training sample $x^{[i]}$. Notably, $\tilde{y}^{[i]}$ may be either a correct label or an incorrect label (i.e., label noise).
S120: all training samples in the first training data set are converted into corresponding first feature-encoded vectors.
Specifically, the feature encoding module $f(\cdot;\theta_f)$ converts the training samples into the corresponding first feature coding vectors, where $\theta_f$ is its module parameter set (see Fig. 4). $f(\cdot;\theta_f)$ maps the information sequence of an input training sample (e.g., the vibration signal sequence of a rolling bearing) to a high-dimensional feature embedding space, i.e., $f: x \rightarrow \mathbb{R}^{F}$. The first feature coding vector corresponding to training sample $x^{[i]}$ is denoted $h^{[i]}$.
As an embodiment, the feature encoding module adopts a ResNet structure, as shown in Fig. 2. The ResNet structure adds residual blocks to a traditional deep neural network, which simplifies the network and alleviates the network degradation problem. With residual blocks, deeper networks can be trained more effectively, and inputs can be propagated forward more quickly through the identity mapping connections in the residual blocks.
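For illustration only, the following is a minimal PyTorch sketch of such a ResNet-style feature encoding module for one-dimensional vibration signals; the layer counts, kernel sizes, channel widths and embedding dimension are assumptions made for this sketch and are not specified by the embodiment.

```python
import torch
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    """Basic 1-D residual block: two convolutions plus an identity (or 1x1) shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Shortcut: identity mapping when shapes match, otherwise a 1x1 convolution.
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm1d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # residual connection

class FeatureEncoder(nn.Module):
    """Feature encoding module f(.; theta_f): maps a vibration sequence to an embedding h."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU())
        self.blocks = nn.Sequential(
            ResidualBlock1d(16, 32, stride=2),
            ResidualBlock1d(32, 64, stride=2),
            ResidualBlock1d(64, embed_dim, stride=2))
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):                  # x: (batch, 1, signal_length)
        h = self.blocks(self.stem(x))
        return self.pool(h).squeeze(-1)    # h: (batch, embed_dim)
```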
S130: and carrying out label classification on the first feature coding vector to obtain the probability that the training sample belongs to each type of health state.
Specifically, the label classification decoding module $c(\cdot;\theta_c)$ performs label classification on the first feature coding vector (see Fig. 4). $c(\cdot;\theta_c)$ takes as input the first feature coding vector $h^{[i]}$ obtained in the high-dimensional embedding space and maps it to the equipment health-state space, outputting a health-state probability distribution $p = \mathrm{softmax}(v)$, i.e., $c: f(x) \rightarrow \mathbb{R}^{M}$, where $v$ is the classification logit vector output by the label classification decoding module. For a given training sample $x^{[i]}$, the probability $p_k^{[i]}$ that it belongs to the $k$-th health state is:

$$p_k^{[i]} = \frac{\exp\!\big(v_k^{[i]}\big)}{\sum_{m=1}^{M}\exp\!\big(v_m^{[i]}\big)}$$
s140: and acquiring the attention weight of the training sample according to the first feature code vector.
The early-learning phenomenon indicates that deep learning models tend to memorize correctly labeled samples before fitting incorrectly labeled ones. Thus, in the early learning phase, correctly labeled samples are more likely to have well-learned feature representations than incorrectly labeled samples. In order to make the deep neural network produce an attention weight that reflects the quality of the learned representation, the application introduces an attention weight branch after the feature encoding module $f(\cdot;\theta_f)$ (see Fig. 4). The attention weight branch consists of a fully connected layer that outputs a scalar $e^{[i]} = W h^{[i]} + b$, which is scaled to the range 0 to 1 with a sigmoid function, where $W$ represents the branch weights and $b$ the corresponding bias.

For each training sample $x^{[i]}$, the output attention weight $a^{[i]}$ is:

$$a^{[i]} = \mathrm{sigmoid}\!\big(e^{[i]}\big) = \frac{1}{1+\exp\!\big(-(W h^{[i]} + b)\big)}$$
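Continuing the sketch above, the label classification decoding module and the attention weight branch could look as follows; the single-linear-layer heads, the default class count and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierHead(nn.Module):
    """Label classification decoding module c(.; theta_c): embedding h -> M health-state classes."""
    def __init__(self, embed_dim=128, num_states=5):
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_states)

    def forward(self, h):
        v = self.fc(h)                       # classification logits v
        p = F.softmax(v, dim=-1)             # health-state probability distribution p = softmax(v)
        return v, p

class AttentionBranch(nn.Module):
    """Attention weight branch: scalar e = W h + b, squashed to (0, 1) with a sigmoid."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 1)

    def forward(self, h):
        e = self.fc(h)                       # e^[i] = W h^[i] + b
        return torch.sigmoid(e).squeeze(-1)  # attention weight a^[i] in (0, 1)
```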
s150: a noise attention loss function for the current round of training is calculated.
As one embodiment, the noise attention loss function for the current round of training is calculated from the probabilities of all training samples, the attention weights, and the labels of the training samples.
In classification tasks, the cross entropy $L_{ce}$ is commonly used as an empirical risk loss function to measure the fit between model predictions and labels and to drive model parameter updates during backpropagation:

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{M}\tilde{y}_k^{[i]}\,\log p_k^{[i]} \qquad (3)$$

where $N$ is the batch size of model training and $\tilde{y}_k^{[i]}$ denotes the $k$-th component of the one-hot (possibly noisy) label of sample $x^{[i]}$.
In order for the attention weights to automatically capture the differences in representation quality, the application introduces them into (3) as a regularization term, and thus proposes a noise attention loss function $L_{NAL}$ composed of an attention term $L_a$ and a boosting term $L_b$:

$$L_{NAL} = L_a + \lambda L_b \qquad (4)$$

where $\lambda$ is an adjustable hyper-parameter, and

$$L_a = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{M}\Big(a^{[i]}\,\tilde{y}_k^{[i]} + \big(1-a^{[i]}\big)\,p_k^{[i]}\Big)\log p_k^{[i]}, \qquad L_b = -\frac{1}{N}\sum_{i=1}^{N}\log a^{[i]}$$

The attention term $L_a$ replaces the target term of equation (3) with a weighted integration of the model prediction and the sample's original label data. For a correctly labeled sample, the attention weight $a^{[i]}$ output by the model tends toward 1, and $L_a$ degenerates into the cross entropy loss function of equation (3); for a mislabeled sample, its attention weight tends toward 0, reducing the gradient contribution it causes.

On the one hand, in the early learning stage the model has not yet fitted the mislabeled samples (i.e., the label-noise samples); compared with correctly labeled samples, the mislabeled samples still cannot be effectively represented, so the fit between their predicted health states and their labeled health states is poor. By minimizing $L_a$, the model is led to output attention weights $a^{[i]}$ approaching 0 for mislabeled samples. On the other hand, the model fits the correctly labeled samples early, so that their cross-entropy term tends toward 0, at which point the attention weight no longer influences the minimization of $L_a$. The application therefore introduces the boosting term $L_b$ to prevent the model from outputting a weight value of 0 for all samples; $L_b$ can be regarded as a binary cross entropy loss in which the target value is 1 for all attention weights $a^{[i]}$, so that the weight values of correctly labeled samples are effectively driven toward 1.
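A minimal sketch of this noise attention loss under the reading given above, with $L_a$ as a cross entropy against an attention-weighted blend of the label and the detached prediction, and $L_b$ as a binary cross entropy pushing the weights toward 1; treat it as one consistent interpretation of the description rather than the authoritative formula, and the hyper-parameter default is a placeholder.

```python
import torch
import torch.nn.functional as F

def noise_attention_loss(logits, labels, attn_weight, lam=1.0):
    """Noise attention loss L_NAL = L_a + lam * L_b.

    logits:      (N, M) classification logits v
    labels:      (N,)   (possibly noisy) integer class labels
    attn_weight: (N,)   attention weights a in (0, 1)
    """
    p = F.softmax(logits, dim=-1)
    y = F.one_hot(labels, num_classes=logits.size(1)).float()
    a = attn_weight.unsqueeze(-1)
    # L_a: cross entropy against a target that blends the label and the (detached) prediction.
    target = a * y + (1.0 - a) * p.detach()
    log_p = F.log_softmax(logits, dim=-1)
    loss_a = -(target * log_p).sum(dim=-1).mean()
    # L_b: binary cross entropy driving every attention weight toward 1, which keeps the
    # weights of correctly labeled samples from collapsing to 0.
    loss_b = F.binary_cross_entropy(attn_weight, torch.ones_like(attn_weight))
    return loss_a + lam * loss_b
```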
For simplicity, the noise attention loss function $L_{NAL}$ of equation (4) can be rewritten in a gradient re-weighted form, and a gradient analysis performed to further illustrate its effectiveness: the gradient contributed by a sample $x^{[i]}$ to its classification output takes the form of a scaling factor $\beta^{[i]}$ multiplying the cross-entropy gradient term, where $\beta^{[i]}$ is determined by the attention weight $a^{[i]}$, increases monotonically with it, and lies in $(0, 1)$.

Compared with the cross entropy loss function $L_{ce}$, $L_{NAL}$ therefore performs gradient re-weighting through the scaling factor $\beta^{[i]}$ to reduce the influence of label-noise sample data. For a correctly labeled sample, the cross entropy gradient term $(p_j^{[i]} - 1)$ tends toward 0 after the early learning phase, which tends to make the model overfit the mislabeled samples. By introducing the scaling factor $\beta^{[i]}$ (note that for a mislabeled sample the scaling factor tends toward 0 under the action of the attention weight), the gradient contribution of mislabeled samples is effectively reduced and prevented from dominating the gradient update.

S1100: The model parameters are updated according to the comprehensive loss function. If the current model training round has not yet reached the maximum model training round, the method returns to S120 after S1100 is executed; otherwise, training ends.
As an embodiment, the noise attention loss function is taken as a comprehensive loss function, and the model parameters are updated according to the comprehensive loss function.
On the basis of the above, preferably, the fault diagnosis model training method further includes:
s160: and performing contrast learning by using the first training data set to obtain a contrast loss function.
S170: and obtaining the comprehensive loss function of the current round training according to the noise attention loss function and the contrast loss function.
In step S1100, the model parameters are updated according to the integrated loss function.
As an embodiment, in S160, performing contrast learning by using the first training data set to obtain a contrast loss function specifically includes:
s1601: and performing two different data enhancement on the training samples in the first training data set to form a data enhancement sample set.
As one embodiment, training samples of batch size $N$ are randomly sampled from the first training data set. As shown in Fig. 4, two different data enhancement methods $t_a$ and $t_b$ are applied to each training sample in the batch to obtain a data enhancement sample set of $2N$ samples in total, comprising the first data enhancement sample set obtained with method $t_a$ and the second data enhancement sample set obtained with method $t_b$.

As an example, as shown in Fig. 3, after a training sample $x$ is subjected to the two different data enhancement methods $t_a$ and $t_b$, the resulting data enhancement samples are $x_a$ and $x_b$.
S1602: and performing contrast learning on sample pairs consisting of any two data enhancement samples in the data enhancement sample set to obtain a contrast loss function.
For a given data enhancement sample $x_a^{[k]}$ (obtained from training sample $x^{[k]}$ via data enhancement method $t_a$), sample pairs can be formed with the remaining $2N-1$ samples in the data enhancement sample set: $\big(x_a^{[k]}, x_b^{[k]}\big)$ is a positive sample pair, where $x_b^{[k]}$ is obtained from the same training sample $x^{[k]}$ via data enhancement method $t_b$, while $x_a^{[k]}$ and each of the remaining $2N-2$ samples (obtained from training samples other than $x^{[k]}$ via $t_a$ or $t_b$) constitute negative sample pairs.
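The patent does not state which data enhancement methods $t_a$ and $t_b$ are used; the sketch below assumes additive Gaussian noise and random amplitude scaling purely to illustrate how the two augmented views ($2N$ samples) are built from a batch of vibration signals.

```python
import torch

def augment_a(x, noise_std=0.01):
    """Assumed augmentation t_a: additive Gaussian noise on the vibration signal."""
    return x + noise_std * torch.randn_like(x)

def augment_b(x, scale_range=(0.9, 1.1)):
    """Assumed augmentation t_b: random amplitude scaling per sample."""
    lo, hi = scale_range
    scale = torch.empty(x.size(0), 1, 1, device=x.device).uniform_(lo, hi)
    return x * scale

def make_views(batch):
    """Build the two augmented views (2N samples) used to form positive and negative pairs."""
    xa = augment_a(batch)   # first data enhancement sample set
    xb = augment_b(batch)   # second data enhancement sample set
    return xa, xb
```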
As shown in fig. 4, the comparison learning is performed on a sample pair consisting of any two data enhancement samples in the data enhancement sample set, so as to obtain a comparison loss function, which specifically includes:
p1: for any sample pair, firstly, converting two data enhancement samples in the sample pair into a first feature coding vector and a second feature coding vector respectively; then, mapping the first feature code vector and the second feature code vector into space representation vectors respectively; finally, the similarity between the two spatial representation vectors is calculated.
As one embodiment, taking a sample pair $\big(x_a^{[k]}, x_b^{[j]}\big)$ as an example, and referring to S120, the feature encoding module $f(\cdot;\theta_f)$ extracts feature representations from the two data enhancement samples to obtain the corresponding feature coding vectors:

$$h_a^{[k]} = f\big(x_a^{[k]};\theta_f\big), \qquad h_b^{[j]} = f\big(x_b^{[j]};\theta_f\big)$$

Then, the projection layer $g(\cdot;\theta_g)$ maps the first feature coding vector and the second feature coding vector onto a unit hypersphere space to obtain the corresponding spatial representation vectors:

$$z_a^{[k]} = g\big(h_a^{[k]};\theta_g\big), \qquad z_b^{[j]} = g\big(h_b^{[j]};\theta_g\big)$$

Finally, in the unit hypersphere vector space, cosine similarity is adopted to measure the degree of similarity between the two spatial representation vectors. For each feature pair $(k, j)$, where $k \in \{1,2,\dots,N\}$ and $j \in \{1,2,\dots,N\}$, the cosine similarity is calculated as:

$$s\big(z_a^{[k]}, z_b^{[j]}\big) = \frac{z_a^{[k]} \cdot z_b^{[j]}}{\big\lVert z_a^{[k]}\big\rVert\,\big\lVert z_b^{[j]}\big\rVert}$$

As an example, as shown in Fig. 3, a positive sample pair $(x_a, x_b)$ passes through the feature encoding module $f(\cdot;\theta_f)$ to obtain a first feature coding vector $v_a$ and a second feature coding vector $v_b$; these then pass through the projection layer $g(\cdot;\theta_g)$ to obtain the spatial representation vectors $z_a$ and $z_b$, respectively; finally, the similarity of the two is calculated.
P2: and calculating the mutual information among the samples by using the similarity of all the sample pairs, and calculating a contrast loss function according to the mutual information among all the samples.
As one embodiment, for any first data enhancement sample $x_a^{[i]}$, its inter-sample mutual information term $\ell_a^{[i]}$ is:

$$\ell_a^{[i]} = -\log\frac{\exp\!\big(s(z_a^{[i]}, z_b^{[i]})/\tau\big)}{\sum_{j=1}^{N}\mathbb{1}_{[j\neq i]}\exp\!\big(s(z_a^{[i]}, z_a^{[j]})/\tau\big)+\sum_{j=1}^{N}\exp\!\big(s(z_a^{[i]}, z_b^{[j]})/\tau\big)} \qquad (9)$$

For any second data enhancement sample $x_b^{[i]}$, its inter-sample mutual information term $\ell_b^{[i]}$ is:

$$\ell_b^{[i]} = -\log\frac{\exp\!\big(s(z_b^{[i]}, z_a^{[i]})/\tau\big)}{\sum_{j=1}^{N}\mathbb{1}_{[j\neq i]}\exp\!\big(s(z_b^{[i]}, z_b^{[j]})/\tau\big)+\sum_{j=1}^{N}\exp\!\big(s(z_b^{[i]}, z_a^{[j]})/\tau\big)} \qquad (10)$$

In equations (9) and (10), $\mathbb{1}_{[j\neq i]}$ is the indicator function, which takes the value 1 when $j \neq i$ and 0 otherwise, and $\tau$ is the contrastive loss temperature coefficient.
As one embodiment, infoNCE is used as the loss function L c
Thus, in S170, the integrated loss function $L$ is:

$$L = L_{NAL} + \lambda_c L_c \qquad (12)$$

where $\lambda_c$ is the contrastive loss balance coefficient.
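A sketch of the contrastive branch consistent with equations (9) through (12): a projection head $g(\cdot;\theta_g)$ that normalizes onto the unit hypersphere, followed by an NT-Xent/InfoNCE loss over the $2N$ views. The two-layer projection head, projection dimension and temperature value are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Projection layer g(.; theta_g): maps h onto the unit hypersphere."""
    def __init__(self, embed_dim=128, proj_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, proj_dim))

    def forward(self, h):
        return F.normalize(self.net(h), dim=-1)   # unit-norm spatial representation z

def info_nce_loss(za, zb, temperature=0.1):
    """NT-Xent / InfoNCE loss over the 2N projected views (za, zb), cf. equations (9)-(11)."""
    n = za.size(0)
    z = torch.cat([za, zb], dim=0)                          # (2N, d)
    sim = z @ z.t() / temperature                           # cosine similarities (z is unit-norm)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))              # exclude self-pairs from the denominator
    # The positive for view i is the other view of the same original sample.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```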
In the fault diagnosis model training method of the present application, the overall training objective is to apply a gradient descent method so that the loss function (e.g., the integrated loss function L here) is minimized. Gradient descent may be performed using an Adam optimizer.
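Putting the pieces together, one training iteration might look like the sketch below; it assumes the illustrative modules and loss functions defined in the earlier sketches are in scope, and the learning rate and loss balance coefficient are placeholders.

```python
import itertools
import torch

encoder, classifier = FeatureEncoder(), ClassifierHead()
attention, projector = AttentionBranch(), ProjectionHead()
params = itertools.chain(encoder.parameters(), classifier.parameters(),
                         attention.parameters(), projector.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def train_step(signals, labels, lam_c=1.0):
    """One pass of S120-S1100: encode, classify, weight, compute L = L_NAL + lam_c * L_c, update."""
    optimizer.zero_grad()
    h = encoder(signals)                              # S120: first feature coding vectors
    logits, _ = classifier(h)                         # S130: health-state probabilities
    a = attention(h)                                  # S140: attention weights
    l_nal = noise_attention_loss(logits, labels, a)   # S150: noise attention loss
    xa, xb = make_views(signals)                      # S160: contrastive branch
    za, zb = projector(encoder(xa)), projector(encoder(xb))
    l_c = info_nce_loss(za, zb)
    loss = l_nal + lam_c * l_c                        # S170: integrated loss, equation (12)
    loss.backward()                                   # S1100: gradient descent with Adam
    optimizer.step()
    return loss.item()
```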
Through the contrastive learning loss, samples with the same health state become more compactly distributed in the high-dimensional feature embedding space, while the distances between samples with different health states are further enlarged; this achieves feature enhancement and improves the accuracy of the integrated pseudo labels used in label correction (see below).
On the basis of the above, preferably, before S150, the method further includes:
S180: It is judged whether the current model training round has reached the label correction starting round. If yes, S190 is executed and label correction is performed in each subsequent round; otherwise, S150 is performed.
S190: and carrying out label correction on the labels of the training samples to form corrected labels, and forming a second training data set by all the training samples with the corrected labels. Then S150 is performed, and in S150, a noise attention loss function for the current round of training is calculated from the probabilities of all training samples, the attention weights, and the corrected labels corresponding to the training samples.
Specifically, the labels are corrected by a Label Correction (LC) module to obtain a second training data set with a lower label noise rate:

$$y^{[t]} = \alpha\, y^{[t-1]} + (1-\alpha)\, p^{[t]}, \qquad t \ge E_s \qquad (13)$$

where $y^{[t]}$ denotes the labels of the training samples at model training round $t$ and $y^{[t-1]}$ the labels at round $t-1$, initialized from the labels $\tilde{y}$ of the first training data set; $E_s$ is the label correction enabling round; $m$ is the difference between the current model training round and the label correction starting round; and $\alpha$ is the update momentum of the pseudo labels, taking values in $(0, 1)$. For $t \ge E_s$, unrolling equation (13), the first term is the original noise label weighted by the exponentially decaying weight $\alpha^{m}$, which makes the model's label data iterate more smoothly and alleviates the confirmation-bias problem. The second term is an integrated prediction term consisting of an exponential moving average of the prediction values; at iteration round $E_s + m$, the integrated iteration term is $(1-\alpha)\sum_{i=1}^{m}\alpha^{m-i}\,p^{[E_s+i]}$. As the model iteration rounds increase, $\alpha^{m}$ gradually approaches 0, so the model's prediction target finally depends on the integrated prediction term; a more complete training label set is thereby constructed and introduced into $L_{NAL}$ in place of the original noise labels $\tilde{y}$ for model training.
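A minimal sketch of this label correction step under the recursive exponential-moving-average reading of equation (13); the momentum value and the soft-label bookkeeping are illustrative assumptions.

```python
import torch

def correct_labels(soft_labels, probs, epoch, start_epoch, alpha=0.9):
    """Label correction in the spirit of equation (13): blend the current soft labels
    with the model's predictions once the start round E_s has been reached.

    soft_labels: (N, M) current soft labels, initialised from the one-hot noisy labels
    probs:       (N, M) model predictions p for the same samples in this round
    """
    if epoch < start_epoch:                       # before the label-correction start round E_s
        return soft_labels
    return alpha * soft_labels + (1.0 - alpha) * probs   # y^[t] = alpha * y^[t-1] + (1 - alpha) * p^[t]

# Initialise once from the (possibly noisy) hard labels, e.g.:
# soft_labels = torch.nn.functional.one_hot(noisy_labels, num_classes=num_states).float()
```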
In the preferred embodiment, the fault diagnosis model training method further includes:
s1110: the first training data set is updated to the second training data set. And then returns to S120 (in the preferred embodiment of contrast learning, also returns to S160) for subsequent rounds of training using the second training data set.
It should be noted that the present application does not limit the sequence of S1100 and S1110, and the two steps may be performed simultaneously.
Fig. 5 compares the label noise ratios of the first training data set (a) and the second training data set (b) obtained through label correction, in the case of a 90% label noise ratio; the values on the diagonal represent the proportion of samples with correct labels. As can be seen from Fig. 5, the label noise rate is significantly reduced after label correction.
Fig. 6 compares the classification effect of the model training method of the present application, which combines weight assignment, label correction and contrast learning (see Fig. 6(d)), with other training methods on a five-class fault signal classification task. Fig. 6(a) shows the classification effect of the cross entropy loss function (CE): the classification results of multiple fault types cross each other and no clear boundaries can be obtained. Fig. 6(b) shows the classification effect of the symmetric cross entropy loss function (SCE), which combines cross entropy with reverse cross entropy: class 0 and class 4 faults can be distinguished from the other three classes, but the other three classes still cross each other. Fig. 6(c) shows the classification effect of early-learning regularization (ELR), which introduces the integration of the model's predicted values from multiple previous iteration cycles into the loss function as a regularization term: the five classes essentially have their own boundaries, but there is a small amount of intersection between class 2 and class 3. As can be seen from Fig. 6(d), with the model training method of the present application, the five types of faults have clear boundaries.
After model training is completed, the fault diagnosis model comprises the feature coding module and the tag classification decoding module, after fault signals are input into the fault diagnosis model, the model firstly converts the fault signals into feature coding vectors, and then the feature coding vectors are input into the tag classification decoding module, so that the probability that the fault signals belong to each type of health state is obtained.
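For completeness, a short sketch of diagnosis with the trained model, reusing the illustrative encoder and classifier defined above: the fault signal is encoded, classified, and the most probable health state is returned.

```python
import torch

@torch.no_grad()
def diagnose(signal, encoder, classifier):
    """Run the trained model: fault signal -> feature coding vector -> health-state probabilities."""
    encoder.eval()
    classifier.eval()
    h = encoder(signal.unsqueeze(0))      # add a batch dimension: (1, channels, length)
    _, p = classifier(h)                  # probability of each health-state class
    return int(p.argmax(dim=-1)), p.squeeze(0)
```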
Based on the fault diagnosis model training method, the application further provides a fault diagnosis model training device. As shown in fig. 7, the fault diagnosis model training apparatus includes a training data receiving module 710, a first converting module 720, a classifying module 730, a weight obtaining module 740, a first loss function calculating module 750, and a parameter updating module 760.
The training data receiving module 710 is configured to receive an input first training data set, where all training samples in the first training data set have labels, and one part of the training samples has correct labels and another part of the training samples has incorrect labels.
The first conversion module 720 is configured to convert all training samples in the first training data set into corresponding first feature encoding vectors.
The classification module 730 is configured to perform label classification on the first feature code vector to obtain probabilities that the training samples belong to each type of health status.
The weight obtaining module 740 is configured to obtain the attention weight of the training sample according to the first feature code vector.
The first loss function calculation module 750 is configured to calculate a noise attention loss function of the current round training according to probabilities of all training samples, attention weights, and labels of the training samples.
The parameter updating module 760 is configured to update the model parameters according to the noise attention loss function.
Preferably, the fault diagnosis model training apparatus further includes a contrast learning module 770 and a second loss function calculation module 780.
The contrast learning module 770 is configured to perform contrast learning using the first training data set to obtain a contrast loss function.
The second loss function calculation module 780 is configured to obtain a comprehensive loss function of the current training round according to the noise attention loss function and the contrast loss function.
And, the parameter updating module 760 is configured to update the model parameters according to the comprehensive loss function.
Preferably, the contrast learning module 770 includes a data enhancement module 7701 and a sample pair contrast learning module 7702.
The data enhancement module 7701 is configured to perform two different data enhancement on all training samples in the first training data set to form a data enhancement sample set.
The sample pair contrast learning module 7702 is configured to perform contrast learning on any two data enhancement samples in the data enhancement sample set to obtain a contrast loss function.
Preferably, the sample pair contrast learning module 7702 includes a second transformation module, a mapping module, a similarity calculation module, and a contrast loss function calculation module.
The second conversion module is used for converting two data enhancement samples in the sample pair into a first feature coding vector and a second feature coding vector respectively.
The mapping module is used for mapping the first feature coding vector and the second feature coding vector into space representation vectors respectively.
The similarity calculation module is used for calculating the similarity between the two space representation vectors.
The contrast loss function calculation module is used for calculating the mutual information among the samples by using the similarity of all the sample pairs and calculating the contrast loss function according to the mutual information among all the samples.
Preferably, the fault diagnosis model training apparatus further includes a judging module 790, a tag correcting module 7100, and a dataset updating module 7110.
The determination module 790 is configured to determine whether the current model training round has reached a label correction enabled round.
The tag correction module 7100 is configured to perform tag correction on the tag when the current model training round reaches the tag correction enabling round, form corrected tags, and form a second training data set from all training samples having corrected tags.
The data set update module 7110 is configured to update the first training data set to a second training data set, and perform training of a subsequent round by using the second training data set.
The first loss function calculation module 750 is configured to calculate a noise attention loss function of the current round training according to probabilities of all training samples, attention weights, and corrected labels corresponding to the training samples.
According to the application, weight assignment and label correction are carried out according to the early-learning characteristics of the model, and contrast learning is introduced to enhance the representation capability of the model, so that no additional training subset is needed and the model can still maintain good generalization performance even under a high noise rate. First, based on the observation that the model fits correctly labeled samples first, so that the quality of a sample's feature representation is consistent with the accuracy of its label, an attention weight branch is designed and introduced into the loss function to divide the samples and implement regularization: correctly labeled samples are given larger weights while the weight values of mislabeled samples are reduced, which effectively ensures that correctly labeled samples remain dominant in the gradient update process of the model. Second, the label correction module performs label correction by integrating the prediction term with the original label data, constructing a more complete training data set. Finally, a contrast learning module is designed so that the constraints of its feature similarity and of the structural similarity of the model's classification branch are applied to the shared feature extraction network, fully mining the discriminative information in the fault signals.
The application enhances the robustness of the model by introducing the attention weight branch and, starting from the two aspects of label correction and of enhancing discrimination capability through contrast learning, increases the samples available to the model and optimizes the discrimination boundaries of the various health-state samples in the high-dimensional embedding space, further improving the generalization capability of the model and achieving average diagnosis accuracies of 98.0% and 98.2% under various noise levels.
While certain specific embodiments of the application have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the application. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the application. The scope of the application is defined by the appended claims.

Claims (10)

1. A fault diagnosis model training method, comprising:
receiving an input first training data set, wherein all training samples in the first training data set have labels, one part of the training samples having correct labels and the other part of the training samples having incorrect labels;
the following steps are circularly executed until the current model training round reaches the maximum model training round:
converting all training samples in the first training data set into corresponding first feature coding vectors;
performing label classification on the first feature coding vector to obtain the probability that the training sample belongs to each type of health state;
acquiring the attention weight of the training sample according to the first feature coding vector;
calculating a noise attention loss function of current round training according to the probability of all training samples, the attention weight and the labels of the training samples;
and updating model parameters according to the noise attention loss function.
2. The fault diagnosis model training method according to claim 1, further comprising: performing contrast learning by using the first training data set to obtain a contrast loss function;
obtaining a comprehensive loss function of current round training according to the noise attention loss function and the contrast loss function;
and,
and updating the model parameters according to the comprehensive loss function.
3. The method for training a fault diagnosis model according to claim 2, wherein the performing contrast learning using the first training data set to obtain a contrast loss function specifically comprises:
performing two different data enhancement on training samples in the first training data set to form a data enhancement sample set;
and performing contrast learning on sample pairs consisting of any two data enhancement samples in the data enhancement sample set to obtain the contrast loss function.
4. The method for training a fault diagnosis model according to claim 3, wherein the performing contrast learning on a sample pair consisting of any two data enhancement samples in the data enhancement sample set, to obtain the contrast loss function, specifically comprises:
for any sample pair, firstly, converting two data enhancement samples in the sample pair into the first feature coding vector and the second feature coding vector respectively; then, mapping the first feature code vector and the second feature code vector into spatial representation vectors respectively; finally, calculating the similarity between the two space representation vectors;
and calculating the mutual information among the samples by using the similarity of all the sample pairs, and calculating a contrast loss function according to the mutual information among all the samples.
5. The fault diagnosis model training method according to claim 1 or 2, further comprising, before calculating the noise attention loss function:
judging whether the current model training round reaches a label correction starting round or not, wherein the label correction starting round is smaller than the maximum model training round;
if yes, carrying out label correction on the labels to form corrected labels, and forming a second training data set by all training samples with the corrected labels;
calculating a noise attention loss function of current round training according to probabilities of all training samples, attention weights and corrected labels corresponding to the training samples;
and updating the first training data set into the second training data set, and performing subsequent rounds of training by using the second training data set.
6. The fault diagnosis model training device is characterized by comprising a training data receiving module, a first conversion module, a classification module, a weight obtaining module, a first loss function calculating module and a parameter updating module;
the training data receiving module is used for receiving an input first training data set, wherein all training samples in the first training data set have labels, one part of the training samples having correct labels and the other part of the training samples having incorrect labels;
the first conversion module is used for converting all training samples in the first training data set into corresponding first feature coding vectors;
the classification module is used for carrying out label classification on the first feature coding vector to obtain the probability that the training sample belongs to each type of health state;
the weight obtaining module is used for obtaining the attention weight of the training sample according to the first feature coding vector;
the first loss function calculation module is used for calculating a noise attention loss function of current round training according to the probability of all training samples, the attention weight and the labels of the training samples;
the parameter updating module is used for updating the model parameters according to the noise attention loss function.
7. The fault diagnosis model training apparatus of claim 6, further comprising a contrast learning module and a second loss function calculation module;
the contrast learning module is used for carrying out contrast learning by utilizing the first training data set to obtain a contrast loss function;
the second loss function calculation module is used for obtaining a comprehensive loss function of current round training according to the noise attention loss function and the contrast loss function;
and the parameter updating module is used for updating the model parameters according to the comprehensive loss function.
8. The fault diagnosis model training apparatus of claim 7, wherein the contrast learning module comprises a data enhancement module and a sample pair contrast learning module;
the data enhancement module is used for carrying out two different data enhancement on all training samples in the first training data set to form a data enhancement sample set;
the sample pair contrast learning module is used for carrying out contrast learning on any two data enhancement samples in the data enhancement sample set so as to obtain the contrast loss function.
9. The fault diagnosis model training apparatus of claim 8, wherein the sample pair contrast learning module comprises a second transformation module, a mapping module, a similarity calculation module, and a contrast loss function calculation module;
the second conversion module is used for respectively converting two data enhancement samples in the sample pair into the first feature coding vector and the second feature coding vector;
the mapping module is used for mapping the first feature coding vector and the second feature coding vector into space representation vectors respectively;
the similarity calculation module is used for calculating the similarity between the two space representation vectors;
the contrast loss function calculation module is used for calculating the mutual information among the samples by utilizing the similarity of all the sample pairs and calculating the contrast loss function according to the mutual information among all the samples.
10. The device according to claim 6 or 7, further comprising a judgment module, a tag correction module, and a data set update module;
the judging module is used for judging whether the current model training round reaches the label correction starting round or not;
the label correction module is used for carrying out label correction on the label when the current model training round reaches the label correction starting round to form corrected labels, and all training samples with the corrected labels form a second training data set;
the data set updating module is used for updating the first training data set into the second training data set, and training of subsequent rounds is carried out by utilizing the second training data set;
the first loss function calculation module is used for calculating a noise attention loss function of the current round training according to the probability and the attention weight of all training samples and corrected labels corresponding to the training samples.
CN202310912536.XA (filed 2023-07-24, priority 2023-07-24): Fault diagnosis model training method and device. Status: Pending. Publication: CN116861250A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310912536.XA CN116861250A (en) 2023-07-24 2023-07-24 Fault diagnosis model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310912536.XA CN116861250A (en) 2023-07-24 2023-07-24 Fault diagnosis model training method and device

Publications (1)

Publication Number Publication Date
CN116861250A (en) 2023-10-10

Family

ID=88221481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310912536.XA Pending CN116861250A (en) 2023-07-24 2023-07-24 Fault diagnosis model training method and device

Country Status (1)

Country Link
CN (1) CN116861250A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117690011A (en) * 2024-02-04 2024-03-12 中国海洋大学 Target detection method suitable for noisy underwater scene and model building method thereof
CN117690011B (en) * 2024-02-04 2024-04-19 中国海洋大学 Target detection method suitable for noisy underwater scene and model building method thereof

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination