CN113505817A

CN113505817A - Self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data

Info

Publication number: CN113505817A
Application number: CN202110672571.XA
Authority: CN
Inventors: 郑子勋; 朱永生; 郑懿焜; 任智军; 林昙涛
Original assignee: Zhejiang Ute Bearing Co ltd
Current assignee: Zhejiang Ute Bearing Co ltd
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2021-10-15

Abstract

The invention provides a self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data, comprising the following steps: S1, obtaining training samples; S2, constructing a bearing fault diagnosis model; S3, inputting the training samples into the bearing fault diagnosis model and extracting features from each training sample; S4, predicting the label of each training sample from its features; S5, calculating the error between the predicted label and the real label of each training sample, together with the weight of each training sample; S6, weighting each error by the corresponding sample weight and back-propagating the result to update the model; S7, repeating steps S3-S6 until the bearing fault diagnosis model converges or the specified number of iterations is reached. The scheme adaptively weights samples during the training of an intelligent bearing fault diagnosis model under unbalanced data, so that training focuses on samples that have not yet converged, the dominance of the majority classes over training is weakened, and the performance of the intelligent diagnosis model under unbalanced data is improved.

Description

Self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data

Technical Field

The invention belongs to the technical field of bearing fault diagnosis, and particularly relates to a self-adaptive weighting training method for an intelligent bearing fault diagnosis model sample under unbalanced data.

Background

The bearing is a core basic component of mechanical equipment and the heart of the rotary support system; its reliability is essential to the performance, function and efficiency of the host machine. Once a bearing in a piece of mechanical equipment fails, the safety and performance of the equipment degrade; in severe cases the equipment loses its function and shuts down unexpectedly, causing heavy production and cost losses. It is therefore necessary to develop a fault diagnosis method that can effectively identify the health state of the bearing.

In the field of fault diagnosis, methods for identifying the health state of a bearing fall mainly into signal-processing-based methods and data-driven methods. Signal-processing-based methods use techniques such as the fast Fourier transform, wavelet decomposition and empirical mode decomposition to analyze the collected bearing signals and judge whether they contain the key sensitive features that correspond to different bearing faults. These key sensitive features, however, address only typical bearing faults under a single condition and do not account for factors such as fault distribution and fault size; in addition, bearing signals acquired from mechanical equipment travel along complex transmission paths and have a low signal-to-noise ratio. As a result, the generalization capability of signal-processing-based fault diagnosis is severely restricted in practical applications. With the development of sensor and communication technology, bearing monitoring in modern mechanical equipment has entered the big-data era, and the data contain rich information on bearing operation and state. If the key sensitive features of different faults under different conditions (different rotating speeds, loads, fault distributions and fault sizes) are mined directly from these data, features can be extracted that are most similar for the same fault under different conditions and most discriminative between different faults, which overcomes the insufficient feature generalization of signal-processing-based methods. Data-driven fault diagnosis has therefore become the most popular and advanced technology in the field.

However, most current data-driven methods assume that the data for the different health states of the bearing are balanced, which is unrealistic in practice. On the one hand, a bearing runs in a healthy state most of the time and runs with a fault only briefly, so the collected operating data consist mostly of healthy-state data, with little fault data. On the other hand, because the components of a bearing have different properties, their probabilities of failure during operation differ, so the amounts of data for different faults are unequal. In summary, the operating data of bearings in mechanical equipment are seriously unbalanced, which directly limits the effectiveness of data-driven methods in practice.

To eliminate the influence of unbalanced training data on the fault diagnosis model, three types of methods have been proposed, operating at the data level, the model level and the algorithm level. Data-level methods restore the balance of the bearing monitoring data by undersampling the majority classes and oversampling the minority classes, and then train the fault diagnosis model; published patents on this approach include CN111639461A, CN112257767A, CN111240279A and CN112395558A. Model-level methods weaken the negative influence of unbalanced data on the fault diagnosis model by constructing a model that is insensitive to class imbalance; published patents on this approach include CN112364706A. Algorithm-level methods suppress the dominance of the majority-class data over the training of the fault diagnosis model by weighting the samples of different classes, thereby eliminating the influence of unbalanced data on the model; published patents on this approach include CN111881159A, CN111860658A and CN111340107A.

However, the above three methods still have various problems:

(1) because data-level methods rely only on the existing data, the information gained through oversampling is limited when fault data are scarce, and when the degree of imbalance is large the data remain unbalanced to some extent even after undersampling;

(2) because model-level methods rely only on the self-optimizing properties of the model to adjust the training process under unbalanced data, when the degree of imbalance is large the dominance of the majority classes over training is more likely to arise and to exceed the self-adjustment capability of the model.

In summary, data-level and model-level methods are suitable only when the degree of imbalance is small, and their ability to cope with real big-data scenarios is limited. Algorithm-level methods adjust the training process of the model directly and are therefore the fundamental approach to the unbalanced learning problem, but in current algorithm-level methods the sample weights must be set manually and are difficult to choose, which hinders the rapid optimization and deployment of fault diagnosis models. A new method that adapts the weights as training proceeds is therefore needed to facilitate the training of fault diagnosis models under unbalanced data.

Disclosure of Invention

The invention aims to solve the problems and provides a self-adaptive weighting training method for a bearing fault diagnosis model sample under unbalanced data.

In order to achieve the purpose, the invention adopts the following technical scheme:

A self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data comprises the following steps:

S1, obtaining training samples;

S2, constructing a bearing fault diagnosis model;

S3, inputting the training samples into the bearing fault diagnosis model and extracting features from each training sample;

S4, predicting the label of each training sample from its features;

S5, calculating the error between the predicted label and the real label of each training sample, together with the weight of each training sample;

S6, weighting each error by the corresponding sample weight and back-propagating the result to update the model;

S7, repeating steps S3-S6 until the bearing fault diagnosis model converges or the specified number of iterations is reached.

In the above self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data, in step S1 training samples and verification samples are obtained at the same time, and the method further includes, after step S7:

S8, inputting the verification samples into the bearing fault diagnosis model to verify the validity of the model.

In the adaptive weighted training method for the bearing fault diagnosis model sample under the unbalanced data, in step S1, a data acquisition system and/or a sensor acquisition system acquires a raw signal during the operation of the bearing, and divides the raw signal into the training sample and the verification sample.

In the above self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data, in step S1, the raw signal is cut into segments of a determined length and divided into the training samples and the verification samples.

In the adaptive weighting training method for the bearing fault diagnosis model sample under the unbalanced data, before step S2, data preprocessing including fast fourier transform and normalization is performed on the training sample and the verification sample.

In the self-adaptive weighting training method for the bearing fault diagnosis model sample under the unbalanced data, the bearing fault diagnosis model comprises a feature extraction module and a classification module; in step S3, feature extraction is performed by the feature extraction module, and the calculation process of feature extraction is as follows:

inputting all training samples into a fault diagnosis model, and extracting the characteristics of the training samples through nonlinear transformation of a neural network

F＝h(X)

Wherein: X = [x₁; x₂; ...; x_N] represents the set of training samples, and X ∈ R^(N×d); h represents the nonlinear mapping of the fault diagnosis model from input to features; F = [f₁; f₂; ...; f_N] represents the set of training sample features, and F ∈ R^(N×k).

In the above adaptive weighted training method for bearing fault diagnosis model samples under unbalanced data, in step S4, the classification module classifies the features, and step S4 specifically includes:

inputting all the features into the classification module to obtain the prediction of the training sample labels

Wherein: Ψ denotes the activation function; the result represents the classification module's prediction of the training sample labels.

In the adaptive weighting training method for the bearing fault diagnosis model samples under the unbalanced data, in step S5, the weight of each training sample is calculated based on the directional similarity.

In the adaptive weighting training method for the bearing fault diagnosis model samples under the unbalanced data, in step S5, the weight of each training sample is calculated by:

calculating directional similarity between training sample features:

A＝(F·F^T)/.(||F||·||F||^T)

wherein: ||F|| represents the vector composed of the moduli of the sample features, and ||F|| ∈ R^(N×1); ||F||^T represents the transpose of that vector; A represents the directional similarity matrix between the sample features, and A ∈ R^(N×N); /. represents element-wise division of matrices;

calculating the directional similarity between each sample feature and the classification weight:

B＝(F·W)/.(||F||·||W||)

wherein: B represents the directional similarity matrix between each sample feature and each classification weight, and B ∈ R^(N×M);

Converting the sample real label into 'one-hot coding' form

y_i＝onehot(y_i)

Wherein: y_i represents the one-hot encoding of the sample label, and y_i ∈ R^(1×M); onehot represents the one-hot encoding function;

integrating the one-hot codes of all samples to form the one-hot code matrix of the real labels of the training samples: Y = [y₁; y₂; ...; y_N];

Calculating the direction similarity mean value of each sample characteristic and other similar sample characteristics:

A_mean＝mean_row(A⊙(Y·Y^T-eye(N)))

wherein: A_mean represents the vector of mean directional similarities between each sample feature and the other sample features of the same class, and A_mean ∈ R^(N×1); mean_row represents the mean of the non-zero elements in each row of a matrix; eye(N) represents the identity matrix of size N × N; ⊙ represents element-wise multiplication;

calculating the distribution mean value of the direction similarity mean value of each sample characteristic of each category and other sample characteristics of the same category:

A_{mean_mean}＝(Y·Y^T·A_mean)/.sum_row(Y·Y^T)

wherein: A_{mean_mean} represents, for each class, the mean over the class of the mean directional similarities between each sample feature and the other sample features of the same class, and A_{mean_mean} ∈ R^(N×1);

Calculating the directional similarity of each sample feature and the corresponding class classification weight:

B_sum＝sum_row(B⊙Y)

wherein: B_sum ∈ R^(N×1); sum_row represents the sum of the non-zero elements in each row of a matrix;

calculating the error weight of each training sample based on the calculation result of the direction similarity:

λ＝sigmoid(A_mean+A_{mean_mean}×(sigmoid(B_sum-a)-0.5)-b)

wherein: a and b are hyper-parameters.

In the above adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data, in step S5, the error between the predicted label and the real label is calculated by the following formula

wherein: l denotes the error function, which compares the real label y_i of sample i with the predicted label of sample i;

in step S6, each sample error is weighted by the following formula

Wherein: λ_i represents the error weight of sample i, L_i represents the error of sample i, N represents the number of training samples, and M represents the number of bearing health state categories;

and in step S6, the weighting result is input to a back propagation algorithm to update the model.

The invention has the advantages that: it realizes adaptive weighting of samples during the training of an intelligent bearing fault diagnosis model under unbalanced data, focuses the training process on samples that have not yet converged, weakens the dominance of the majority classes in the unbalanced data over training, and improves the performance of the intelligent diagnosis model under unbalanced data.

Drawings

FIG. 1 is a flowchart of a bearing fault diagnosis model sample adaptive weighting training method under unbalanced data according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

As shown in fig. 1, the embodiment discloses a self-adaptive weighting training method for a bearing fault diagnosis model under unbalanced data, which includes the following steps:

1) Bearing operation data, namely the raw signals during bearing operation, are acquired using a data acquisition system and a sensor acquisition system. The sensor acquisition system includes various sensors.

2) The unprocessed raw signal is cut into segments of a determined length D and divided into training samples D_train = {x_i, y_i} (i ∈ {1, 2, ..., N}) and verification samples D_test = {x_j, y_j} (j ∈ {1, 2, ..., L}), where x_i represents a training sample, y_i ∈ {1, 2, ..., M} represents the label of that training sample, x_j represents a verification sample, y_j ∈ {1, 2, ..., M} represents the label of that verification sample, N represents the number of training samples, M represents the number of bearing health state categories, and L represents the number of verification samples.
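The segmentation in step 2) can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether segments overlap or how they are assigned to the two sets, so the sketch uses non-overlapping segments and a random 80/20 split. The function names `segment_signal` and `split_samples`, the split fraction and the synthetic signal are hypothetical.

```python
import numpy as np

def segment_signal(signal, length):
    """Cut a 1-D raw signal into non-overlapping segments of `length` points."""
    signal = np.asarray(signal, dtype=float)
    n_segments = signal.size // length            # trailing remainder is dropped (assumption)
    return signal[: n_segments * length].reshape(n_segments, length)

def split_samples(samples, labels, train_fraction=0.8, seed=0):
    """Randomly split segments and labels into training and verification sets (assumed scheme)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_train = int(round(train_fraction * len(samples)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (samples[train_idx], labels[train_idx]), (samples[test_idx], labels[test_idx])

# Toy usage: a synthetic signal of 10250 points cut into segments of 1024 points.
raw = np.sin(np.linspace(0.0, 200.0, 10_250))
segments = segment_signal(raw, length=1024)
labels = np.ones(len(segments), dtype=int)        # every segment carries health-state label 1
(train_x, train_y), (test_x, test_y) = split_samples(segments, labels)
```

In practice one signal is recorded per health state, so the labels would differ between recordings; the single-label toy signal above only shows the shapes involved.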

3) Data preprocessing, including fast Fourier transform and normalization, is performed on all training and verification samples. This preprocessing step is optional and may be applied depending on the difficulty of the diagnosis task.
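A minimal sketch of such preprocessing is given below. The text names only "fast Fourier transform and normalization"; taking the one-sided magnitude spectrum and applying per-sample min-max scaling are assumptions made for illustration, as are the function name `preprocess` and the toy sinusoids.

```python
import numpy as np

def preprocess(samples, eps=1e-12):
    """FFT magnitude spectrum of each row, scaled to [0, 1] per sample (assumed normalization)."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.abs(np.fft.rfft(samples, axis=1))   # one-sided amplitude spectrum
    lo = spectrum.min(axis=1, keepdims=True)
    hi = spectrum.max(axis=1, keepdims=True)
    return (spectrum - lo) / (hi - lo + eps)          # eps guards against constant rows

# Toy usage: two sinusoids with exactly 50 and 120 cycles over 1024 points.
t = np.arange(1024) / 1024.0
x = np.stack([np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t)])
X = preprocess(x)                                     # shape (2, 513)
```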

4) A bearing fault diagnosis model comprising a feature extraction module and a classification module is constructed based on deep learning; applicable architectures include but are not limited to convolutional neural networks, autoencoders and recurrent neural networks. Referring to fig. 1, the feature extraction module includes an input layer, a convolutional layer, a pooling layer, a fully connected layer and the like, and the classification module includes a feature layer and the like.

4.1) the calculation process of feature extraction by the feature extraction module is as follows:

All training samples D_train are input into the fault diagnosis model, and the features of the training samples are obtained through the nonlinear transformation of the neural network

F＝h(X)

Wherein: X = [x₁; x₂; ...; x_N] represents the set of training samples, and X ∈ R^(N×d); h represents the nonlinear mapping of the fault diagnosis model from input to features; F = [f₁; f₂; ...; f_N] represents the set of identification features that the bearing fault diagnosis model produces for the training samples, and F ∈ R^(N×k).

4.2) the calculation process of the classification module is as follows:

first, the weights of the classification modules are normalized

Wherein: W is the classification weight matrix of the classification module; ||W|| represents the vector composed of the moduli of the individual classification weights.

All the features are then input into the classification module to obtain the fault diagnosis model's prediction of the training sample labels

Wherein: Ψ represents an activation function, including but not limited to sigmoid and softmax; the prediction of the training sample labels by the classification module, and the prediction of the label of each sample i, are expressed in one-hot-coded form; f_i represents the identification feature that the bearing fault diagnosis model produces for training sample i.

computing directional similarities between training sample features

A＝(F·F^T)/.(||F||·||F||^T)

Wherein: f^TRepresents the transpose of the feature matrix, | F | | | represents the vector composed of the modulus of each sample feature, and

；||F||^Trepresenting a transpose of the corresponding vector; a represents a directional similarity matrix between sample features, and

(ii) a V. represents the matrix corresponding bit division.
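The matrix A is the pairwise cosine similarity of the feature rows, which can be sketched in NumPy as follows. The function name `directional_similarity`, the small constant `eps` and the toy feature matrix are illustrative assumptions, not part of the described method.

```python
import numpy as np

def directional_similarity(F, eps=1e-12):
    """A = (F F^T) ./ (||F|| ||F||^T): pairwise cosine similarity of the rows of F."""
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)   # ||F||, shape (N, 1)
    return (F @ F.T) / (norms @ norms.T + eps)         # element-wise division; eps is an added guard

# Toy features: rows 0 and 1 are parallel, row 2 is orthogonal to them, row 3 is opposite.
F = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 3.0],
              [-1.0, 0.0]])
A = directional_similarity(F)                          # shape (4, 4)
```

Because only directions enter the result, rows 0 and 1 are treated as identical even though their moduli differ.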

Calculating directional similarity between each sample feature and the classification weight

B＝(F·W)/.(||F||·||W||)

Wherein: B represents the directional similarity matrix between each sample feature and each classification weight, and B ∈ R^(N×M).
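A sketch of B under an assumed weight layout: the text does not state the shape of W, so the code assumes W has one column per class (shape k × M), which makes ||W|| a 1 × M row vector and B an N × M matrix. The function name and the toy values are hypothetical.

```python
import numpy as np

def feature_weight_similarity(F, W, eps=1e-12):
    """B = (F W) ./ (||F|| ||W||): cosine between each feature and each class weight."""
    F = np.asarray(F, dtype=float)                      # (N, k)
    W = np.asarray(W, dtype=float)                      # (k, M), assumed layout: one column per class
    f_norm = np.linalg.norm(F, axis=1, keepdims=True)   # (N, 1)
    w_norm = np.linalg.norm(W, axis=0, keepdims=True)   # (1, M)
    return (F @ W) / (f_norm @ w_norm + eps)            # (N, M); eps is an added guard

F = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
W = np.array([[3.0, 0.0],
              [0.0, 0.5]])
B = feature_weight_similarity(F, W)
```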

converting the sample real label into 'one-hot coding' form

y_i＝onehot(y_i)

Wherein: y_i represents the one-hot encoding of the sample label, and y_i ∈ R^(1×M); onehot represents the one-hot encoding function.

Integrating the one-hot codes of all training samples to form the one-hot code matrix of the real labels of the training samples: Y = [y₁; y₂; ...; y_N].
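The encoding, and the reason Y·Y^T appears in the later formulas, can be illustrated as follows. The helper `onehot` below is a hypothetical implementation that assumes integer labels in 1..M, as in the definition of y_i; the toy labels are made up.

```python
import numpy as np

def onehot(labels, num_classes):
    """Row-stacked one-hot matrix Y for integer labels in 1..num_classes."""
    labels = np.asarray(labels, dtype=int)
    Y = np.zeros((labels.size, num_classes))
    Y[np.arange(labels.size), labels - 1] = 1.0   # labels are 1-based in the text
    return Y

y = np.array([1, 1, 2, 3, 1])
Y = onehot(y, num_classes=3)                      # shape (N, M) = (5, 3)
same_class = Y @ Y.T                              # entry (i, j) is 1 when samples i and j share a label
```

The product Y·Y^T therefore acts as a same-class indicator matrix, and subtracting eye(N) removes each sample's match with itself.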

Calculating the direction similarity mean value of each sample characteristic and other similar sample characteristics

A_mean＝mean_row(A⊙(Y·Y^T-eye(N)))

Wherein: A_mean represents the vector of mean directional similarities between each sample feature and the other sample features of the same class, and A_mean ∈ R^(N×1); mean_row represents the mean of the non-zero elements in each row of a matrix; eye(N) represents the identity matrix of size N × N; ⊙ represents element-wise multiplication.
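A sketch of this step with a hand-made similarity matrix. The text does not say what mean_row returns for a sample that has no other sample of its class, so returning 0 in that case is an assumption; the function names and toy values are likewise illustrative.

```python
import numpy as np

def mean_row(M):
    """Mean of the non-zero entries of each row; rows with none give 0 (assumption)."""
    counts = (M != 0).sum(axis=1, keepdims=True)
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def same_class_mean_similarity(A, Y):
    """A_mean = mean_row(A ⊙ (Y Y^T - eye(N))), shape (N, 1)."""
    N = A.shape[0]
    mask = Y @ Y.T - np.eye(N)            # 1 for the other samples of the same class, else 0
    return mean_row(A * mask)

A = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.4, 0.3],
              [0.2, 0.4, 1.0, 0.6],
              [0.1, 0.3, 0.6, 1.0]])
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)   # three samples of class 1, one of class 2
A_mean = same_class_mean_similarity(A, Y)
```

Note that, following the definition of mean_row, a same-class similarity that happens to be exactly zero is excluded from the average.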

Calculating the distribution mean value of the direction similarity mean value of each sample characteristic of each category and other sample characteristics of the same category

A_{mean_mean}＝(Y·Y^T·A_mean)/.sum_row(Y·Y^T)

Wherein: A_{mean_mean} represents, for each class, the mean over the class of the mean directional similarities between each sample feature and the other sample features of the same class, and A_{mean_mean} ∈ R^(N×1).
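In effect this step averages A_mean within each class and assigns the class average back to every member of the class. A small sketch, with a hypothetical function name and toy inputs:

```python
import numpy as np

def class_mean_of_means(A_mean, Y):
    """A_mean_mean = (Y Y^T A_mean) ./ sum_row(Y Y^T), shape (N, 1)."""
    same_class = Y @ Y.T                                  # (N, N) same-class indicator
    class_size = same_class.sum(axis=1, keepdims=True)    # sum_row(Y Y^T): size of each sample's class
    return (same_class @ A_mean) / class_size

Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
A_mean = np.array([[0.5], [0.6], [0.3], [0.0]])
A_mean_mean = class_mean_of_means(A_mean, Y)
```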

calculating the directional similarity of each sample feature and the corresponding class classification weight

B_sum＝sum_row(B⊙Y)

Wherein: B_sum represents the directional similarity between each sample feature and the classification weight of its corresponding class, and B_sum ∈ R^(N×1); sum_row represents the sum of the non-zero elements in each row of a matrix.
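Since each row of Y has a single 1, the product B ⊙ Y keeps only the entry of B that belongs to the sample's own class, and the row sum extracts it. A brief sketch with made-up values and a hypothetical function name:

```python
import numpy as np

def true_class_similarity(B, Y):
    """B_sum = sum_row(B ⊙ Y): each sample's similarity to its own class weight."""
    return (B * Y).sum(axis=1, keepdims=True)     # shape (N, 1)

B = np.array([[0.9, 0.1, -0.2],
              [0.3, 0.7, 0.0],
              [-0.5, 0.2, 0.4]])
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
B_sum = true_class_similarity(B, Y)
```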

Calculating error weights of all samples:

λ＝sigmoid(A_mean+A_{mean_mean}×(sigmoid(B_sum-a)-0.5)-b)

wherein: a and b are hyper-parameters. From the weight calculation it can be seen that the smaller the directional similarity between a sample and the other samples of its class, the larger the error weight of the sample; likewise, the smaller the directional similarity between a sample and its corresponding classification weight, the larger the sample weight.
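The weight formula can be evaluated numerically as in the sketch below. It only evaluates the printed expression, reading × as element-wise multiplication of the N × 1 vectors, and does not by itself check the qualitative behaviour described in the text. The values a = 0.5 and b = 0.5 are arbitrary placeholders, since the text gives no values for the hyper-parameters, and the input vectors are made up.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sample_weights(A_mean, A_mean_mean, B_sum, a=0.5, b=0.5):
    """lambda = sigmoid(A_mean + A_mean_mean * (sigmoid(B_sum - a) - 0.5) - b)."""
    gate = sigmoid(B_sum - a) - 0.5               # lies in (-0.5, 0.5)
    return sigmoid(A_mean + A_mean_mean * gate - b)

# Placeholder inputs of shape (N, 1); a and b are arbitrary, not values from the text.
A_mean = np.array([[0.5], [0.6], [0.3]])
A_mean_mean = np.full((3, 1), 1.4 / 3)
B_sum = np.array([[0.9], [0.5], [0.1]])
lam = sample_weights(A_mean, A_mean_mean, B_sum)
```

Because the outer function is a sigmoid, every weight lies strictly between 0 and 1.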

Calculating the error between the labels predicted by the classifier and the real labels

Wherein: l represents an error function, including but not limited to mean square error, root mean square error and cross entropy, which compares the real label y_i of sample i with the predicted label of sample i.

Taking root mean square error as an example, the error between the labels predicted by the classification module and the real labels is:

The error of each sample is then weighted, and the total error of this group of training samples is computed:

wherein: mean represents the averaging of all elements of the matrix.
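The exact error formulas are not reproduced in this text, so the following is only one plausible reading: a per-sample root mean square error between the one-hot real label and the predicted label, followed by a total error taken as the mean of the weighted per-sample errors. Both forms, the function names and the toy values are assumptions.

```python
import numpy as np

def rms_error(Y, Y_hat):
    """Per-sample root mean square error between one-hot labels and predictions (assumed form)."""
    return np.sqrt(np.mean((Y - Y_hat) ** 2, axis=1, keepdims=True))   # shape (N, 1)

def weighted_total_error(lam, errors):
    """Total error as the mean of the weighted per-sample errors (assumed form)."""
    return float(np.mean(lam * errors))

Y = np.array([[1.0, 0.0], [0.0, 1.0]])        # real labels, one-hot
Y_hat = np.array([[1.0, 0.0], [0.5, 0.5]])    # predictions: first sample exact, second uncertain
lam = np.array([[0.2], [0.8]])                # placeholder sample weights
errors = rms_error(Y, Y_hat)
total = weighted_total_error(lam, errors)
```

In this toy case the first sample contributes nothing to the total, and the second sample's error of 0.5 is scaled by its weight of 0.8 before averaging.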

4.3) inputting the weighting result into a back propagation algorithm to update the model.

5) Step 4) is executed repeatedly until the model converges or the specified number of iterations is reached.

6) The verification samples D_test are input into the model to verify the validity of the model.

Through the above training process, the training weight of each sample is calculated from the convergence characteristics of the sample, taking into account both the classification characteristics of the sample and those of its class. Compared with traditional methods that use fixed weights, the method is more adaptive and makes the training of the fault diagnosis model focus more on samples that have not yet converged, so convergence is faster and, in particular, the dominance of the majority classes in the unbalanced data over training is weakened.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Although terms such as training sample, verification sample, bearing fault diagnosis model, predicted label and real label are used frequently herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the nature of the invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the invention.

Claims

1. A self-adaptive weighting training method for a bearing fault diagnosis model sample under unbalanced data is characterized by comprising the following steps:

S1, obtaining training samples;

S2, constructing a bearing fault diagnosis model;

2. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 1, wherein in step S1 training samples and verification samples are obtained at the same time, and the method further comprises, after step S7:

3. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 2, wherein in step S1 a raw signal during bearing operation is collected by a data acquisition system and/or a sensor acquisition system, and the raw signal is divided into the training samples and the verification samples.

4. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 3, wherein in step S1 the raw signal is cut into segments of a determined length and divided into the training samples and the verification samples.

5. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 4, wherein before step S2 the training samples and the verification samples are subjected to data preprocessing including fast Fourier transform and normalization.

6. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 5, wherein the bearing fault diagnosis model comprises a feature extraction module and a classification module; in step S3, feature extraction is performed by the feature extraction module, and the calculation process of feature extraction is as follows:

F＝h(X)

Wherein: X = [x₁; x₂; ...; x_N] represents the set of training samples, and X ∈ R^(N×d); h represents the nonlinear mapping of the fault diagnosis model from input to features; F = [f₁; f₂; ...; f_N] represents the set of training sample features, and F ∈ R^(N×k).

7. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 6, wherein in step S4 the classification module classifies the features, and step S4 specifically includes:

Wherein: Ψ denotes the activation function; W represents the classification weights of the classification module; ||W|| represents the vector composed of the moduli of the classification weights.

8. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 7, wherein in step S5 the weight of each training sample is calculated based on the directional similarity.

9. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 8, wherein in step S5 the weight of each training sample is calculated by:

calculating directional similarity between training sample features:

A＝(F·F^T)/.(||F||·||F||^T)

wherein: ||F|| represents the vector composed of the moduli of the sample features, and ||F|| ∈ R^(N×1); ||F||^T represents the transpose of that vector; A represents the directional similarity matrix between the sample features, and A ∈ R^(N×N); /. represents element-wise division of matrices;

B＝(F·W)/.(||F||·||W||)

converting the sample real label into 'one-hot coding' form

y_i＝onehot(y_i)

Wherein: y_i represents the one-hot encoding of the sample label, and y_i ∈ R^(1×M); onehot represents the one-hot encoding function;

A_mean＝mean_row(A⊙(Y·Y^T-eye(N)))

mean_row represents the mean of the non-zero elements in each row of a matrix; eye(N) represents the identity matrix of size N × N; ⊙ represents element-wise multiplication;

A_{mean_mean}＝(Y·Y^T·A_mean)/.sum_row(Y·Y^T)

wherein: A_{mean_mean} represents, for each class, the mean over the class of the mean directional similarities between each sample feature and the other sample features of the same class, and A_{mean_mean} ∈ R^(N×1);

B_sum＝sum_row(B⊙Y)

wherein: B_sum ∈ R^(N×1); sum_row represents the sum of the non-zero elements in each row of a matrix;

λ＝sigmoid(A_mean+A_{mean_mean}×(sigmoid(B_sum-a)-0.5)-b)

wherein: a and b are hyper-parameters.

10. The self-adaptive weighting training method for bearing fault diagnosis model samples under unbalanced data according to claim 9, wherein in step S5 the error between the predicted label and the real label is calculated by the following formula

wherein: l denotes the error function, which compares the real label y_i of sample i with the predicted label of sample i;

in step S6, each sample error is weighted by the following formula