CN111723527B - Method for predicting residual life of gear based on cocktail long-short-term memory neural network - Google Patents

Method for predicting residual life of gear based on cocktail long-short-term memory neural network

Info

Publication number
CN111723527B
Authority
CN
China
Prior art keywords
information
level
term
short
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010599670.5A
Other languages
Chinese (zh)
Other versions
CN111723527A (en)
Inventor
秦毅
项盛
陈定粮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202010599670.5A priority Critical patent/CN111723527B/en
Publication of CN111723527A publication Critical patent/CN111723527A/en
Application granted granted Critical
Publication of CN111723527B publication Critical patent/CN111723527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M 13/00 Testing of machine parts
    • G01M 13/02 Gearings; Transmission mechanisms
    • G01M 13/021 Gearings
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M 13/00 Testing of machine parts
    • G01M 13/02 Gearings; Transmission mechanisms
    • G01M 13/028 Acoustic or vibration analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/10 Geometric CAD
    • G06F 30/17 Mechanical parametric or variational design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/04 Ageing analysis or optimisation against ageing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Acoustics & Sound (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)

Abstract

The invention relates to a method for predicting the residual life of a gear based on a cocktail long short-term memory neural network, and belongs to the technical field of automation. The method comprises the following steps: S1: constructing a gear health index based on variational auto-encoding; S2: defining a cocktail long short-term memory network (C-LSTM); S3: predicting the residual life of the gear based on the health index constructed by the VAE and the C-LSTM. In the method, a health index that accurately reflects the degradation trend of the gear health state is first constructed with a variational auto-encoder (VAE); unknown health indexes are then predicted step by step with the proposed cocktail long short-term memory network, and the predicted RUL is obtained when the predicted health index reaches a set threshold.

Description

Method for predicting residual life of gear based on cocktail long-short-term memory neural network
Technical Field
The invention belongs to the technical field of automation, and relates to a method for predicting the residual life of a gear based on a cocktail long short-term memory neural network.
Background
Gears are key components widely used in industrial equipment such as wind turbines, automobiles, and aircraft engines. Gear faults such as pitting, spalling, and other fatigue damage often trigger cascading failures of the whole machine, cause shutdowns, and in severe cases lead to casualties, causing huge economic losses and safety crises. The remaining useful life (RUL) of a gear is defined as the length of time from the current moment to the end of its useful life, and predicting it is one possible strategy for planning equipment maintenance and avoiding unexpected gear failures. Predicting the life of in-service gears helps determine maintenance timing, improves production efficiency, ensures continuous and efficient production, reduces the accident rate, and prevents sudden accidents, which is of great significance to engineering production.
Because of its great significance to industrial production, gear RUL prediction has attracted increasing attention from scholars and researchers. The present invention therefore provides an intelligent, online, small-sample-oriented method for predicting the remaining useful life of gears.
Disclosure of Invention
Accordingly, the present invention is directed to a method for predicting the remaining life of gears based on a cocktail long short-term memory neural network.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The method for predicting the residual life of a gear based on the cocktail long short-term memory neural network comprises the following steps:
S1: constructing a gear health index based on variational auto-encoding;
S2: defining a cocktail long short-term memory network (C-LSTM);
S3: predicting the residual life of the gear based on the health index constructed by the VAE and the C-LSTM.
Optionally, S1 specifically comprises:
S11: analysis of the variational auto-encoder principle
The variational auto-encoding model is, in essence, a variational Bayesian inference framework composed of an encoding layer that approximates the posterior distribution and a decoding layer that approximates the prior distribution; for a randomly generated data set containing an unobservable continuous latent variable vector Z, the encoder learns, for each data sample x_i, a dedicated posterior distribution q(z_i|x_i); the objective function of the variational auto-encoder is written as:
where D_KL(q||p) denotes the Kullback-Leibler divergence, a measure of the difference between one probability distribution and another, and λ is the weight of the auto-encoder's reconstruction error; q(z_i|x_i) obeys the normal distribution q(z_i|x_i) = N(z_i; μ_i, (σ_i)^2), and p(z_i) is set to a centered isotropic multivariate Gaussian distribution N(z_i; 0, 1); the reconstruction error adopts the mean square error, so the objective function above is rewritten as:
where J is the number of dimensions of x_i and x_i^(j) denotes the j-th element of x_i;
s12: construction of health index
1) Extracting time domain and frequency domain features from vibration signals of the gear; then, inputting the extracted features into the VAE for further information fusion and dimension reduction; first, the features extracted above are normalized from each feature by removing the mean and scaling to unity variance:
wherein x is i,j The ith data point representing the jth feature,represents x i,j Is a normalized value of (2); mu (mu) j Sum sigma j Mean and standard deviation of the j-th feature;
2) Secondly to ensure that the effect of a single node is neither divergent nor divergentConvergence to make the VAE initial weight follow uniform distribution
3) Finally, defining the network structure of the variation self-coding used for constructing the health value target as 21-10-1-10-21; wherein the self-encoder input layer is 21, corresponding to the extracted feature dimension; the hidden layer is 10; the output layer is 1, and the dimension of the health index is correspondingly output; on the contrary, the decoder reduces the reconstruction error of the input of the encoder and the output of the decoder based on BP rule, thereby achieving the purpose of optimizing the network weight and further training the complete VAE.
Optionally, S3 specifically comprises:
S31: collecting vibration signals of the gear of duration T at intervals of Δt until the gear fails; the number of sampled vibration-signal segments is n;
S32: calculating 21 time-frequency features from the denoised vibration signals, inputting them into the trained VAE, and normalizing the output to obtain an n×1 health index matrix X;
S33: selecting the health index matrix X1 = [y_1, y_2, …, y_m]^T composed of the first m sampling points as the training matrix;
S34: constructing the reconstruction matrix U from the training matrix;
S35: taking the first k rows of the matrix U as the input of the neural network and the last row as its output to train the network;
S36: taking the last k outputs as the network input to obtain the output at the next moment;
S37: repeating steps S34-S36 a certain number of times, and comparing the de-normalized outputs with the actual health index values to verify the effectiveness of the method; when a de-normalized output exceeds the set threshold, the remaining life of the gear is the number of predicted sampling points multiplied by (Δt + T), the sum of the vibration-signal interval time and the sampling time.
Optionally, in the C-LSTM, the level division points are defined according to the level of the information as L_l, L_lm, L_mh and L_h, where L_l is the short-term information division point, L_lm the short-to-medium-term information division point, L_mh the medium-to-long-term information division point, and L_h the long-term information division point; based on the current input information x_t and the recursive information h_{t-1}, the hierarchical information is calculated as follows:
L_h = F_1(x_t, h_{t-1}) = indexmax(softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1}))   (11)
L_mh = F_2(x_t, h_{t-1}) = indexmax(softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2}))   (12)
L_lm = F_3(x_t, h_{t-1}) = indexmax(softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2}))   (13)
L_l = F_4(x_t, h_{t-1}) = indexmax(softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1}))   (14)
where W and U are the weights of the input information and the historical information respectively, b is the bias (threshold) term, and softmax is the softmax function;
To make the parameters learnable, the segmentation point evaluation process is softened:
d_{L1} = softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1})   (15)
d_{L2} = softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2})   (16)
d_{L3} = softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2})   (17)
d_{L4} = softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1})   (18)
The memory cell c_t is updated based on the hierarchical relationship formed by the division points; the interaction between L_lm and L_mh exists only on the premise that L_l and L_h interact.
Based on the interrelationship among L_l, L_lm, L_mh and L_h, there are three update modes:
1) When L_l ≥ L_lm ≥ L_mh ≥ L_h, the short-term and medium-term memories overlap each other on the basis of the overlap between the long-term and short-term information; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information (middle-low, middle and middle-high) is mixed in different proportions, so the cell memory unit c_t is updated by the following rule:
where z_1 and z_2 are learnable proportion parameters representing the proportion of middle-level information in the middle-low and middle-high levels respectively, k is defined as the number of hidden-layer units, and the other parameters have the same meaning as in LSTM;
2) When L_l ≥ L_mh ≥ L_lm ≥ L_h, the long-term and short-term information overlap, but the short-term and medium-term memories do not overlap each other; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, the middle-low and middle-high levels are mixed in different proportions, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
3) When L_l ≤ L_h, the level of the current input is lower than the level of the historical data, so there is no basis for overlap between the long-term information and the short-term information; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
with the use of multi-level information partitioning and corresponding updating rules, a variant structure of an LSTM neural network, C-LSTM, is derived as follows:
where the five level symbols denote the low, middle-low, middle, middle-high and high levels respectively, and the four gate symbols denote the main forget gate, the secondary forget gate, the main input gate and the secondary input gate respectively.
The invention has the following beneficial effects: a health index that accurately reflects the degradation trend of the gear health state is first constructed with a variational auto-encoder (VAE); unknown health indexes are then predicted step by step with the proposed cocktail long short-term memory network, and the predicted RUL is obtained when the predicted health index reaches a set threshold.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a VAE neural network;
FIG. 2 is a block diagram of a VAE neural network;
FIG. 3 is a block diagram of a C-LSTM neural network;
FIG. 4 is a schematic diagram of a hidden layer of the C-LSTM neural network;
FIG. 5 shows the hidden-layer update mechanism when L_l ≥ L_lm ≥ L_mh ≥ L_h;
FIG. 6 shows the hidden-layer update mechanism when L_l ≥ L_mh ≥ L_lm ≥ L_h;
FIG. 7 shows the hidden-layer update mechanism when L_h ≥ L_l;
FIG. 8 is a health index graph of an experimental dataset;
FIG. 9 shows failure threshold values, training values, predicted values and actual values when 90 HIs are predicted;
FIG. 10 shows failure threshold, training value, predicted value and actual value when 70 HIs are predicted;
FIG. 11 shows failure threshold, training value, predicted value and actual value when 50 HIs are predicted;
FIG. 12 is a graph of MAE for model predictive power at various sample points;
FIG. 13 is a comparison of predictive power of different models in predicting 60 health indicators.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure below, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention in a schematic way, and the following embodiments and the features in the embodiments may be combined with each other provided there is no conflict.
Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of the embodiments of the invention correspond to the same or similar components. In the description of the invention, terms such as "upper", "lower", "left", "right", "front" and "rear" that indicate an orientation or positional relationship are based on the orientations or positional relationships shown in the drawings; they are used only for convenience of describing the invention and simplifying the description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore merely illustrative and should not be construed as limiting the invention; their specific meaning can be understood by those of ordinary skill in the art according to the specific circumstances.
Gear health index construction based on variational auto-encoding
1. Principle of the variational auto-encoder
The VAE inherits the basic structure of the auto-encoder. As shown in FIG. 1, the variational auto-encoding model is essentially a variational Bayesian inference framework consisting of an encoding layer that approximates the posterior distribution and a decoding layer that approximates the prior distribution. For a randomly generated data set containing an unobservable continuous latent variable vector Z, the encoder learns, for each data sample x_i, a dedicated posterior distribution q(z_i|x_i). Therefore, the objective function of the variational auto-encoder can be written as:
where D_KL(q||p) represents the Kullback-Leibler divergence, which is a measure of the difference between one probability distribution and another, and λ is the weight of the auto-encoder's reconstruction error. q(z_i|x_i) obeys the normal distribution q(z_i|x_i) = N(z_i; μ_i, (σ_i)^2), and p(z_i) is set to a centered isotropic multivariate Gaussian distribution N(z_i; 0, 1). The reconstruction error adopts the mean square error, so the objective function above can be rewritten as:
where J is the number of dimensions of x_i and x_i^(j) is defined as the j-th element of x_i.
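The objective-function equations referenced above are rendered as images in the original publication and are not reproduced in this text. For readability, a standard-form sketch consistent with the surrounding definitions (a KL-divergence term plus a λ-weighted mean-square reconstruction error) is given below; the exact constants and sign conventions used in the patent may differ.

```latex
% Assumed standard form of the VAE objective, consistent with the definitions above
\mathcal{L}(x_i) = D_{KL}\bigl(q(z_i \mid x_i)\,\|\,p(z_i)\bigr)
                 + \lambda\,\mathbb{E}_{q(z_i \mid x_i)}\bigl[\lVert x_i - \hat{x}_i \rVert^2\bigr]
% With the Gaussian posterior and the mean-square reconstruction error this becomes
\mathcal{L}(x_i) = \tfrac{1}{2}\sum_{k}\bigl(\mu_{i,k}^2 + \sigma_{i,k}^2 - 1 - \log \sigma_{i,k}^2\bigr)
                 + \frac{\lambda}{J}\sum_{j=1}^{J}\bigl(x_i^{(j)} - \hat{x}_i^{(j)}\bigr)^2
```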
2. Construction of health index
1) 21 time-domain and frequency-domain features, such as the peak-to-peak value, mean and variance, are extracted from the vibration signals of the gears. These extracted features are then input into the VAE for further information fusion and dimensionality reduction. First, each extracted feature is standardized by removing its mean and scaling it to unit variance:
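The standardization formula itself appears as an image in the original publication; from the definitions in the following sentence it takes the usual z-score form, sketched here:

```latex
% Assumed z-score form implied by the definitions of x_{i,j}, \mu_j and \sigma_j
\tilde{x}_{i,j} = \frac{x_{i,j} - \mu_j}{\sigma_j}
```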
where x_{i,j} denotes the i-th data point of the j-th feature, x̃_{i,j} denotes its standardized value, and μ_j and σ_j denote the mean and standard deviation of the j-th feature, respectively.
2) Second, to ensure that the effect of a single node neither diverges nor converges, the initial weights of the VAE are drawn from a uniform distribution.
3) Finally, the network structure of the variational auto-encoder used to construct the health index is defined as 21-10-1-10-21, as shown in FIG. 2: the encoder input layer has 21 nodes, matching the dimension of the extracted features; the hidden layer has 10 nodes; and the output layer has 1 node, matching the dimension of the output health index. Conversely, the decoder reduces the reconstruction error between the encoder input and the decoder output based on the BP (back-propagation) rule, thereby optimizing the network weights and training the complete VAE.
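As a concrete illustration of this 21-10-1-10-21 structure, the following is a minimal sketch in PyTorch. The framework, the sigmoid activations, the uniform-initialization range and the reconstruction weight are assumptions not fixed by the text; only the layer sizes, the uniform weight initialization and the KL-plus-weighted-MSE loss follow the description above.

```python
# Minimal sketch of the 21-10-1-10-21 VAE used to build the health index (HI).
import torch
import torch.nn as nn

class HealthIndexVAE(nn.Module):
    def __init__(self, n_features=21, n_hidden=10, n_latent=1):
        super().__init__()
        self.enc = nn.Linear(n_features, n_hidden)
        self.mu = nn.Linear(n_hidden, n_latent)       # posterior mean
        self.logvar = nn.Linear(n_hidden, n_latent)   # posterior log-variance
        self.dec1 = nn.Linear(n_latent, n_hidden)
        self.dec2 = nn.Linear(n_hidden, n_features)
        # Initial weights drawn from a uniform distribution, as stated in step 2);
        # the range (-0.1, 0.1) is an assumption, not given in the text.
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.uniform_(m.weight, -0.1, 0.1)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        x_hat = self.dec2(torch.sigmoid(self.dec1(z)))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar, lam=1.0):
    # Closed-form KL(q(z|x) || N(0, 1)) plus lambda-weighted MSE reconstruction error
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    mse = ((x - x_hat) ** 2).mean()
    return kl + lam * mse

# After training, the 1-dimensional latent mean mu(x) serves as the health index.
```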
Cocktail long short-term memory network (C-LSTM):
the neurons are unordered in the general neural network training process, and the neurons are orderly arranged according to a certain rule based ON a long-short-term memory network (ON-LSTM) of the orderly neurons and have a certain hierarchical structure, so that hierarchical information-sequential information which cannot be learned by the general neural network can be used. To further enhance the ordering of neurons, the cocktail long-short-term memory network (C-LSTM) considers the information interaction between two adjacent levels, further dividing the mixed (middle) level of ON-LSTM into three levels, low-medium, medium-high. The multi-level division enables the neural network to more fully utilize the ordered information, so that the degradation information of the neural network to the health index is more fully mined and used, and the prediction capability of the gear residual life of the model is further improved.
High-level information represents more important information that needs to be kept in the network for a long time and is therefore also called long-term information. In contrast, low-level information represents less important information that exists in the network only briefly and is updated at every step, and is also referred to as short-term information. When the high-level and low-level information interact, the middle-level information is updated in the same way as in LSTM, the middle-low level mixes middle-level and low-level information in proportion, and the middle-high level mixes middle-level and high-level information in proportion; these correspond to medium-term, short-to-medium-term and medium-to-long-term information respectively. When there is no interaction, the middle, middle-low and middle-high levels are set to zero and the corresponding information does not participate in updating the network.
Therefore, the invention proposes a new LSTM with multi-level update rules, named cocktail LSTM because its information update scheme has clearly defined layers with mixing zones between them, like a cocktail. Compared with the ON-LSTM, the C-LSTM uses the ordering information more fully, so the network structure is updated more reasonably. The neural network model is shown in FIG. 3.
The hidden layer structure of the neural network is shown in fig. 4.
The flow chart of the multi-level update of the ordered neurons is shown in FIGS. 5-7. FIG. 5 shows the hidden-layer update mechanism when L_l ≥ L_lm ≥ L_mh ≥ L_h; FIG. 6 shows the hidden-layer update mechanism when L_l ≥ L_mh ≥ L_lm ≥ L_h; FIG. 7 shows the hidden-layer update mechanism when L_h ≥ L_l.
Gear residual life prediction method flow based on health index constructed by VAE and C-LSTM
The gear residual life prediction method based on the combination of the VAE and the C-LSTM comprises the following steps:
31. Acquire vibration signals of the gear of duration T at intervals of Δt until the gear fails; the number of sampled vibration-signal segments is n.
32. Calculate 21 time-frequency features from the denoised vibration signals, input them into the trained VAE, and normalize the output to obtain the n×1 health index matrix X.
33. Select the health index matrix X1 = [y_1, y_2, …, y_m]^T composed of the first m sampling points as the training matrix.
34. Construct the reconstruction matrix U from the training matrix.
35. Take the first k rows of the matrix U as the input of the neural network and the last row as its output to train the network.
36. Take the last k outputs as the network input to obtain the output at the next moment.
37. Repeat steps 34-36 a certain number of times, and compare the de-normalized outputs with the actual health index values to verify the effectiveness of the method. When a de-normalized output exceeds the set threshold, the remaining life of the gear is the number of predicted sampling points multiplied by (Δt + T), the sum of the vibration-signal interval time and the sampling time.
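The following sketch illustrates the flow of steps 31-37 in Python. The trained C-LSTM is abstracted behind a model object with fit/predict methods (an assumed interface; the text does not prescribe one), de-normalization is assumed to have been applied to the HI series and the threshold beforehand, and the HI is assumed to grow toward the failure threshold; the window length k, the threshold, Δt and T are supplied by the user.

```python
# Sketch of the rolling prediction flow (steps 33-37); the C-LSTM itself is abstracted.
import numpy as np

def predict_rul(hi, k, model, threshold, dt, t_sample, max_steps=200):
    """hi: 1-D array of the first m (de-normalized) health-index values (step 33)."""
    m = len(hi)
    # Step 34: reconstruction matrix U whose columns are sliding windows of length k+1
    U = np.stack([hi[i:i + k + 1] for i in range(m - k)], axis=1)   # shape (k+1, m-k)
    # Step 35: the first k rows are the network input, the last row is the target
    model.fit(U[:k].T, U[k])
    # Step 36: recursive one-step-ahead prediction from the last k known values
    window = list(hi[-k:])
    preds = []
    for step in range(1, max_steps + 1):
        y_next = float(model.predict(np.array(window[-k:])[None, :])[0])
        preds.append(y_next)
        window.append(y_next)
        # Step 37: once the prediction crosses the failure threshold, the RUL is the
        # number of predicted points times (dt + T); flip the comparison if the HI
        # decreases toward failure instead of increasing.
        if y_next >= threshold:
            return step * (dt + t_sample), np.array(preds)
    return None, np.array(preds)  # threshold not reached within max_steps
```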
The above describes the proposed neural network model and prediction method; the following presents part of the experimental results that demonstrate the effectiveness of the method.
The experiment adopts a speed-increasing first transmission stage and a speed-reducing second stage, so that the overall transmission ratio of the test gearbox is exactly 1:1. The test gears are made of 40Cr with grade-5 machining precision, a surface hardness of 55 HRC, and a module of 5. Specifically, the large gear has 31 teeth, the small gear has 25 teeth, and the width of the first-stage transmission gear is 21 mm. The gear life tests produced four life-cycle vibration-signal data sets under two operating conditions, as shown in Table 1. The sampling interval was set to 60 s, the sampling duration to 10 s, and the sampling frequency to 50000 Hz. The VAE is trained on data set 1 and data set 3 to construct health indicators (HIs); data set 2 and data set 4 are then encoded with the trained VAE to construct their HIs.
Table 1 Information on the gear life data sets
Since most of the collected samples come from the stationary stage and the early stage and contain no gear degradation information, they need not be used for RUL prediction. Therefore, the present invention uses only a portion of the samples in the life-cycle data sets, calculates HIs with the VAE, and applies them to the prediction of the gear RUL. The HIs obtained for all four gear data sets are shown in FIG. 8. As can be seen from the figure, the constructed HIs reflect the degradation trend of the gear health condition well, which greatly helps life prediction. In addition, the HI curves of all gears show a descending trend and their failure thresholds are similar, so a unified failure threshold is set for the different experiments, which improves the robustness of gear RUL prediction. Moreover, the failure threshold of the proposed HI fluctuates less than that of other health indicators such as the RMS and the frequency center. Thus, the HI constructed with the VAE can effectively improve the robustness of gear RUL prediction.
To illustrate the long-term and short-term predictive capabilities of C-LSTM, we use data set 1 to predict 90, 70 and 50 HI points, respectively, with the prediction results shown in FIGS. 9-11.
It is apparent that as the number of known points increases, the prediction capability gradually improves and the predicted values become closer to the actual values. In all of the above cases, the degradation trend of the predicted curve is similar to that of the actual curve, so the gear RUL can be estimated well. We then calculate the mean absolute error (MAE) to evaluate the prediction capability of the C-LSTM; the MAEs obtained from prediction experiments with different numbers of known HI points are shown in FIG. 12. It can be concluded that the MAE decreases as the number of known HI points increases. When 50 points are predicted (the true RUL is 50 min), the percentage error of the RUL prediction is 9%, and for the prediction of 70 HI points the percentage error is 16%. The C-LSTM therefore has long-term prediction capability and good prediction accuracy. To further demonstrate the long-term prediction capability of the C-LSTM, we also tried to predict 90 HIs, i.e., a true RUL of an hour and a half; the percentage error of the RUL prediction is 36.4%. It follows that the C-LSTM still has a certain prediction capability even when the number of known samples is small.
In addition, to fully demonstrate the superiority of the proposed C-LSTM neural network, four evaluation criteria, namely MAE (mean absolute error), NRMSE (normalized root mean square error), Score (the IEEE scoring function for life-prediction performance) and MAPE (mean absolute percentage error), were used to compare it with the conventional LSTM and its variant models. As shown in FIG. 13, the proposed method achieves higher prediction accuracy than the traditional LSTMs.
As can easily be seen from FIG. 13, the ON-LSTM and the C-LSTM can fully capture the health information contained in the input data thanks to their mining of level-order information, so a good optimum is easier to reach and high-precision prediction of the remaining service life of the gear is easier to achieve than with the original LSTM and its variants. In addition, the C-LSTM sorts the neural units at multiple levels, so the level-order information is mined and used more fully and reasonably, and its prediction performance is improved to a certain extent compared with the ON-LSTM.
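For reference, three of the four comparison metrics can be computed as below; this is a sketch, the IEEE Score function is omitted because its exact form is not given in this text, and NRMSE is shown with range normalization, which is one common convention.

```python
# Helper functions for MAE, NRMSE and MAPE as used in the comparison of FIG. 13.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def nrmse(y_true, y_pred):
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))  # range-normalized (assumed convention)

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```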
Multi-level guidance process for ordered neurons
A new memory-cell update rule is proposed in the C-LSTM through a multi-level hierarchical information guidance mechanism. With this new update mechanism, the C-LSTM has multi-level update characteristics, so the ordering information can be mined and used more fully, which makes it better suited to time-series prediction than the ordinary LSTM and its variants.
Through the multi-level mechanism, the cocktail LSTM introduces a new way of updating the memory cells. According to the level of the information, the level division points are defined as L_l, L_lm, L_mh and L_h, where L_l is the short-term information division point, L_lm the short-to-medium-term information division point, L_mh the medium-to-long-term information division point, and L_h the long-term information division point. Based on the current input information x_t and the recursive information h_{t-1}, the hierarchical information is calculated as follows:
L_h = F_1(x_t, h_{t-1}) = indexmax(softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1}))   (11)
L_mh = F_2(x_t, h_{t-1}) = indexmax(softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2}))   (12)
L_lm = F_3(x_t, h_{t-1}) = indexmax(softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2}))   (13)
L_l = F_4(x_t, h_{t-1}) = indexmax(softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1}))   (14)
where W and U are the weights of the input information and the historical information respectively, b is the bias (threshold) term, and softmax is the softmax function.
To make the parameters learnable, the segmentation point evaluation process is softened:
d_{L1} = softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1})   (15)
d_{L2} = softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2})   (16)
d_{L3} = softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2})   (17)
d_{L4} = softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1})   (18)
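A small numerical sketch of equations (11)-(18) follows. The parameter container and array shapes are assumptions, but the computation itself, an affine map of x_t and h_{t-1} followed by a softmax and either its argmax index or the softened distribution, is exactly what the equations state.

```python
# Sketch of the level-division points (eqs. 11-14) and their softened versions (eqs. 15-18).
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def division_points(x_t, h_prev, params):
    """params: dict with keys 'f1', 'f2', 'i2', 'i1', each holding a (W, U, b) tuple."""
    soft, hard = {}, {}
    for key, name in [('f1', 'L_h'), ('f2', 'L_mh'), ('i2', 'L_lm'), ('i1', 'L_l')]:
        W, U, b = params[key]
        d = softmax(W @ x_t + U @ h_prev + b)   # softened division point (eqs. 15-18)
        soft[name] = d
        hard[name] = int(np.argmax(d))          # indexmax(softmax(...)) in eqs. (11)-(14)
    return hard, soft
```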
The memory cell c_t is updated according to the hierarchical relationship formed by the division points; notably, the interaction between L_lm and L_mh exists only on the premise that L_l and L_h interact. Taking the interrelationship among L_l, L_lm, L_mh and L_h into account, there are only three update modes:
1) When L_l ≥ L_lm ≥ L_mh ≥ L_h, the short-term and medium-term memories overlap each other on the basis of the overlap between the long-term and short-term information. The low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information (middle-low, middle and middle-high) is mixed in different proportions, so the cell memory unit c_t is updated by the following rule:
where z_1 and z_2 are learnable proportion parameters that represent the proportion of middle-level information in the middle-low and middle-high levels respectively, k is defined as the number of hidden-layer units, and the other parameters have the same meaning as in LSTM.
2) When L_l ≥ L_mh ≥ L_lm ≥ L_h, the long-term and short-term information overlap, but the short-term and medium-term memories do not overlap each other. The low-level information is directly replaced by the current input, the high-level information is retained for a long time, the middle-low and middle-high levels are mixed in different proportions, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
the meaning of the parameters is consistent with the above.
3) When L_l ≤ L_h, the level of the current input is lower than the level of the historical data, so there is no basis for overlap between the long-term information and the short-term information. The low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
with the partitioning of multi-level information and the use of corresponding update rules, a variant structure of an LSTM neural network, C-LSTM, is derived as follows:
where the five level symbols denote the low, middle-low, middle, middle-high and high levels respectively, and the four gate symbols denote the main forget gate, the secondary forget gate, the main input gate and the secondary input gate respectively.
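To make the three cases above concrete, only the branching logic is sketched below; the cell-update formulas themselves and the full C-LSTM equations are given by the patent's figures and are not reproduced in this text, so the mode selection alone is shown.

```python
# Sketch of how the three memory-update modes of the C-LSTM are selected from the
# division points of equations (11)-(14).
def select_update_mode(L_l, L_lm, L_mh, L_h):
    if L_l <= L_h:
        return 3  # no basis for overlap between long-term and short-term information
    if L_l >= L_lm >= L_mh >= L_h:
        return 1  # short-, medium- and long-term information all overlap
    if L_l >= L_mh >= L_lm >= L_h:
        return 2  # long/short overlap, but short- and medium-term memories do not
    # Configurations outside the three modes are not described in this text.
    raise ValueError("division points do not match any of the three update modes")
```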
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (2)

1. A method for predicting the residual life of a gear based on a cocktail long short-term memory neural network, characterized by comprising the following steps:
S1: constructing a gear health index based on variational auto-encoding;
S2: defining a cocktail long short-term memory network (C-LSTM);
S3: predicting the residual life of the gear based on the health index constructed by the VAE and the C-LSTM;
S1 specifically comprises the following steps:
S11: analysis of the variational auto-encoder principle
The variational auto-encoding model is, in essence, a variational Bayesian inference framework composed of an encoding layer that approximates the posterior distribution and a decoding layer that approximates the prior distribution; for a randomly generated data set containing an unobservable continuous latent variable vector Z, the encoder learns, for each data sample x_i, a dedicated posterior distribution q(z_i|x_i); the objective function of the variational auto-encoder is written as:
where D_KL(q||p) denotes the Kullback-Leibler divergence, a measure of the difference between one probability distribution and another, and λ is the weight of the auto-encoder's reconstruction error; q(z_i|x_i) obeys the normal distribution q(z_i|x_i) = N(z_i; μ_i, (σ_i)^2), and p(z_i) is set to a centered isotropic multivariate Gaussian distribution N(z_i; 0, 1); the reconstruction error adopts the mean square error, so the objective function above is rewritten as:
where J is the number of dimensions of x_i and x_i^(j) denotes the j-th element of x_i;
s12: construction of health index
1) Extracting time domain and frequency domain features from vibration signals of the gear; then, inputting the extracted features into the VAE for further information fusion and dimension reduction; first, the features extracted above are normalized from each feature by removing the mean and scaling to unity variance:
wherein x is i,j The ith data point representing the jth feature,represents x i,j Is a normalized value of (2); mu (mu) j Sum sigma j Mean and standard deviation of the j-th feature;
2) Secondly, to ensure that the effect of a single node is neither divergent nor convergent, the VAE initial weights follow a uniform distribution
3) Finally, defining the network structure of the variation self-coding used for constructing the health value target as 21-10-1-10-21; wherein the self-encoder input layer is 21, corresponding to the extracted feature dimension; the hidden layer is 10; the output layer is 1, and the dimension of the health index is correspondingly output; on the contrary, the decoder reduces the reconstruction error of the input of the encoder and the output of the decoder based on BP rule, thereby achieving the purpose of optimizing the network weight and further training the complete VAE;
in the C-LSTM, the level division points are defined according to the level of the information as L_l, L_lm, L_mh and L_h, where L_l is the short-term information division point, L_lm the short-to-medium-term information division point, L_mh the medium-to-long-term information division point, and L_h the long-term information division point; based on the current input information x_t and the recursive information h_{t-1}, the hierarchical information is calculated as follows:
L_h = F_1(x_t, h_{t-1}) = indexmax(softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1}))   (11)
L_mh = F_2(x_t, h_{t-1}) = indexmax(softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2}))   (12)
L_lm = F_3(x_t, h_{t-1}) = indexmax(softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2}))   (13)
L_l = F_4(x_t, h_{t-1}) = indexmax(softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1}))   (14)
where W and U are the weights of the input information and the historical information respectively, b is the bias (threshold) term, and softmax is the softmax function;
To make the parameters learnable, the segmentation point evaluation process is softened:
d_{L1} = softmax(W_{f1} x_t + U_{f1} h_{t-1} + b_{f1})   (15)
d_{L2} = softmax(W_{f2} x_t + U_{f2} h_{t-1} + b_{f2})   (16)
d_{L3} = softmax(W_{i2} x_t + U_{i2} h_{t-1} + b_{i2})   (17)
d_{L4} = softmax(W_{i1} x_t + U_{i1} h_{t-1} + b_{i1})   (18)
The memory cell c_t is updated based on the hierarchical relationship formed by the division points; the interaction between L_lm and L_mh exists only on the premise that L_l and L_h interact;
based on the interrelationship among L_l, L_lm, L_mh and L_h, there are three update modes:
1) When L_l ≥ L_lm ≥ L_mh ≥ L_h, the short-term and medium-term memories overlap each other on the basis of the overlap between the long-term and short-term information; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information (middle-low, middle and middle-high) is mixed in different proportions, so the cell memory unit c_t is updated by the following rule:
where z_1 and z_2 are learnable proportion parameters representing the proportion of middle-level information in the middle-low and middle-high levels respectively, k is defined as the number of hidden-layer units, and the other parameters have the same meaning as in LSTM;
2) When L_l ≥ L_mh ≥ L_lm ≥ L_h, the long-term and short-term information overlap, but the short-term and medium-term memories do not overlap each other; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, the middle-low and middle-high levels are mixed in different proportions, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
3) When L_l ≤ L_h, the level of the current input is lower than the level of the historical data, so there is no basis for overlap between the long-term information and the short-term information; the low-level information is directly replaced by the current input, the high-level information is retained for a long time, and the middle-level information is set to zero because there is no information interaction, so the cell memory unit c_t is updated by the following rule:
with the use of multi-level information partitioning and corresponding updating rules, a variant structure of an LSTM neural network, C-LSTM, is derived as follows:
where the five level symbols denote the low, middle-low, middle, middle-high and high levels respectively, and the four gate symbols denote the main forget gate, the secondary forget gate, the main input gate and the secondary input gate respectively.
2. The method for predicting the remaining life of a gear based on a cocktail long short-term memory neural network according to claim 1, wherein S3 specifically comprises:
S31: collecting vibration signals of the gear of duration T at intervals of Δt until the gear fails; the number of sampled vibration-signal segments is n;
S32: calculating 21 time-frequency features from the denoised vibration signals, inputting them into the trained VAE, and normalizing the output to obtain an n×1 health index matrix X;
S33: selecting the health index matrix X1 = [y_1, y_2, …, y_m]^T composed of the first m sampling points as the training matrix;
S34: constructing the reconstruction matrix U from the training matrix;
S35: taking the first k rows of the matrix U as the input of the neural network and the last row as its output to train the network;
S36: taking the last k outputs as the network input to obtain the output at the next moment;
S37: repeating steps S34-S36 a certain number of times, and comparing the de-normalized outputs with the actual health index values to verify the effectiveness of the method; when a de-normalized output exceeds the set threshold, the remaining life of the gear is the number of predicted sampling points multiplied by (Δt + T), the sum of the vibration-signal interval time and the sampling time.
CN202010599670.5A 2020-06-28 2020-06-28 Method for predicting residual life of gear based on cocktail long-short-term memory neural network Active CN111723527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010599670.5A CN111723527B (en) 2020-06-28 2020-06-28 Method for predicting residual life of gear based on cocktail long-short-term memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010599670.5A CN111723527B (en) 2020-06-28 2020-06-28 Method for predicting residual life of gear based on cocktail long-short-term memory neural network

Publications (2)

Publication Number Publication Date
CN111723527A CN111723527A (en) 2020-09-29
CN111723527B true CN111723527B (en) 2024-04-16

Family

ID=72569149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010599670.5A Active CN111723527B (en) 2020-06-28 2020-06-28 Method for predicting residual life of gear based on cocktail long-short-term memory neural network

Country Status (1)

Country Link
CN (1) CN111723527B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926505B (en) * 2021-03-24 2022-11-11 重庆大学 Rotating machine health index construction method based on DTC-VAE neural network
CN112966400B (en) * 2021-04-23 2023-04-18 重庆大学 Centrifugal fan fault trend prediction method based on multi-source information fusion
CN113566953B (en) * 2021-09-23 2021-11-30 中国空气动力研究与发展中心设备设计与测试技术研究所 Online monitoring method for flexible-wall spray pipe
CN113836822A (en) * 2021-10-28 2021-12-24 重庆大学 Aero-engine service life prediction method based on MCLSTM model
CN114246563B (en) * 2021-12-17 2023-11-17 重庆大学 Heart and lung function intelligent monitoring equipment based on millimeter wave radar
CN115542172A (en) * 2022-12-01 2022-12-30 湖北工业大学 Power battery fault detection method, system, device and storage medium


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018071005A1 (en) * 2016-10-11 2018-04-19 Hitachi, Ltd. Deep long short term memory network for estimation of remaining useful life of the components
CN108595409A (en) * 2018-03-16 2018-09-28 上海大学 A kind of requirement documents based on neural network and service document matches method
KR20200002665A (en) * 2018-06-29 2020-01-08 성균관대학교산학협력단 Prognostics and health management systems for component of vehicle and methods thereof
CN109726524A (en) * 2019-03-01 2019-05-07 哈尔滨理工大学 A kind of rolling bearing remaining life prediction technique based on CNN and LSTM
CN110210126A (en) * 2019-05-31 2019-09-06 重庆大学 A kind of prediction technique of the gear remaining life based on LSTMPP
CN110570035A (en) * 2019-09-02 2019-12-13 上海交通大学 people flow prediction system for simultaneously modeling space-time dependency and daily flow dependency
CN111047482A (en) * 2019-11-14 2020-04-21 华中师范大学 Knowledge tracking system and method based on hierarchical memory network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cocktail LSTM and Its Application Into Machine Remaining Useful Life Prediction; Sheng Xiang et al.; IEEE/ASME Transactions on Mechatronics; 2023-02-24; Vol. 28, No. 5; pp. 2425-2436 *

Also Published As

Publication number Publication date
CN111723527A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN111723527B (en) Method for predicting residual life of gear based on cocktail long-short-term memory neural network
Gorjian et al. A review on degradation models in reliability analysis
CN110046743B (en) Public building energy consumption prediction method and system based on GA-ANN
CN110633792B (en) End-to-end bearing health index construction method based on convolution cyclic neural network
CN111695521B (en) Attention-LSTM-based rolling bearing performance degradation prediction method
CN114580545A (en) Wind turbine generator gearbox fault early warning method based on fusion model
CN104503420A (en) Non-linear process industry fault prediction method based on novel FDE-ELM and EFSM
CN116167527A (en) Pure data-driven power system static safety operation risk online assessment method
CN116503118A (en) Waste household appliance value evaluation system based on classification selection reinforcement prediction model
CN111475986B (en) LSTM-AON-based gear residual life prediction method
Jiang et al. Measurement of health evolution tendency for aircraft engine using a data-driven method based on multi-scale series reconstruction and adaptive hybrid model
Jiang et al. Paired ensemble and group knowledge measurement for health evaluation of wind turbine gearbox under compound fault scenarios
Wang et al. A transformer-based multi-entity load forecasting method for integrated energy systems
CN112767692A (en) Short-term traffic flow prediction system based on SARIMA-GA-Elman combined model
CN111475987B (en) SAE and ON-LSTM-based gear residual life prediction method
Kai et al. Notice of Retraction: A Novel Forecasting Model of Fuzzy Time Series Based on K-means Clustering
Rubinstein et al. Time series forecasting of crude oil consumption using neuro-fuzzy inference
Jiang et al. Multistep degradation tendency prediction for aircraft engines based on CEEMDAN permutation entropy and improved Grey–Markov model
Wei et al. A new BRB model for cloud security-state prediction based on the large-scale monitoring data
CN115577854A (en) Quantile regression wind speed interval prediction method based on EEMD-RBF combination
CN112599205B (en) Event-driven design method for total phosphorus soft measurement model of effluent in sewage treatment process
Tran et al. Effects of Data Standardization on Hyperparameter Optimization with the Grid Search Algorithm Based on Deep Learning: A Case Study of Electric Load Forecasting
Maliyaem The Amount of Solid Waste Forecasting using Time Series ANFIS
CN111143774A (en) Power load prediction method and device based on influence factor multi-state model
Sharma et al. Stochastic behaviour and performance analysis of an industrial system using GABLT technique

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant