CN109766745B - Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method - Google Patents

Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Info

Publication number
CN109766745B
CN109766745B (application CN201811393984.9A)
Authority
CN
China
Prior art keywords
trend
state
neural network
long
memory neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811393984.9A
Other languages
Chinese (zh)
Other versions
CN109766745A (en
Inventor
李锋
陈勇
田大庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811393984.9A priority Critical patent/CN109766745B/en
Publication of CN109766745A publication Critical patent/CN109766745A/en
Application granted granted Critical
Publication of CN109766745B publication Critical patent/CN109766745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a reinforcement learning tri-state combined long-short time memory neural network system comprising a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier. The monotonic trend identifier judges the trend state of an input time series, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and number of hidden nodes are adapted to the variation law of the series. The trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes. The method judges the trend of the input time series and selects and executes the optimal action according to the updated Q value set, which enhances the generalization capability of the network and yields higher prediction accuracy for the degradation trend of rotating machinery; in addition, the reward is calculated from the output error, which improves the convergence speed of the network and the computational efficiency of the system.

Description

Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method
Technical Field
The invention relates to the technical field of neural networks, and in particular to a reinforcement learning tri-state combined long-short time memory neural network system and a training and prediction method.
Background
Rotating machinery is among the most widely used components of mechanical equipment in civil and defense fields. During long-term operation a rotating machine degrades gradually and its remaining service life decreases, and the occurrence of a fault often brings catastrophic accidents, causing great economic loss and serious social impact. At present, industrial enterprises generally maintain rotating machinery with a scheduled (time-based) maintenance regime, i.e., the equipment is overhauled periodically whether or not a fault has occurred. Although such a regime is highly plannable, it occupies time and resources, requires a large number of spare parts to be stocked and consumes a large amount of funds, and a maintenance interval that is too long or too short easily leads to under-maintenance or over-maintenance of the rotating machinery. Therefore, predicting the state degradation trend of rotating machinery in a planned and targeted manner and taking appropriate measures before failure occurs is an important and urgent subject.
In recent years, much research has been carried out at home and abroad on the theory of predicting the state degradation trend of rotating machinery, and various models, new algorithms and new technologies have been proposed and introduced into this field. These prediction methods can be summarized into four categories: physics-model-based methods, statistical-experience-based methods, knowledge-based methods and data-driven methods. Physics-model-based methods estimate the degradation of equipment from mathematical descriptions of the physical behavior of materials during the degradation process; common methods include the stress-strain method, the field intensity energy method and fracture mechanics methods. In practice, however, a physical failure model of the equipment or component is difficult to establish, and deviations between the established model and the real one lead to poor prediction results. Statistical-experience-based methods obtain product failure data from a large number of life tests and then, according to statistical analysis criteria, select a suitable life distribution model to fit the failure data and obtain the characteristic distribution of the product life. Such methods rely on the probability distribution of similar events; the influence of factors such as external load and environment on the individual is not considered in the analysis, the degradation data of the mechanical equipment itself are ignored, and the results are highly dispersed, so the reliability of the prediction is poor. Knowledge-based methods predict the failure time of equipment from existing knowledge and various reasoning methods; the main approaches are expert systems and fuzzy logic. Knowledge-based methods often have difficulty acquiring domain knowledge and converting it into rules, and the system model is easily limited by the knowledge of human experts; fuzzy logic must be combined with other methods for prediction, its rules are not easy to set, and it lacks learning and memory ability. Data-driven prediction methods are based on various statistical models and machine learning theories and predict the state degradation trend from the historical fault data of the equipment and the existing observation data; their greatest advantage is that they are independent of any physical or engineering principle. Only the characteristic data generated during equipment operation need to be collected and stored, and the prediction result depends only on the availability of the data.
Data-driven methods for predicting the remaining life of rotating machinery can be further divided into three categories: the first is modern model-based prediction methods, such as the Particle Filter (PF); the second is numerical analysis prediction methods, such as Support Vector Regression (SVR); the third is artificial intelligence prediction methods, such as neural networks. However, these methods still have shortcomings. For the PF, the resampling stage causes a loss of sample validity and diversity, resulting in sample impoverishment. For the SVR, the kernel function type and kernel parameters are still difficult to set accurately, so the prediction result remains uncertain. For artificial neural networks, there is no mature theory guiding the selection of the number of hidden layers and nodes, which is generally done by experience, so the prediction accuracy and computational efficiency of the model are not ideal.
As a machine learning method for solving sequential decision problems, reinforcement learning uses a continuous "interaction-trial and error" mechanism in which the Agent interacts continuously with the environment so as to learn an optimal strategy for completing the task, which conforms to the way humans improve their behavior decisions. Aiming at the problem that choosing the number of hidden layers and nodes of a neural network by experience leads to uncontrollable nonlinear approximation capability and generalization performance, the invention combines the advantages of reinforcement learning in intelligent decision making with the Long-Short Time Memory Neural Network (LSTMNN).
Disclosure of Invention
The invention aims to provide a reinforcement learning tri-state combined long-short time memory neural network system, and a training and prediction method, with fast convergence, high computational efficiency and high prediction accuracy.
In order to solve the above technical problems, the technical scheme of the invention is as follows: a reinforcement learning tri-state combined long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier. The long-short time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (i.e., the hidden layer state). The monotonic trend identifier judges the trend state of the time series constructed from the input, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and hidden nodes are adapted to the variation law of the series. The trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes.
As a preferred technical solution, the trend state includes an ascending trend state, a descending trend state and a steady trend state.
As a preferred technical solution, the monotonic trend identifier constructs, for the input time series x_t = [x_1, x_2, …, x_t]^T, the corresponding point coordinates (1, x_1), (2, x_2), …, (t, x_t) in a time-domain coordinate system, obtains the linear fitting equation x = ht + b of the point coordinates by linear fitting, and solves the slope h and the intercept b of the fitting equation; then:
1) if arctan h ≤ λ, the state is a descending trend state;
2) if arctan h ≥ μ, the state is an ascending trend state;
3) if λ < arctan h < μ, the state is a steady trend state;
where λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0.
As a preferred technical solution, the reinforcement learning unit comprises an action set, in which each action corresponds to a number of hidden layers and a number of hidden nodes of the long-short time memory neural network, and a Q value set containing a Q value for each trend state and action. The reinforcement learning unit selects an action from the action set according to the trend state of the input time series: the action for that trend state is obtained from the Q value set and the optimal strategy, the long-short time memory neural network corresponding to the time series in that trend state is constructed from the number of hidden layers and hidden nodes corresponding to the selected action, and the final output of that long-short time memory neural network is calculated.
A training method for a reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-time memory neural network according to the trend state corresponding to the current time sequence, the action executed in the trend state and the long-time memory neural network corresponding to the current time sequence;
calculating the error between the final output and the ideal output, and updating the Q value of the action executed under the trend state in the Q value set according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
A method for predicting degradation tendency of a rotating machine comprises the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as a state degradation feature of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as an input time sequence into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time sequence through a monotonic trend recognizer to obtain a long-time memory neural network corresponding to the trend state, and performing multiple training on the long-time memory neural network;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
and after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by the singular spectrum entropy prediction value are obtained by analogy, and a curve graph of the singular spectrum entropy prediction value is obtained through the prediction samples.
Due to the adoption of the above technical scheme, the invention has the following beneficial effects. The monotonic trend identifier judges the trend (ascending, descending or steady) of the input time series; the three trends and the different numbers of hidden layers and hidden nodes are used respectively as the states and actions of the Q value set, and the Agent selects and executes the optimal action according to the updated Q value set (i.e., it selects the long-short time memory neural network whose number of hidden layers and hidden nodes best matches each trend of the series), which enhances the generalization capability of the network and gives the proposed prediction method higher prediction accuracy. In addition, to make the learning target of reinforcement learning explicit (namely a smaller output error E of the i-LSTMNN) and to avoid blind action searching by the Agent during the updating of the Q value set, the reward is calculated from the output error; this improves the convergence speed of the network and gives the proposed prediction method higher computational efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a topology diagram of a reinforcement learning unit in an embodiment of the present invention;
FIG. 2 is a topological diagram of a long-term and short-term memory neural network model in the embodiment of the present invention;
FIG. 3 is a schematic diagram of a model of a long-time and short-time memory neural network system matched with reinforcement learning units according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for predicting remaining life of a rotating machine according to an embodiment of the present invention;
FIG. 5 is a plot of singular spectrum entropy for a rotating machine in an embodiment of the present invention;
FIG. 6 is a plot of the singular spectrum entropy predicted by the proposed reinforcement learning tri-state combined long-short time memory neural network;
FIG. 7 is a plot of singular spectral entropy predicted by long and short term memory neural networks;
FIG. 8 is a plot of singular spectral entropy predicted by the multi-kernel least squares support vector machine MK-LSSVM;
FIG. 9 is a plot of the singular spectrum entropy of the GA-BP prediction for the genetic-BP network;
FIG. 10 is a plot of singular spectral entropy predicted by extreme learning machine ELM;
fig. 11 is a comparison graph of the consumption time of the five remaining life prediction methods.
Detailed Description
A reinforcement learning tri-state combined long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier.
The long-time and short-time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (namely a hidden layer state).
The monotonic trend identifier judges the trend state of the time series constructed from the input, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and hidden nodes are adapted to the variation law of the series; the trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes.
The trend states include an up trend state, a down trend state, and a steady trend state.
The monotonic trend identifier constructs, for the input time series x_t = [x_1, x_2, …, x_t]^T, the corresponding point coordinates (1, x_1), (2, x_2), …, (t, x_t) in a time-domain coordinate system, obtains the linear fitting equation x = ht + b of the point coordinates by linear fitting, and solves the slope h and the intercept b of the fitting equation; then:
1) if arctan h ≤ λ, the state is a descending trend state;
2) if arctan h ≥ μ, the state is an ascending trend state;
3) if λ < arctan h < μ, the state is a steady trend state;
where λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0.
The reinforcement learning unit comprises an action set, in which each action corresponds to a number of hidden layers and a number of hidden nodes of the long-short time memory neural network, and a Q value set containing a Q value for each trend state and action. The reinforcement learning unit selects an action from the action set according to the trend state of the input time series: the action for that trend state is obtained from the Q value set and the optimal strategy, the long-short time memory neural network corresponding to the time series in that trend state is constructed from the number of hidden layers and hidden nodes corresponding to the selected action, and the final output of that long-short time memory neural network is calculated.
The construction process of the reinforcement learning tri-state combination long-time memory neural network system model is as follows:
1. reinforced learning unit
Reinforcement learning is based on the theoretical framework of the Markov Decision Process (MDP) [13].
As shown in fig. 1, a standard reinforcement learning framework has four main elements: action, reward, state and environment. The goal is to learn a behavior strategy so that the actions selected by the Agent ultimately receive the maximum reward from the environment.
Denote the state at time t by s_t, the state at the next time by s_{t+1}, and the actions taken in these two states by a_t and a_{t+1}, respectively. The expectation of the discounted cumulative reward is defined as follows:

V_π(s_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} ]   (1)

where γ is the discount factor, 0 < γ < 1; π is the strategy space; and r_t is the reward obtained by taking action a_t in the state at time t.
After each action is taken, the Q value is updated iteratively by the Bellman equation, which is expressed as follows:

Q(s_{t+1}, a_{t+1}) = (1 - α)Q(s_t, a_t) + α( r(s_t, a_t, s_{t+1}) + γV(s) )   (2)

where α is the adjustment coefficient and r(s_t, a_t, s_{t+1}) is the reward obtained by selecting action a_t in state s_t and reaching state s_{t+1}. The value function in state s_t is expressed as:

V(s) = max_{a∈A} Q(s_t, a)   (3)
The optimal strategy in state s_t, i.e., the decision function for obtaining the maximum reward, is expressed as:

a* = argmax_{a∈A} Q(s_t, a)   (4)
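For illustration, a minimal tabular Q-learning sketch in the spirit of equations (2)-(4) is given below; the state and action encoding, the random initialisation and the helper names are hypothetical placeholders rather than the patent's reference implementation (the patent defines its own reward in equation (24) later).

```python
import numpy as np

# Minimal tabular Q-learning sketch illustrating equations (2)-(4).
n_states, n_actions = 3, 24                 # three trend states, 24 layer/node combinations (illustrative)
Q = np.random.rand(n_states, n_actions)     # Q value set, randomly initialised
alpha, gamma = 0.1, 0.001                   # adjustment coefficient and discount factor

def value(s):
    """V(s) = max_a Q(s, a), equation (3)."""
    return Q[s].max()

def decide(s):
    """a* = argmax_a Q(s, a), equation (4)."""
    return int(Q[s].argmax())

def update(s, a, r, s_next):
    """Bellman-style update of the Q value set, equation (2)."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * value(s_next))
```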
2. long-and-short time memory neural network
Conventional Recurrent Neural Networks (RNNs) model long sequences poorly because of the gradient dispersion problem. The LSTMNN controls the degree to which instant information influences historical information by adding gate units, so that the network can store and transmit information over long periods. Its topology is shown in fig. 2, where i denotes the input gate; f the forgetting gate; o the output gate; c and c̃ the memory cell and the candidate memory cell, respectively; and h the cell output (i.e., the hidden state).
The LSTMNN adjusts the degree to which candidate memory is added, the degree to which existing memory is forgotten, and the exposure of the memory through the input gate, the forgetting gate and the output gate respectively. i_t, f_t and o_t are defined as follows:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)   (5)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)   (6)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)   (7)

where σ is the sigmoid function; W_i, U_i and b_i are the input weight matrix, the previous-moment activity value weight matrix and the bias vector of the input gate; W_f, U_f and b_f are those of the forgetting gate; and W_o, U_o and b_o are those of the output gate.
In addition, the LSTMNN updates the memory cell by forgetting part of the existing memory and adding candidate memory. The memory cell c_t and the candidate memory cell c̃_t at time t are defined as follows:

c_t = f_t ⊗ c_{t-1} + i_t ⊗ c̃_t   (8)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)   (9)

where W_c, U_c and b_c are the input weight matrix, the previous-moment hidden state weight matrix and the bias vector of the candidate memory unit.
The hidden state h_t at time t is defined as follows:

h_t = o_t ⊗ tanh(c_t)   (10)

where ⊗ represents the tensor (element-wise) product and tanh the hyperbolic tangent function.
Finally, according to equations (5)-(10), the output y_t of the LSTMNN can be calculated by:

y_t = σ(W_y h_t)   (11)

where W_y represents the output layer weight matrix.
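As a reading aid, a minimal NumPy sketch of a single LSTM cell step following equations (5)-(11) is given below; the dimensions, random initialisation and dictionary layout are illustrative assumptions and not part of the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step per equations (5)-(11); p holds the weight matrices and biases."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate, eq. (5)
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forgetting gate, eq. (6)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate, eq. (7)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate memory, eq. (9)
    c_t = f_t * c_prev + i_t * c_tilde                                # memory cell, eq. (8)
    h_t = o_t * np.tanh(c_t)                                          # hidden state, eq. (10)
    y_t = sigmoid(p["W_y"] @ h_t)                                     # cell output, eq. (11)
    return y_t, h_t, c_t

# Illustrative dimensions: t-dimensional input, m hidden nodes.
t_dim, m = 4, 8
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((m, t_dim)) for k in ("W_i", "W_f", "W_o", "W_c")}
p.update({k: rng.standard_normal((m, m)) for k in ("U_i", "U_f", "U_o", "U_c")})
p.update({k: np.zeros(m) for k in ("b_i", "b_f", "b_o", "b_c")})
p["W_y"] = rng.standard_normal((1, m))
y, h, c = lstm_step(rng.standard_normal(t_dim), np.zeros(m), np.zeros(m), p)
```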
3. Reinforced learning tri-state combined long-and-short time memory neural network system
The reinforcement learning tri-state combined long-short time memory neural network system is abbreviated RL-3S-LSTMNN. The RL-3S-LSTMNN model divides the time series into three basic trend states (rising, falling and steady) by constructing a monotonic trend identifier, and uses a reinforcement learning unit to select, for each trend state, a long-short time memory neural network whose number of hidden layers and nodes is adapted to its variation law. The model is shown in fig. 3.
The time series fed to the input gate is denoted x_t = [x_1, x_2, …, x_t]^T, and the point coordinates corresponding to x_t in the time-domain coordinate system are (1, x_1), (2, x_2), …, (t, x_t). First, the monotonic trend identifier fits a straight line to the points (1, x_1), (2, x_2), …, (t, x_t); the fitted linear equation is written as:

x = ht + b   (12)

The squared fitting error e is then:

e = Σ_{j=1}^{t} (x_j - h·j - b)²   (13)

To obtain the optimal fitting equation, the extremum condition of calculus must be satisfied:

∂e/∂h = 0,  ∂e/∂b = 0   (14)
the slope h and intercept b of the linear fit equation are solved from equation (14). The trend state of the time series can be judged according to the value of the slope h, and the specific criterion is as follows:
1) if arctan h ≤ λ, the series is in a descending trend state, denoted s_1;
2) if arctan h ≥ μ, the series is in an ascending trend state, denoted s_2;
3) if λ < arctan h < μ, the series is in a steady trend state, denoted s_3;
where λ and μ are the state critical values, i.e., the first and second thresholds, with λ < 0 and μ > 0.
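A minimal sketch of this monotonic trend identifier is given below (Python/NumPy assumed); the function name is hypothetical and the default thresholds are the illustrative values used later in the embodiment.

```python
import numpy as np

def identify_trend(x, lam=-7e-6, mu=7e-6):
    """Classify a time series as descending (s1), ascending (s2) or steady (s3)
    by least-squares fitting x = h*t + b and thresholding arctan(h), eqs. (12)-(14)."""
    t = np.arange(1, len(x) + 1)
    h, b = np.polyfit(t, x, 1)        # slope h and intercept b of the linear fit
    angle = np.arctan(h)
    if angle <= lam:
        return "s1"                   # descending trend state
    if angle >= mu:
        return "s2"                   # ascending trend state
    return "s3"                       # steady trend state

# Example: a slowly rising segment is classified as ascending.
print(identify_trend(np.linspace(0.0, 0.1, 50)))   # -> "s2"
```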
The three trend states are taken as the environment states of reinforcement learning, and the Agent selects an action from the action set a according to the current trend state; the action set a is shown in Table 1.
Table 1  Action set a (each action is a combination of a number of hidden layers and a number of hidden nodes)
When selecting an action, a Q value set consisting of the state set s and the action set a is used in place of the discounted cumulative reward expectation, as shown in Table 2.
Table 2  Q value set (one Q value for each state-action pair)
According to the Q value set, a corresponding action is selected for each state by a decision function, whose expression is:

a*(s_i) = argmax_{a∈{a_1, a_2, …, a_d}} Q(s_i, a)   (15)

where i ∈ {1, 2, 3} and a*(s_i) ∈ {a_1, a_2, …, a_d} denotes the action selected by the decision function in state s_i.
After the action a*(s_i) for state s_i is obtained, an LSTMNN is set up with the number of hidden layers and hidden nodes represented by a*(s_i); the long-short time memory neural network thus corresponding to the time series x_t (i.e., to trend state s_i) is denoted i-LSTMNN.
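The following sketch shows one possible way of organising the action set and Q value set of Table 1 and Table 2 together with the decision function of equation (15); the data structures and names are assumptions for illustration, not prescribed by the patent.

```python
import numpy as np

# Action set a: every combination of hidden-layer number (1-3) and hidden-node number (3-10),
# giving 3 x 8 = 24 actions, as in the embodiment described later.
actions = [(layers, nodes) for layers in (1, 2, 3) for nodes in range(3, 11)]

states = ("s1", "s2", "s3")                        # descending, ascending, steady
rng = np.random.default_rng(0)
Q = {s: rng.random(len(actions)) for s in states}  # Q value set, random initial values in [0, 1]

def decide(state):
    """Decision function of equation (15): pick the action with the largest Q value."""
    idx = int(np.argmax(Q[state]))
    return idx, actions[idx]                       # (action index, (hidden layers, hidden nodes))

idx, (n_layers, n_nodes) = decide("s2")
print(f"state s2 -> i-LSTMNN with {n_layers} hidden layer(s), {n_nodes} hidden nodes")
```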
The time series x_t = [x_1, x_2, …, x_t]^T is taken as the input of the i-LSTMNN. If the i-LSTMNN has one hidden layer with m hidden nodes, the input gate output i_t^1, forgetting gate output f_t^1 and output gate output o_t^1 of that hidden layer are calculated respectively as:

i_t^1 = σ(W_i^1 x_t + U_i^1 h_{t-1}^1)   (16)
f_t^1 = σ(W_f^1 x_t + U_f^1 h_{t-1}^1)   (17)
o_t^1 = σ(W_o^1 x_t + U_o^1 h_{t-1}^1)   (18)
According to matrix algebra, the number of hidden nodes and the dimension of the input vector jointly determine the dimensions of the weights and activity values, so the dimension of each gate's weight and activity value in the above formulas is t × m. To simplify the network update process, the bias terms are omitted, so only the weights and activity values need to be updated.
The memory cell c_t^1 and candidate memory cell c̃_t^1 of this hidden layer are expressed as:

c_t^1 = f_t^1 ⊗ c_{t-1}^1 + i_t^1 ⊗ c̃_t^1   (19)
c̃_t^1 = tanh(W_c^1 x_t + U_c^1 h_{t-1}^1)   (20)

The hidden layer state h_t^1 can then be obtained from equations (18)-(19) as:

h_t^1 = o_t^1 ⊗ tanh(c_t^1)   (21)

Finally, the final output y_t^1 is calculated from equation (21) as:

y_t^1 = σ(W_y^1 h_t^1)   (22)
if the i-LSTMNN hidden layers are two layers and the number of hidden layer nodes is m, the hidden layers are changed into a first hidden layer, and the first hidden layer is continuously output finally
Figure BDA0001874723260000109
As input to the second hidden layer. According to the calculation process of the first hidden layer, the input gate output of the second hidden layer can be calculated in the same way
Figure BDA00018747232600001010
Forget gate output f t 2 Output of the output gate
Figure BDA00018747232600001011
Memory cell
Figure BDA00018747232600001012
Candidate memory cell
Figure BDA00018747232600001013
Hidden layer state
Figure BDA00018747232600001014
And second layer final output
Figure BDA00018747232600001015
By analogy, if the i-LSTMNN has n hidden layers with m hidden nodes each, i_t^n, f_t^n, o_t^n, c_t^n, c̃_t^n, h_t^n and y_t^n can be obtained. Although the calculation rules of the different hidden layers are the same, the initially set parameter values (i.e., the weights W and activity values U) of each hidden layer are different.
A training method for a reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-short time memory neural network according to the trend state corresponding to the current time series, the action executed in that trend state, and the long-short time memory neural network corresponding to the current time series;
calculating the error between the final output and the ideal output, and updating, in the Q value set, the Q value of the action executed in that trend state according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
The specific steps of the reinforcement learning training are as follows:
1. update of reinforcement learning Q-value set
The Q value set is updated iteratively with an ε-greedy strategy. Let ε = [ε_1, ε_2, …, ε_P] be a monotonically decreasing sequence with every element ε_ρ ∈ (0, 1). The Q value set is updated for P rounds, and ε_1, ε_2, …, ε_P serve as the action selection reference values of the respective rounds (i.e., the reference value of the ρ-th round is ε_ρ). In the ρ-th round, K_ρ updates are performed; each time a random number χ_ρk ∈ (0, 1) is generated and compared with ε_ρ: if χ_ρk ≤ ε_ρ, the action to execute in state s_i is selected at random; if χ_ρk > ε_ρ, the action to execute in state s_i is selected according to equation (15). Then, after the corresponding i-LSTMNN is obtained as described above, its output y_t^n is calculated. Let the ideal output be y_t; the output error function is then:

E_n = y_t - y_t^n   (23)
Combining this with the output error, the reward r obtained by selecting and executing action a in state s_i is calculated as:

r = e^(-||E_n||)   (24)

where e is the natural exponential base. Clearly r ∈ (0, 1), and r is negatively correlated with the output error norm ||E_n|| (i.e., the larger the error, the smaller the resulting reward).
According to the obtained reward and the Bellman equation, the Q value of executing action a in state s_i in the Q value set is updated as:

q(s_i, a)' = (1 - α)Q(s_i, a) + α( r + γ·max_{a'} Q(s_i', a') )   (25)

where q(s_i, a)' denotes the updated value of Q(s_i, a) in the Q value set, max_{a'} Q(s_i', a') denotes the maximum Q value at the next state s_i', and the state s_i' can be determined by feeding y_t^n into the trend state identifier.
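A minimal sketch of this ε-greedy update of the Q value set is given below; the i-LSTMNN forward pass is abstracted into a hypothetical `run_lstmnn` callable and the trend identifier into `identify_trend`, both placeholders standing in for the components described above.

```python
import numpy as np

def reward_from_error(error_vec):
    """Equation (24): r = exp(-||E_n||), so larger errors give smaller rewards."""
    return float(np.exp(-np.linalg.norm(error_vec)))

def train_q_table(Q, actions, samples, run_lstmnn, identify_trend,
                  eps=(0.9, 0.7, 0.5, 0.3, 0.1), alpha=0.1, gamma=0.001, rng=None):
    """epsilon-greedy update of the Q value set, equation (25).

    Q             : dict state -> array of Q values, one per action
    actions       : list of (hidden layers, hidden nodes) pairs
    samples       : list of (input series x_t, ideal output y_t)
    run_lstmnn    : placeholder callable (action, x_t) -> network output
    identify_trend: placeholder callable series -> state key
    """
    rng = rng or np.random.default_rng()
    for eps_rho in eps:                              # P rounds
        k_rho = int(100 * eps_rho)                   # K_rho updates per round (embodiment value)
        for _ in range(k_rho):
            x_t, y_t = samples[rng.integers(len(samples))]
            s_i = identify_trend(x_t)
            if rng.random() <= eps_rho:              # explore: random action
                a = int(rng.integers(len(actions)))
            else:                                    # exploit: equation (15)
                a = int(np.argmax(Q[s_i]))
            y_hat = run_lstmnn(actions[a], x_t)
            r = reward_from_error(np.atleast_1d(y_t) - np.atleast_1d(y_hat))
            s_next = identify_trend(np.atleast_1d(y_hat))   # next state from the network output
            Q[s_i][a] = (1 - alpha) * Q[s_i][a] + alpha * (r + gamma * Q[s_next].max())
    return Q
```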
2.i-LSTMNN weight and activity value update
The weights and activity values are updated by the stochastic gradient descent method. If the final i-LSTMNN has one hidden layer, the gradients ∇W and ∇U of each weight and activity value are calculated from formulas (16)-(20), (23) and the chain rule of derivation. After the gradients are obtained, the parameters are updated as:

W' = W - ψ∇W,  U' = U - ψ∇U   (26)

where W' and U' are the updated weights and activity values and ψ is the learning rate.
If the final i-LSTMNN has two hidden layers, the weights and activity values of the second hidden layer are updated first with the same rule, and then those of the first hidden layer. Since the first-layer output y_t^1 is the input of the second hidden layer, the error of the first hidden layer with respect to its output y_t^1 can be determined indirectly from the output error E^2 of the second hidden layer as:

E^1 = ∂E^2/∂y_t^1   (27)

E^1 is then used in place of E_n to calculate the weight and activity value gradients of the first hidden layer, which are updated according to equation (26).
By analogy, if the final i-LSTMNN hidden layer is n layers, the updating of each weight value and each activity value in the n layers can be realized.
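A short sketch of the parameter update rule of equation (26) is shown below; the gradient computation itself (the chain rule through equations (16)-(22)) is abstracted into a hypothetical `grads` mapping assumed to be supplied by the caller.

```python
def sgd_update(params, grads, psi=0.001):
    """Equation (26): W' = W - psi * dW, U' = U - psi * dU, applied per hidden layer.

    params: dict such as {"W_i": ..., "U_i": ..., ...} of NumPy arrays for one layer
    grads : dict with the same keys holding the gradients (assumed precomputed)
    psi   : learning rate
    """
    return {name: value - psi * grads[name] for name, value in params.items()}
```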
A method for predicting the degradation trend of a rotating machine comprises the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as state degradation features of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the processed singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as input time series into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time series through the monotonic trend identifier to obtain the long-short time memory neural network corresponding to that trend state, and training that long-short time memory neural network a plurality of times;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
and after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by the singular spectrum entropy prediction value are obtained by analogy, and a curve graph of the singular spectrum entropy prediction value is obtained through the prediction samples.
The method comprises the following specific steps:
A segment of the singular spectrum entropy sequence [x_b, x_{b+1}, …, x_{b+(l+1)t-1}] is sampled as the training data and decomposed as:

T_1 = [x_b, x_{b+1}, …, x_{b+t-1}]  →  T_1' = [x_{b+t}, x_{b+t+1}, …, x_{b+2t-1}]
T_2 = [x_{b+t}, x_{b+t+1}, …, x_{b+2t-1}]  →  T_2' = [x_{b+2t}, x_{b+2t+1}, …, x_{b+3t-1}]
⋮
T_l = [x_{b+(l-1)t}, x_{b+(l-1)t+1}, …, x_{b+lt-1}]  →  T_l' = [x_{b+lt}, x_{b+lt+1}, …, x_{b+(l+1)t-1}]

where b is the sampling starting point; T_1, T_2, …, T_l are the training input samples; T_1', T_2', …, T_l' are the expected outputs corresponding to the training input samples; l is the number of training sample groups; and t is the sample dimension.
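A small helper illustrating this decomposition into input/target windows is sketched below; the function name and return layout are illustrative assumptions.

```python
import numpy as np

def make_training_pairs(series, b, l, t):
    """Split series into l input windows T_k of length t and their expected outputs T_k'.

    T_k  = series[b+(k-1)t : b+kt]       (training input)
    T_k' = series[b+kt     : b+(k+1)t]   (expected output, the next t points)
    """
    series = np.asarray(series, dtype=float)
    pairs = []
    for k in range(l):
        start = b + k * t
        pairs.append((series[start:start + t], series[start + t:start + 2 * t]))
    return pairs

# Example with a toy sequence, starting point b = 0, l = 3 groups, dimension t = 4.
pairs = make_training_pairs(np.arange(20, dtype=float), b=0, l=3, t=4)
```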
The samples are input into the RL-3S-LSTMNN in turn. First, the monotonic trend identifier judges the trend state of each of the l groups of training samples. Then, the reinforcement learning unit selects and executes the best action according to the finally trained and updated Q value set, obtaining the final i-LSTMNN for each of the three trend states (1-LSTMNN, 2-LSTMNN and 3-LSTMNN). Next, each i-LSTMNN selected by the reinforcement learning unit is trained M times with the stochastic gradient descent method: before each training step, a group of samples is randomly drawn from the training samples in state s_i and input into the corresponding i-LSTMNN, and the weights and activity values of that i-LSTMNN are then updated, which completes one training step. This training process is repeated M times, completing the whole training process of the RL-3S-LSTMNN.
The prediction process of RL-3S-LSTMNN is as follows:
The last group of samples of the training set, [x_{b+lt}, x_{b+lt+1}, …, x_{b+(l+1)t-1}], is passed through the monotonic trend identifier and input into the i-LSTMNN corresponding to its trend, giving the predicted value x'_{b+(l+1)t} at the (b+(l+1)t)-th point. Then [x_{b+lt+1}, x_{b+lt+2}, …, x'_{b+(l+1)t}] is input into the i-LSTMNN as before to obtain x'_{b+(l+1)t+1}; by analogy, after t predictions the sequence [x'_{b+(l+1)t}, x'_{b+(l+1)t+1}, …, x'_{b+(l+2)t-1}] is obtained. Every t predictions constitute one prediction round, so this is the first round; the output of the first round is then used as the input of the second round, and the prediction proceeds as in the first round. By analogy, V rounds of prediction yield V × t predicted values.
The specific working example is as follows:
the method is verified by adopting the rolling bearing state degradation data measured by the Cincinnati university.
Four aviation bearings are mounted on a rotating shaft of the bearing experiment table, the aviation bearings are ZA-2115 double-row roller bearings manufactured by Rexnord company, an alternating current motor drives the rotating shaft to rotate at a constant rotating speed of 2000r/min through belt transmission, and radial load of 6000lbs is applied to the bearings in the experiment process. The sampling frequency is 20kHz, the sampling length is 20480 points, vibration data of the bearing is collected every 10min, and the bearing continuously operates until a fault occurs. In the first set of experiments, after the experiment table is continuously operated for 21560 minutes, the bearing 3 has inner ring faults. The method provided by the state degradation data verification of the bearing 3 collected in the set of experiments is adopted.
There are 2156 groups of vibration data over the whole life of bearing 3, each group containing 20480 points. The first 10000 vibration points of each group are extracted and rearranged into a 1000 × 10 matrix, from which the singular spectrum entropy is calculated, as shown in fig. 6. The singular spectrum entropy sequence is then denoised by moving averaging; as shown in fig. 5, the entropy climbs rapidly from the starting point to the 200th point, where the bearing is in the running-in stage; from the 200th point to the 1700th point the singular spectrum entropy changes slowly and the bearing is in the steady operation stage; after the 1700th point the singular spectrum entropy starts to rise sharply and the bearing is in the failure stage. Since bearing failure results from gradual degradation, the bearing is in the early failure stage during the latter part of the steady operation stage. The 1301st to 1500th points (200 points in total) are taken as training samples and input into the RL-3S-LSTMNN to predict the singular spectrum entropy of 500 further points (i.e., the 1501st to 2000th points).
The RL-3S-LSTMNN parameters are set as follows: state trend identifier thresholds λ = -7×10^-6 and μ = 7×10^-6; number of training rounds in the reinforcement learning process P = 5, action selection reference values ε = [0.9, 0.7, 0.5, 0.3, 0.1], and number of updates per round K_ρ = 100ε_ρ; the action set consists of the selectable numbers of hidden layers [1, 2, 3] and the selectable numbers of hidden nodes 3 to 10, combined pairwise into a set of 24 actions; each Q value in the Q value table is initialised to a random value in [0, 1]; Q value update discount factor γ = 0.001 and adjustment coefficient α = 0.1; i-LSTMNN learning rate ψ = 0.001 and number of training times M = 2000; number of training sample groups l = 49; number of prediction rounds V = 125 and number of predictions per round (i.e., the sample dimension, which equals the number of input nodes) t = 4; the number of output nodes is 1. The prediction results are shown in fig. 6.
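For convenience, these embodiment settings can be collected in a single configuration mapping, as sketched below; the key names are illustrative and not taken from the patent.

```python
# Illustrative configuration mirroring the embodiment's parameter settings.
rl3s_lstmnn_config = {
    "trend_thresholds": {"lambda": -7e-6, "mu": 7e-6},
    "rl": {"rounds_P": 5,
           "epsilon": [0.9, 0.7, 0.5, 0.3, 0.1],   # updates per round: K_rho = 100 * epsilon_rho
           "gamma": 0.001, "alpha": 0.1},
    "actions": {"hidden_layers": [1, 2, 3], "hidden_nodes": list(range(3, 11))},  # 24 actions
    "lstmnn": {"learning_rate_psi": 0.001, "train_steps_M": 2000},
    "data": {"train_groups_l": 49, "sample_dim_t": 4,
             "prediction_rounds_V": 125, "output_nodes": 1},
}
```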
To verify the advantages of the proposed RL-3S-LSTMNN-based method for predicting the state degradation trend of rotating machinery, the prediction accuracies of four models, namely LSTMNN, the multi-kernel least squares support vector machine (MK-LSSVM), the genetic-BP network (GA-BP) and the Extreme Learning Machine (ELM), are first compared with the proposed method. The number of training times of the four models is set equal to the total number of training times of the RL-3S-LSTMNN. The standard LSTMNN uses 1 hidden layer with 8 hidden nodes; GA-BP uses 3 hidden layers with 8 hidden nodes; the learning rates of LSTMNN and GA-BP are set to ψ = 0.001; ELM uses 10 hidden nodes with a sigmoid activation function. The state degradation prediction results of the double-row roller bearing obtained with the four models are shown in figs. 7 to 10.
To better evaluate the prediction performance of the models, the Nash-Sutcliffe efficiency coefficient (NSE), the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used as prediction accuracy evaluation indexes:

NSE = 1 - Σ_{i=1}^{n} (y_i - y'_i)² / Σ_{i=1}^{n} (y_i - ȳ)²   (28)
MAPE = (1/n) Σ_{i=1}^{n} |(y_i - y'_i)/y_i| × 100%   (29)
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i - y'_i)² )   (30)

where y_i is the actual value; y'_i is the predicted value; n is the number of prediction points; and ȳ is the average of the n actual values. NSE ∈ (-∞, 1), and the closer NSE is to 1, the higher the prediction accuracy of the model.
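The three evaluation indexes can be computed as in the short sketch below (NumPy assumed).

```python
import numpy as np

def nse(y, y_pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the actual values."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

def mape(y, y_pred):
    """Mean absolute percentage error, in percent."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y - y_pred) / y)) * 100.0)

def rmse(y, y_pred):
    """Root mean square error."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))
```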
Table 3  Comparison of the prediction performance of the five state degradation trend prediction methods
With the parameter settings of RL-3S-LSTMNN, LSTMNN, MK-LSSVM, GA-BP and ELM kept unchanged, prediction was repeated 100 times with each of the five models, and the averages of the three evaluation indexes (mean NSE, mean MAPE and mean RMSE) over the 100 predictions were calculated; the results are listed in Table 3.
The results of figs. 7-10 and Table 3 show that the mean MAPE and mean RMSE of RL-3S-LSTMNN are the smallest and its mean NSE is the closest to 1, indicating that RL-3S-LSTMNN has good generalization performance and, when used to predict the state degradation trend of the double-row roller bearing, achieves higher prediction accuracy than LSTMNN, MK-LSSVM, GA-BP and ELM.
Finally, the calculation time (i.e., the sum of training time and prediction time) consumed by LSTMNN, MK-LSSVM, GA-BP and ELM for predicting the state degradation trend is compared with that of RL-3S-LSTMNN. As shown in fig. 11, RL-3S-LSTMNN consumes only 14.782 s, while LSTMNN consumes 10.866 s, MK-LSSVM 26.051 s, GA-BP 35.636 s and ELM 22.374 s. Clearly, the calculation time of RL-3S-LSTMNN is shorter than those of MK-LSSVM, GA-BP and ELM and only slightly longer than that of LSTMNN (both remaining in the same order of magnitude). The comparison shows that RL-3S-LSTMNN has higher convergence speed and computational efficiency than MK-LSSVM, GA-BP and ELM when used to predict the state degradation trend of the double-row roller bearing.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A method for predicting the degradation trend of a rotating machine using a reinforcement learning tri-state combined long-short time memory neural network system and a training method thereof, characterized by comprising the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as a state degradation feature of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as an input time sequence into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time sequence through a monotonic trend recognizer to obtain a long-time memory neural network corresponding to the trend state, and performing multiple training on the long-time memory neural network;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by singular spectrum entropy prediction values are obtained by analogy, and a curve graph of the singular spectrum entropy prediction values is obtained through the prediction samples;
the long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend recognizer, wherein the long-short time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (i.e., a hidden layer state);
the trend states include an ascending trend state, a descending trend state and a steady trend state;
the monotonic trend identifier combines the time series x of the input t =[x 1 ,x 2 ,…,x t ] T Corresponding point coordinates (1,x) are constructed in a time domain coordinate system 1 ),(2,x 2 ),…,(t,x t ) And obtaining a linear fitting linear equation x = ht + b of the point coordinates by linear fitting the point coordinates, and solving a slope h and an intercept b of the linear fitting equation, then:
1) If it is
Figure FDA0003714357340000021
The state is a descending trend state;
2) If it is
Figure FDA0003714357340000022
The state is an ascending trend state;
3) If lambda is less than arctan h and less than mu, the state is a steady trend state;
wherein λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0;
the reinforcement learning unit comprises an action set of a long-time memory neural network corresponding to the number of hidden layers and the number of hidden nodes, and a Q value corresponding to the trend state and the action thereof; the reinforcement learning unit selects an action from the action set according to the trend state of the input time sequence, obtains the action in the trend state according to a Q value set and an optimal strategy in the trend state, obtains the long-time memory neural network corresponding to the time sequence in the trend state through the number of hidden layers and hidden nodes corresponding to the action in the action set in the trend state, and calculates the final output of the long-time memory neural network;
the training method for the reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-time memory neural network according to the trend state corresponding to the current time sequence, the action executed in the trend state and the long-time memory neural network corresponding to the current time sequence;
calculating the error between the final output and the ideal output, and updating the Q value of the action executed under the trend state in the Q value set according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
CN201811393984.9A 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method Active CN109766745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811393984.9A CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811393984.9A CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Publications (2)

Publication Number Publication Date
CN109766745A CN109766745A (en) 2019-05-17
CN109766745B true CN109766745B (en) 2022-12-13

Family

ID=66450163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811393984.9A Active CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Country Status (1)

Country Link
CN (1) CN109766745B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110220725B (en) * 2019-05-30 2021-03-23 河海大学 Subway wheel health state prediction method based on deep learning and BP integration
CN110807510B (en) * 2019-09-24 2023-05-09 中国矿业大学 Parallel learning soft measurement modeling method for industrial big data
CN111665718B (en) * 2020-06-05 2022-05-10 长春工业大学 Diagonal recurrent neural network control method based on Q learning algorithm
CN112783138B (en) * 2020-12-30 2022-03-22 上海交通大学 Intelligent monitoring and abnormity diagnosis method and device for processing stability of production line equipment
CN112758097B (en) * 2020-12-30 2022-06-03 北京理工大学 State prediction and estimation method for unmanned vehicle
CN113486731A (en) * 2021-06-17 2021-10-08 国网山东省电力公司汶上县供电公司 Abnormal state monitoring method for power transmission equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734220A (en) * 2018-05-23 2018-11-02 山东师范大学 Adaptive Financial Time Series Forecasting method based on k lines cluster and intensified learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
US20180284741A1 (en) * 2016-05-09 2018-10-04 StrongForce IoT Portfolio 2016, LLC Methods and systems for industrial internet of things data collection for a chemical production process
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US10565318B2 (en) * 2017-04-14 2020-02-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734220A (en) * 2018-05-23 2018-11-02 山东师范大学 Adaptive Financial Time Series Forecasting method based on k lines cluster and intensified learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Prediction of Bearing Performance Degradation with Bottleneck Feature based on LSTM Network";Gang Tang等;《2018 IEEE International Instrumentation and Measurement Technology Conference》;20180712;第1-7页 *
"基于多尺度形态分解谱熵的电机轴承预测特征提取及退化状态评估";王冰等;《振动与冲击》;20131130;第32卷(第22期);第124-128,139页 *
基于量子加权长短时记忆神经网络的状态退化趋势预测;李锋等;《仪器仪表学报》;20180731;第39卷(第07期);第217-225页 *

Also Published As

Publication number Publication date
CN109766745A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN109766745B (en) Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method
CN109753872B (en) Reinforced learning unit matching cyclic neural network system and training and predicting method thereof
Liao et al. Uncertainty prediction of remaining useful life using long short-term memory network based on bootstrap method
Yang et al. A comparison between extreme learning machine and artificial neural network for remaining useful life prediction
CN110059867B (en) Wind speed prediction method combining SWLSTM and GPR
CN111813084A (en) Mechanical equipment fault diagnosis method based on deep learning
CN110309537B (en) Intelligent health prediction method and system for aircraft
Sari et al. Enabling external factors for inflation rate forecasting using fuzzy neural system
CN111079926B (en) Equipment fault diagnosis method with self-adaptive learning rate based on deep learning
CN111523727B (en) Method for predicting remaining life of battery by considering recovery effect based on uncertain process
Liu et al. Multiple sensors based prognostics with prediction interval optimization via echo state Gaussian process
González-Carrasco et al. SEffEst: Effort estimation in software projects using fuzzy logic and neural networks
CN113344288A (en) Method and device for predicting water level of cascade hydropower station group and computer readable storage medium
CN115062528A (en) Prediction method for industrial process time sequence data
CN109447305B (en) Trend prediction method based on quantum weighted long-time and short-time memory neural network
Kampouropoulos et al. An energy prediction method using adaptive neuro-fuzzy inference system and genetic algorithms
CN110059871B (en) Photovoltaic power generation power prediction method
CN116663419A (en) Sensorless equipment fault prediction method based on optimized Elman neural network
CN113837443B (en) Substation line load prediction method based on depth BiLSTM
CN111414927A (en) Method for evaluating seawater quality
CN115860232A (en) Steam load prediction method, system, electronic device and medium
Pan et al. Bearing condition prediction using enhanced online learning fuzzy neural networks
Zhang et al. Short-term wind power interval prediction based on gd-lstm and bootstrap techniques
Doudkin et al. Spacecraft Telemetry Time Series Forecasting With Ensembles of Neural Networks
CN112365022A (en) Engine bearing fault prediction method based on multiple stages

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant