CN109766745B - Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method - Google Patents

Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Info

Publication number
CN109766745B
CN109766745B (application CN201811393984.9A)
Authority
CN
China
Prior art keywords
trend
state
neural network
long
memory neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811393984.9A
Other languages
Chinese (zh)
Other versions
CN109766745A (en
Inventor
李锋
陈勇
田大庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811393984.9A priority Critical patent/CN109766745B/en
Publication of CN109766745A publication Critical patent/CN109766745A/en
Application granted granted Critical
Publication of CN109766745B publication Critical patent/CN109766745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a reinforcement learning tri-state combined long-short time memory neural network system comprising a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier. The monotonic trend identifier judges the trend state of an input time series, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and number of hidden nodes are adapted to the variation law of the series. The trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes. The method judges the trend of the input time series and selects and executes the optimal action according to the updated Q value set, which enhances the generalization capability of the network and yields higher prediction accuracy for the degradation trend of rotating machinery; in addition, the reward is calculated from the output error, which improves the convergence speed of the network and the computational efficiency of the system.

Description

Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method
Technical Field
The invention relates to the technical field of neural networks, and in particular to a reinforcement learning tri-state combined long-short time memory neural network system and a training and prediction method.
Background
Rotating machinery is among the most widely used components of mechanical equipment in civil and defense fields. During long-term operation a rotating machine degrades gradually and its remaining service life decreases, and the occurrence of a fault often brings catastrophic accidents, causing great economic loss and serious social impact. At present, industrial enterprises generally maintain rotating machinery with a scheduled (time-based) maintenance regime, i.e., the equipment is overhauled periodically whether or not a fault has occurred. Although such a regime is highly plannable, it occupies time and resources, requires a large number of spare parts to be stocked and consumes a large amount of funds, and a maintenance interval that is too long or too short easily leads to under-maintenance or over-maintenance of the rotating machinery. Therefore, predicting the state degradation trend of rotating machinery in a planned and targeted manner and taking appropriate measures before failure occurs is an important and urgent subject.
In recent years, much research has been carried out at home and abroad on the theory of predicting the state degradation trend of rotating machinery, and various models, new algorithms and new technologies have been proposed and introduced into this field. These prediction methods can be summarized into four categories: physics-model-based methods, statistical-experience-based methods, knowledge-based methods and data-driven methods. Physics-model-based methods estimate the degradation of equipment from mathematical descriptions of the physical behavior of materials during the degradation process; common methods include the stress-strain method, the field intensity energy method and fracture mechanics methods. In practice, however, a physical failure model of the equipment or component is difficult to establish, and deviations between the established model and the real one lead to poor prediction results. Statistical-experience-based methods obtain product failure data from a large number of life tests and then, according to statistical analysis criteria, select a suitable life distribution model to fit the failure data and obtain the characteristic distribution of the product life. Such methods rely on the probability distribution of similar events; the influence of factors such as external load and environment on the individual is not considered in the analysis, the degradation data of the mechanical equipment itself are ignored, and the results are highly dispersed, so the reliability of the prediction is poor. Knowledge-based methods predict the failure time of equipment from existing knowledge and various reasoning methods; the main approaches are expert systems and fuzzy logic. Knowledge-based methods often have difficulty acquiring domain knowledge and converting it into rules, and the system model is easily limited by the knowledge of human experts; fuzzy logic must be combined with other methods for prediction, its rules are not easy to set, and it lacks learning and memory ability. Data-driven prediction methods are based on various statistical models and machine learning theories and predict the state degradation trend from the historical fault data of the equipment and the existing observation data; their greatest advantage is that they are independent of any physical or engineering principle. Only the characteristic data generated during equipment operation need to be collected and stored, and the prediction result depends only on the availability of the data.
Data-driven methods for predicting the remaining life of rotating machinery can be further divided into three categories: the first is modern model-based prediction methods, such as the Particle Filter (PF); the second is numerical analysis prediction methods, such as Support Vector Regression (SVR); the third is artificial intelligence prediction methods, such as neural networks. However, these methods still have shortcomings. For the PF, the resampling stage causes a loss of sample validity and diversity, resulting in sample impoverishment. For the SVR, the kernel function type and kernel parameters are still difficult to set accurately, so the prediction result remains uncertain. For artificial neural networks, there is no mature theory guiding the selection of the number of hidden layers and nodes, which is generally done by experience, so the prediction accuracy and computational efficiency of the model are not ideal.
As a machine learning method for solving sequential decision problems, reinforcement learning uses a continuous "interaction-trial and error" mechanism in which the Agent interacts continuously with the environment so as to learn an optimal strategy for completing the task, which conforms to the way humans improve their behavior decisions. Aiming at the problem that choosing the number of hidden layers and nodes of a neural network by experience leads to uncontrollable nonlinear approximation capability and generalization performance, the invention combines the advantages of reinforcement learning in intelligent decision making with the Long-Short Time Memory Neural Network (LSTMNN).
Disclosure of Invention
The invention aims to provide a reinforcement learning tri-state combined long-short time memory neural network system, and a training and prediction method, with fast convergence, high computational efficiency and high prediction accuracy.
In order to solve the above technical problems, the technical scheme of the invention is as follows: a reinforcement learning tri-state combined long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier. The long-short time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (i.e., the hidden layer state). The monotonic trend identifier judges the trend state of the time series constructed from the input, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and hidden nodes are adapted to the variation law of the series. The trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes.
As a preferred technical solution, the trend state includes an ascending trend state, a descending trend state and a steady trend state.
As a preferred technical solution, the monotonic trend identifier constructs, for the input time series x_t = [x_1, x_2, …, x_t]^T, the corresponding point coordinates (1, x_1), (2, x_2), …, (t, x_t) in a time-domain coordinate system, obtains the linear fitting equation x = ht + b of the point coordinates by linear fitting, and solves the slope h and the intercept b of the fitting equation; then:
1) if arctan h ≤ λ, the state is a descending trend state;
2) if arctan h ≥ μ, the state is an ascending trend state;
3) if λ < arctan h < μ, the state is a steady trend state;
where λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0.
As a preferred technical solution, the reinforcement learning unit comprises an action set, in which each action corresponds to a number of hidden layers and a number of hidden nodes of the long-short time memory neural network, and a Q value set containing a Q value for each trend state and action. The reinforcement learning unit selects an action from the action set according to the trend state of the input time series: the action for that trend state is obtained from the Q value set and the optimal strategy, the long-short time memory neural network corresponding to the time series in that trend state is constructed from the number of hidden layers and hidden nodes corresponding to the selected action, and the final output of that long-short time memory neural network is calculated.
A training method for a reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-time memory neural network according to the trend state corresponding to the current time sequence, the action executed in the trend state and the long-time memory neural network corresponding to the current time sequence;
calculating the error between the final output and the ideal output, and updating the Q value of the action executed under the trend state in the Q value set according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
A method for predicting degradation tendency of a rotating machine comprises the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as a state degradation feature of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as an input time sequence into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time sequence through a monotonic trend recognizer to obtain a long-time memory neural network corresponding to the trend state, and performing multiple training on the long-time memory neural network;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
and after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by the singular spectrum entropy prediction value are obtained by analogy, and a curve graph of the singular spectrum entropy prediction value is obtained through the prediction samples.
Due to the adoption of the above technical scheme, the invention has the following beneficial effects. The monotonic trend identifier judges the trend (ascending, descending or steady) of the input time series; the three trends and the different numbers of hidden layers and hidden nodes are used respectively as the states and actions of the Q value set, and the Agent selects and executes the optimal action according to the updated Q value set (i.e., it selects the long-short time memory neural network whose number of hidden layers and hidden nodes best matches each trend of the series), which enhances the generalization capability of the network and gives the proposed prediction method higher prediction accuracy. In addition, to make the learning target of reinforcement learning explicit (namely a smaller output error E of the i-LSTMNN) and to avoid blind action searching by the Agent during the updating of the Q value set, the reward is calculated from the output error; this improves the convergence speed of the network and gives the proposed prediction method higher computational efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a topology diagram of a reinforcement learning unit in an embodiment of the present invention;
FIG. 2 is a topological diagram of a long-term and short-term memory neural network model in the embodiment of the present invention;
FIG. 3 is a schematic diagram of a model of a long-time and short-time memory neural network system matched with reinforcement learning units according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for predicting remaining life of a rotating machine according to an embodiment of the present invention;
FIG. 5 is a plot of singular spectrum entropy for a rotating machine in an embodiment of the present invention;
FIG. 6 is a plot of the singular spectrum entropy predicted by the proposed reinforcement learning tri-state combined long-short time memory neural network;
FIG. 7 is a plot of singular spectral entropy predicted by long and short term memory neural networks;
FIG. 8 is a plot of singular spectral entropy predicted by the multi-kernel least squares support vector machine MK-LSSVM;
FIG. 9 is a plot of the singular spectrum entropy of the GA-BP prediction for the genetic-BP network;
FIG. 10 is a plot of singular spectral entropy predicted by extreme learning machine ELM;
fig. 11 is a comparison graph of the consumption time of the five remaining life prediction methods.
Detailed Description
A reinforcement learning tri-state combined long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend identifier.
The long-time and short-time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (namely a hidden layer state).
The monotonic trend identifier judges the trend state of the time series constructed from the input, and the reinforcement learning unit selects, according to that trend state, a long-short time memory neural network whose number of hidden layers and hidden nodes are adapted to the variation law of the series; the trend state of the input time series comprises three states, and each trend state corresponds to a long-short time memory neural network with a matching number of hidden layers and hidden nodes.
The trend states include an up trend state, a down trend state, and a steady trend state.
The monotonic trend identifier constructs, for the input time series x_t = [x_1, x_2, …, x_t]^T, the corresponding point coordinates (1, x_1), (2, x_2), …, (t, x_t) in a time-domain coordinate system, obtains the linear fitting equation x = ht + b of the point coordinates by linear fitting, and solves the slope h and the intercept b of the fitting equation; then:
1) if arctan h ≤ λ, the state is a descending trend state;
2) if arctan h ≥ μ, the state is an ascending trend state;
3) if λ < arctan h < μ, the state is a steady trend state;
where λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0.
The reinforcement learning unit comprises an action set, in which each action corresponds to a number of hidden layers and a number of hidden nodes of the long-short time memory neural network, and a Q value set containing a Q value for each trend state and action. The reinforcement learning unit selects an action from the action set according to the trend state of the input time series: the action for that trend state is obtained from the Q value set and the optimal strategy, the long-short time memory neural network corresponding to the time series in that trend state is constructed from the number of hidden layers and hidden nodes corresponding to the selected action, and the final output of that long-short time memory neural network is calculated.
The construction process of the reinforcement learning tri-state combination long-time memory neural network system model is as follows:
1. reinforced learning unit
Reinforcement learning is based on the theoretical framework of the Markov Decision Process (MDP) [13].
As shown in fig. 1, a standard reinforcement learning framework has four main elements: action, reward, state and environment. The goal is to learn a behavior strategy so that the actions selected by the Agent ultimately receive the maximum reward from the environment.
Denote the state at time t by s_t, the state at the next time by s_{t+1}, and the actions taken in these two states by a_t and a_{t+1}, respectively. The expectation of the discounted cumulative reward is defined as follows:

V_π(s_t) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} ]   (1)

where γ is the discount factor, 0 < γ < 1; π is the strategy space; and r_t is the reward obtained by taking action a_t in the state at time t.
After each action is taken, the Q value is updated iteratively by the Bellman equation, which is expressed as follows:

Q(s_{t+1}, a_{t+1}) = (1 - α)Q(s_t, a_t) + α( r(s_t, a_t, s_{t+1}) + γV(s) )   (2)

where α is the adjustment coefficient and r(s_t, a_t, s_{t+1}) is the reward obtained by selecting action a_t in state s_t and reaching state s_{t+1}. The value function in state s_t is expressed as:

V(s) = max_{a∈A} Q(s_t, a)   (3)
The optimal strategy in state s_t, i.e., the decision function for obtaining the maximum reward, is expressed as:

a* = argmax_{a∈A} Q(s_t, a)   (4)
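For illustration, a minimal tabular Q-learning sketch in the spirit of equations (2)-(4) is given below; the state and action encoding, the random initialisation and the helper names are hypothetical placeholders rather than the patent's reference implementation (the patent defines its own reward in equation (24) later).

```python
import numpy as np

# Minimal tabular Q-learning sketch illustrating equations (2)-(4).
n_states, n_actions = 3, 24                 # three trend states, 24 layer/node combinations (illustrative)
Q = np.random.rand(n_states, n_actions)     # Q value set, randomly initialised
alpha, gamma = 0.1, 0.001                   # adjustment coefficient and discount factor

def value(s):
    """V(s) = max_a Q(s, a), equation (3)."""
    return Q[s].max()

def decide(s):
    """a* = argmax_a Q(s, a), equation (4)."""
    return int(Q[s].argmax())

def update(s, a, r, s_next):
    """Bellman-style update of the Q value set, equation (2)."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * value(s_next))
```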
2. long-and-short time memory neural network
Conventional Recurrent Neural Networks (RNNs) model long sequences poorly because of the gradient dispersion problem. The LSTMNN controls the degree to which instant information influences historical information by adding gate units, so that the network can store and transmit information over long periods. Its topology is shown in fig. 2, where i denotes the input gate; f the forgetting gate; o the output gate; c and c̃ the memory cell and the candidate memory cell, respectively; and h the cell output (i.e., the hidden state).
The LSTMNN adjusts the degree to which candidate memory is added, the degree to which existing memory is forgotten, and the exposure of the memory through the input gate, the forgetting gate and the output gate respectively. i_t, f_t and o_t are defined as follows:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)   (5)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)   (6)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)   (7)

where σ is the sigmoid function; W_i, U_i and b_i are the input weight matrix, the previous-moment activity value weight matrix and the bias vector of the input gate; W_f, U_f and b_f are those of the forgetting gate; and W_o, U_o and b_o are those of the output gate.
In addition, the LSTMNN updates the memory cell by forgetting part of the existing memory and adding candidate memory. The memory cell c_t and the candidate memory cell c̃_t at time t are defined as follows:

c_t = f_t ⊗ c_{t-1} + i_t ⊗ c̃_t   (8)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)   (9)

where W_c, U_c and b_c are the input weight matrix, the previous-moment hidden state weight matrix and the bias vector of the candidate memory unit.
The hidden state h_t at time t is defined as follows:

h_t = o_t ⊗ tanh(c_t)   (10)

where ⊗ represents the tensor (element-wise) product and tanh the hyperbolic tangent function.
Finally, according to equations (5)-(10), the output y_t of the LSTMNN can be calculated by:

y_t = σ(W_y h_t)   (11)

where W_y represents the output layer weight matrix.
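As a reading aid, a minimal NumPy sketch of a single LSTM cell step following equations (5)-(11) is given below; the dimensions, random initialisation and dictionary layout are illustrative assumptions and not part of the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step per equations (5)-(11); p holds the weight matrices and biases."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate, eq. (5)
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forgetting gate, eq. (6)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate, eq. (7)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate memory, eq. (9)
    c_t = f_t * c_prev + i_t * c_tilde                                # memory cell, eq. (8)
    h_t = o_t * np.tanh(c_t)                                          # hidden state, eq. (10)
    y_t = sigmoid(p["W_y"] @ h_t)                                     # cell output, eq. (11)
    return y_t, h_t, c_t

# Illustrative dimensions: t-dimensional input, m hidden nodes.
t_dim, m = 4, 8
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((m, t_dim)) for k in ("W_i", "W_f", "W_o", "W_c")}
p.update({k: rng.standard_normal((m, m)) for k in ("U_i", "U_f", "U_o", "U_c")})
p.update({k: np.zeros(m) for k in ("b_i", "b_f", "b_o", "b_c")})
p["W_y"] = rng.standard_normal((1, m))
y, h, c = lstm_step(rng.standard_normal(t_dim), np.zeros(m), np.zeros(m), p)
```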
3. Reinforced learning tri-state combined long-and-short time memory neural network system
The reinforcement learning tri-state combined long-short time memory neural network system is abbreviated RL-3S-LSTMNN. The RL-3S-LSTMNN model divides the time series into three basic trend states (rising, falling and steady) by constructing a monotonic trend identifier, and uses a reinforcement learning unit to select, for each trend state, a long-short time memory neural network whose number of hidden layers and nodes is adapted to its variation law. The model is shown in fig. 3.
The time series fed to the input gate is denoted x_t = [x_1, x_2, …, x_t]^T, and the point coordinates corresponding to x_t in the time-domain coordinate system are (1, x_1), (2, x_2), …, (t, x_t). First, the monotonic trend identifier fits a straight line to the points (1, x_1), (2, x_2), …, (t, x_t); the fitted linear equation is written as:

x = ht + b   (12)

The squared fitting error e is then:

e = Σ_{j=1}^{t} (x_j - h·j - b)²   (13)

To obtain the optimal fitting equation, the extremum condition of calculus must be satisfied:

∂e/∂h = 0,  ∂e/∂b = 0   (14)
the slope h and intercept b of the linear fit equation are solved from equation (14). The trend state of the time series can be judged according to the value of the slope h, and the specific criterion is as follows:
1) if arctan h ≤ λ, the series is in a descending trend state, denoted s_1;
2) if arctan h ≥ μ, the series is in an ascending trend state, denoted s_2;
3) if λ < arctan h < μ, the series is in a steady trend state, denoted s_3;
where λ and μ are the state critical values, i.e., the first and second thresholds, with λ < 0 and μ > 0.
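A minimal sketch of this monotonic trend identifier is given below (Python/NumPy assumed); the function name is hypothetical and the default thresholds are the illustrative values used later in the embodiment.

```python
import numpy as np

def identify_trend(x, lam=-7e-6, mu=7e-6):
    """Classify a time series as descending (s1), ascending (s2) or steady (s3)
    by least-squares fitting x = h*t + b and thresholding arctan(h), eqs. (12)-(14)."""
    t = np.arange(1, len(x) + 1)
    h, b = np.polyfit(t, x, 1)        # slope h and intercept b of the linear fit
    angle = np.arctan(h)
    if angle <= lam:
        return "s1"                   # descending trend state
    if angle >= mu:
        return "s2"                   # ascending trend state
    return "s3"                       # steady trend state

# Example: a slowly rising segment is classified as ascending.
print(identify_trend(np.linspace(0.0, 0.1, 50)))   # -> "s2"
```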
The three trend states are taken as the environment states of reinforcement learning, and the Agent selects an action from the action set a according to the current trend state; the action set a is shown in Table 1.
Table 1  Action set a (each action is a combination of a number of hidden layers and a number of hidden nodes)
When selecting an action, a Q value set consisting of the state set s and the action set a is used in place of the discounted cumulative reward expectation, as shown in Table 2.
Table 2  Q value set (one Q value for each state-action pair)
According to the Q value set, a corresponding action is selected for each state by a decision function, whose expression is:

a*(s_i) = argmax_{a∈{a_1, a_2, …, a_d}} Q(s_i, a)   (15)

where i ∈ {1, 2, 3} and a*(s_i) ∈ {a_1, a_2, …, a_d} denotes the action selected by the decision function in state s_i.
After the action a*(s_i) for state s_i is obtained, an LSTMNN is set up with the number of hidden layers and hidden nodes represented by a*(s_i); the long-short time memory neural network thus corresponding to the time series x_t (i.e., to trend state s_i) is denoted i-LSTMNN.
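The following sketch shows one possible way of organising the action set and Q value set of Table 1 and Table 2 together with the decision function of equation (15); the data structures and names are assumptions for illustration, not prescribed by the patent.

```python
import numpy as np

# Action set a: every combination of hidden-layer number (1-3) and hidden-node number (3-10),
# giving 3 x 8 = 24 actions, as in the embodiment described later.
actions = [(layers, nodes) for layers in (1, 2, 3) for nodes in range(3, 11)]

states = ("s1", "s2", "s3")                        # descending, ascending, steady
rng = np.random.default_rng(0)
Q = {s: rng.random(len(actions)) for s in states}  # Q value set, random initial values in [0, 1]

def decide(state):
    """Decision function of equation (15): pick the action with the largest Q value."""
    idx = int(np.argmax(Q[state]))
    return idx, actions[idx]                       # (action index, (hidden layers, hidden nodes))

idx, (n_layers, n_nodes) = decide("s2")
print(f"state s2 -> i-LSTMNN with {n_layers} hidden layer(s), {n_nodes} hidden nodes")
```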
The time series x_t = [x_1, x_2, …, x_t]^T is taken as the input of the i-LSTMNN. If the i-LSTMNN has one hidden layer with m hidden nodes, the input gate output i_t^1, forgetting gate output f_t^1 and output gate output o_t^1 of that hidden layer are calculated respectively as:

i_t^1 = σ(W_i^1 x_t + U_i^1 h_{t-1}^1)   (16)
f_t^1 = σ(W_f^1 x_t + U_f^1 h_{t-1}^1)   (17)
o_t^1 = σ(W_o^1 x_t + U_o^1 h_{t-1}^1)   (18)
According to matrix algebra, the number of hidden nodes and the dimension of the input vector jointly determine the dimensions of the weights and activity values, so the dimension of each gate's weight and activity value in the above formulas is t × m. To simplify the network update process, the bias terms are omitted, so only the weights and activity values need to be updated.
The memory cell c_t^1 and candidate memory cell c̃_t^1 of this hidden layer are expressed as:

c_t^1 = f_t^1 ⊗ c_{t-1}^1 + i_t^1 ⊗ c̃_t^1   (19)
c̃_t^1 = tanh(W_c^1 x_t + U_c^1 h_{t-1}^1)   (20)

The hidden layer state h_t^1 can then be obtained from equations (18)-(19) as:

h_t^1 = o_t^1 ⊗ tanh(c_t^1)   (21)

Finally, the final output y_t^1 is calculated from equation (21) as:

y_t^1 = σ(W_y^1 h_t^1)   (22)
if the i-LSTMNN hidden layers are two layers and the number of hidden layer nodes is m, the hidden layers are changed into a first hidden layer, and the first hidden layer is continuously output finally
Figure BDA0001874723260000109
As input to the second hidden layer. According to the calculation process of the first hidden layer, the input gate output of the second hidden layer can be calculated in the same way
Figure BDA00018747232600001010
Forget gate output f t 2 Output of the output gate
Figure BDA00018747232600001011
Memory cell
Figure BDA00018747232600001012
Candidate memory cell
Figure BDA00018747232600001013
Hidden layer state
Figure BDA00018747232600001014
And second layer final output
Figure BDA00018747232600001015
By analogy, if the i-LSTMNN has n hidden layers with m hidden nodes each, i_t^n, f_t^n, o_t^n, c_t^n, c̃_t^n, h_t^n and y_t^n can be obtained. Although the calculation rules of the different hidden layers are the same, the initially set parameter values (i.e., the weights W and activity values U) of each hidden layer are different.
A training method for a reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-short time memory neural network according to the trend state corresponding to the current time series, the action executed in that trend state, and the long-short time memory neural network corresponding to the current time series;
calculating the error between the final output and the ideal output, and updating, in the Q value set, the Q value of the action executed in that trend state according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
The specific steps of the reinforcement learning training are as follows:
1. update of reinforcement learning Q-value set
The Q value set is updated iteratively with an ε-greedy strategy. Let ε = [ε_1, ε_2, …, ε_P] be a monotonically decreasing sequence with every element ε_ρ ∈ (0, 1). The Q value set is updated for P rounds, and ε_1, ε_2, …, ε_P serve as the action selection reference values of the respective rounds (i.e., the reference value of the ρ-th round is ε_ρ). In the ρ-th round, K_ρ updates are performed; each time a random number χ_ρk ∈ (0, 1) is generated and compared with ε_ρ: if χ_ρk ≤ ε_ρ, the action to execute in state s_i is selected at random; if χ_ρk > ε_ρ, the action to execute in state s_i is selected according to equation (15). Then, after the corresponding i-LSTMNN is obtained as described above, its output y_t^n is calculated. Let the ideal output be y_t; the output error function is then:

E_n = y_t - y_t^n   (23)
Combining this with the output error, the reward r obtained by selecting and executing action a in state s_i is calculated as:

r = e^(-||E_n||)   (24)

where e is the natural exponential base. Clearly r ∈ (0, 1), and r is negatively correlated with the output error norm ||E_n|| (i.e., the larger the error, the smaller the resulting reward).
According to the obtained reward and the Bellman equation, the Q value of executing action a in state s_i in the Q value set is updated as:

q(s_i, a)' = (1 - α)Q(s_i, a) + α( r + γ·max_{a'} Q(s_i', a') )   (25)

where q(s_i, a)' denotes the updated value of Q(s_i, a) in the Q value set, max_{a'} Q(s_i', a') denotes the maximum Q value at the next state s_i', and the state s_i' can be determined by feeding y_t^n into the trend state identifier.
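A minimal sketch of this ε-greedy update of the Q value set is given below; the i-LSTMNN forward pass is abstracted into a hypothetical `run_lstmnn` callable and the trend identifier into `identify_trend`, both placeholders standing in for the components described above.

```python
import numpy as np

def reward_from_error(error_vec):
    """Equation (24): r = exp(-||E_n||), so larger errors give smaller rewards."""
    return float(np.exp(-np.linalg.norm(error_vec)))

def train_q_table(Q, actions, samples, run_lstmnn, identify_trend,
                  eps=(0.9, 0.7, 0.5, 0.3, 0.1), alpha=0.1, gamma=0.001, rng=None):
    """epsilon-greedy update of the Q value set, equation (25).

    Q             : dict state -> array of Q values, one per action
    actions       : list of (hidden layers, hidden nodes) pairs
    samples       : list of (input series x_t, ideal output y_t)
    run_lstmnn    : placeholder callable (action, x_t) -> network output
    identify_trend: placeholder callable series -> state key
    """
    rng = rng or np.random.default_rng()
    for eps_rho in eps:                              # P rounds
        k_rho = int(100 * eps_rho)                   # K_rho updates per round (embodiment value)
        for _ in range(k_rho):
            x_t, y_t = samples[rng.integers(len(samples))]
            s_i = identify_trend(x_t)
            if rng.random() <= eps_rho:              # explore: random action
                a = int(rng.integers(len(actions)))
            else:                                    # exploit: equation (15)
                a = int(np.argmax(Q[s_i]))
            y_hat = run_lstmnn(actions[a], x_t)
            r = reward_from_error(np.atleast_1d(y_t) - np.atleast_1d(y_hat))
            s_next = identify_trend(np.atleast_1d(y_hat))   # next state from the network output
            Q[s_i][a] = (1 - alpha) * Q[s_i][a] + alpha * (r + gamma * Q[s_next].max())
    return Q
```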
2.i-LSTMNN weight and activity value update
The weights and activity values are updated by the stochastic gradient descent method. If the final i-LSTMNN has one hidden layer, the gradients ∇W and ∇U of each weight and activity value are calculated from formulas (16)-(20), (23) and the chain rule of derivation. After the gradients are obtained, the parameters are updated as:

W' = W - ψ∇W,  U' = U - ψ∇U   (26)

where W' and U' are the updated weights and activity values and ψ is the learning rate.
If the final i-LSTMNN has two hidden layers, the weights and activity values of the second hidden layer are updated first with the same rule, and then those of the first hidden layer. Since the first-layer output y_t^1 is the input of the second hidden layer, the error of the first hidden layer with respect to its output y_t^1 can be determined indirectly from the output error E^2 of the second hidden layer as:

E^1 = ∂E^2/∂y_t^1   (27)

E^1 is then used in place of E_n to calculate the weight and activity value gradients of the first hidden layer, which are updated according to equation (26).
By analogy, if the final i-LSTMNN hidden layer is n layers, the updating of each weight value and each activity value in the n layers can be realized.
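A short sketch of the parameter update rule of equation (26) is shown below; the gradient computation itself (the chain rule through equations (16)-(22)) is abstracted into a hypothetical `grads` mapping assumed to be supplied by the caller.

```python
def sgd_update(params, grads, psi=0.001):
    """Equation (26): W' = W - psi * dW, U' = U - psi * dU, applied per hidden layer.

    params: dict such as {"W_i": ..., "U_i": ..., ...} of NumPy arrays for one layer
    grads : dict with the same keys holding the gradients (assumed precomputed)
    psi   : learning rate
    """
    return {name: value - psi * grads[name] for name, value in params.items()}
```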
A method for predicting the degradation trend of a rotating machine comprises the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as state degradation features of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the processed singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as input time series into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time series through the monotonic trend identifier to obtain the long-short time memory neural network corresponding to that trend state, and training that long-short time memory neural network a plurality of times;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
and after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by the singular spectrum entropy prediction value are obtained by analogy, and a curve graph of the singular spectrum entropy prediction value is obtained through the prediction samples.
The method comprises the following specific steps:
A segment of the singular spectrum entropy sequence [x_b, x_{b+1}, …, x_{b+(l+1)t-1}] is sampled as the training data and decomposed as:

T_1 = [x_b, x_{b+1}, …, x_{b+t-1}]  →  T_1' = [x_{b+t}, x_{b+t+1}, …, x_{b+2t-1}]
T_2 = [x_{b+t}, x_{b+t+1}, …, x_{b+2t-1}]  →  T_2' = [x_{b+2t}, x_{b+2t+1}, …, x_{b+3t-1}]
⋮
T_l = [x_{b+(l-1)t}, x_{b+(l-1)t+1}, …, x_{b+lt-1}]  →  T_l' = [x_{b+lt}, x_{b+lt+1}, …, x_{b+(l+1)t-1}]

where b is the sampling starting point; T_1, T_2, …, T_l are the training input samples; T_1', T_2', …, T_l' are the expected outputs corresponding to the training input samples; l is the number of training sample groups; and t is the sample dimension.
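A small helper illustrating this decomposition into input/target windows is sketched below; the function name and return layout are illustrative assumptions.

```python
import numpy as np

def make_training_pairs(series, b, l, t):
    """Split series into l input windows T_k of length t and their expected outputs T_k'.

    T_k  = series[b+(k-1)t : b+kt]       (training input)
    T_k' = series[b+kt     : b+(k+1)t]   (expected output, the next t points)
    """
    series = np.asarray(series, dtype=float)
    pairs = []
    for k in range(l):
        start = b + k * t
        pairs.append((series[start:start + t], series[start + t:start + 2 * t]))
    return pairs

# Example with a toy sequence, starting point b = 0, l = 3 groups, dimension t = 4.
pairs = make_training_pairs(np.arange(20, dtype=float), b=0, l=3, t=4)
```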
The samples are input into the RL-3S-LSTMNN in turn. First, the monotonic trend identifier judges the trend state of each of the l groups of training samples. Then, the reinforcement learning unit selects and executes the best action according to the finally trained and updated Q value set, obtaining the final i-LSTMNN for each of the three trend states (1-LSTMNN, 2-LSTMNN and 3-LSTMNN). Next, each i-LSTMNN selected by the reinforcement learning unit is trained M times with the stochastic gradient descent method: before each training step, a group of samples is randomly drawn from the training samples in state s_i and input into the corresponding i-LSTMNN, and the weights and activity values of that i-LSTMNN are then updated, which completes one training step. This training process is repeated M times, completing the whole training process of the RL-3S-LSTMNN.
The prediction process of RL-3S-LSTMNN is as follows:
The last group of samples of the training set, [x_{b+lt}, x_{b+lt+1}, …, x_{b+(l+1)t-1}], is passed through the monotonic trend identifier and input into the i-LSTMNN corresponding to its trend, giving the predicted value x'_{b+(l+1)t} at the (b+(l+1)t)-th point. Then [x_{b+lt+1}, x_{b+lt+2}, …, x'_{b+(l+1)t}] is input into the i-LSTMNN as before to obtain x'_{b+(l+1)t+1}; by analogy, after t predictions the sequence [x'_{b+(l+1)t}, x'_{b+(l+1)t+1}, …, x'_{b+(l+2)t-1}] is obtained. Every t predictions constitute one prediction round, so this is the first round; the output of the first round is then used as the input of the second round, and the prediction proceeds as in the first round. By analogy, V rounds of prediction yield V × t predicted values.
The specific working example is as follows:
the method is verified by adopting the rolling bearing state degradation data measured by the Cincinnati university.
Four aviation bearings are mounted on a rotating shaft of the bearing experiment table, the aviation bearings are ZA-2115 double-row roller bearings manufactured by Rexnord company, an alternating current motor drives the rotating shaft to rotate at a constant rotating speed of 2000r/min through belt transmission, and radial load of 6000lbs is applied to the bearings in the experiment process. The sampling frequency is 20kHz, the sampling length is 20480 points, vibration data of the bearing is collected every 10min, and the bearing continuously operates until a fault occurs. In the first set of experiments, after the experiment table is continuously operated for 21560 minutes, the bearing 3 has inner ring faults. The method provided by the state degradation data verification of the bearing 3 collected in the set of experiments is adopted.
There are 2156 groups of vibration data over the whole life of bearing 3, each group containing 20480 points. The first 10000 vibration points of each group are extracted and rearranged into a 1000 × 10 matrix, from which the singular spectrum entropy is calculated, as shown in fig. 6. The singular spectrum entropy sequence is then denoised by moving averaging; as shown in fig. 5, the entropy climbs rapidly from the starting point to the 200th point, where the bearing is in the running-in stage; from the 200th point to the 1700th point the singular spectrum entropy changes slowly and the bearing is in the steady operation stage; after the 1700th point the singular spectrum entropy starts to rise sharply and the bearing is in the failure stage. Since bearing failure results from gradual degradation, the bearing is in the early failure stage during the latter part of the steady operation stage. The 1301st to 1500th points (200 points in total) are taken as training samples and input into the RL-3S-LSTMNN to predict the singular spectrum entropy of 500 further points (i.e., the 1501st to 2000th points).
The RL-3S-LSTMNN parameters are set as follows: state trend identifier thresholds λ = -7×10^-6 and μ = 7×10^-6; number of training rounds in the reinforcement learning process P = 5, action selection reference values ε = [0.9, 0.7, 0.5, 0.3, 0.1], and number of updates per round K_ρ = 100ε_ρ; the action set consists of the selectable numbers of hidden layers [1, 2, 3] and the selectable numbers of hidden nodes 3 to 10, combined pairwise into a set of 24 actions; each Q value in the Q value table is initialised to a random value in [0, 1]; Q value update discount factor γ = 0.001 and adjustment coefficient α = 0.1; i-LSTMNN learning rate ψ = 0.001 and number of training times M = 2000; number of training sample groups l = 49; number of prediction rounds V = 125 and number of predictions per round (i.e., the sample dimension, which equals the number of input nodes) t = 4; the number of output nodes is 1. The prediction results are shown in fig. 6.
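For convenience, these embodiment settings can be collected in a single configuration mapping, as sketched below; the key names are illustrative and not taken from the patent.

```python
# Illustrative configuration mirroring the embodiment's parameter settings.
rl3s_lstmnn_config = {
    "trend_thresholds": {"lambda": -7e-6, "mu": 7e-6},
    "rl": {"rounds_P": 5,
           "epsilon": [0.9, 0.7, 0.5, 0.3, 0.1],   # updates per round: K_rho = 100 * epsilon_rho
           "gamma": 0.001, "alpha": 0.1},
    "actions": {"hidden_layers": [1, 2, 3], "hidden_nodes": list(range(3, 11))},  # 24 actions
    "lstmnn": {"learning_rate_psi": 0.001, "train_steps_M": 2000},
    "data": {"train_groups_l": 49, "sample_dim_t": 4,
             "prediction_rounds_V": 125, "output_nodes": 1},
}
```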
To verify the advantages of the proposed RL-3S-LSTMNN-based method for predicting the state degradation trend of rotating machinery, the prediction accuracies of four models, namely LSTMNN, the multi-kernel least squares support vector machine (MK-LSSVM), the genetic-BP network (GA-BP) and the Extreme Learning Machine (ELM), are first compared with the proposed method. The number of training times of the four models is set equal to the total number of training times of the RL-3S-LSTMNN. The standard LSTMNN uses 1 hidden layer with 8 hidden nodes; GA-BP uses 3 hidden layers with 8 hidden nodes; the learning rates of LSTMNN and GA-BP are set to ψ = 0.001; ELM uses 10 hidden nodes with a sigmoid activation function. The state degradation prediction results of the double-row roller bearing obtained with the four models are shown in figs. 7 to 10.
To better evaluate the prediction performance of the models, the Nash-Sutcliffe efficiency coefficient (NSE), the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) are used as prediction accuracy evaluation indexes:

NSE = 1 - Σ_{i=1}^{n} (y_i - y'_i)² / Σ_{i=1}^{n} (y_i - ȳ)²   (28)
MAPE = (1/n) Σ_{i=1}^{n} |(y_i - y'_i)/y_i| × 100%   (29)
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i - y'_i)² )   (30)

where y_i is the actual value; y'_i is the predicted value; n is the number of prediction points; and ȳ is the average of the n actual values. NSE ∈ (-∞, 1), and the closer NSE is to 1, the higher the prediction accuracy of the model.
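The three evaluation indexes can be computed as in the short sketch below (NumPy assumed).

```python
import numpy as np

def nse(y, y_pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the actual values."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

def mape(y, y_pred):
    """Mean absolute percentage error, in percent."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y - y_pred) / y)) * 100.0)

def rmse(y, y_pred):
    """Root mean square error."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))
```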
Table 3  Comparison of the prediction performance of the five state degradation trend prediction methods
With the parameter settings of RL-3S-LSTMNN, LSTMNN, MK-LSSVM, GA-BP and ELM kept unchanged, prediction was repeated 100 times with each of the five models, and the averages of the three evaluation indexes (mean NSE, mean MAPE and mean RMSE) over the 100 predictions were calculated; the results are listed in Table 3.
The results of figs. 7-10 and Table 3 show that the mean MAPE and mean RMSE of RL-3S-LSTMNN are the smallest and its mean NSE is the closest to 1, indicating that RL-3S-LSTMNN has good generalization performance and, when used to predict the state degradation trend of the double-row roller bearing, achieves higher prediction accuracy than LSTMNN, MK-LSSVM, GA-BP and ELM.
Finally, the calculation time (i.e., the sum of training time and prediction time) consumed by LSTMNN, MK-LSSVM, GA-BP and ELM for predicting the state degradation trend is compared with that of RL-3S-LSTMNN. As shown in fig. 11, RL-3S-LSTMNN consumes only 14.782 s, while LSTMNN consumes 10.866 s, MK-LSSVM 26.051 s, GA-BP 35.636 s and ELM 22.374 s. Clearly, the calculation time of RL-3S-LSTMNN is shorter than those of MK-LSSVM, GA-BP and ELM and only slightly longer than that of LSTMNN (both remaining in the same order of magnitude). The comparison shows that RL-3S-LSTMNN has higher convergence speed and computational efficiency than MK-LSSVM, GA-BP and ELM when used to predict the state degradation trend of the double-row roller bearing.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. A method for predicting the degradation trend of a rotating machine using a reinforcement learning tri-state combined long-short time memory neural network system and a training method thereof, characterized by comprising the following steps:
performing feature extraction on vibration data of the rotating machine to obtain singular spectrum entropy of the rotating machine, and taking the singular spectrum entropy as a state degradation feature of the rotating machine after the singular spectrum entropy is subjected to moving average noise reduction processing;
decomposing the singular spectrum entropy into a plurality of training samples, sequentially inputting the training samples as an input time sequence into the reinforcement learning tri-state combined long-short time memory neural network system, judging the trend state of the input time sequence through a monotonic trend recognizer to obtain a long-time memory neural network corresponding to the trend state, and performing multiple training on the long-time memory neural network;
judging the trend state of the last training sample through a monotonic trend recognizer to obtain a long-time and short-time memory neural network corresponding to the last training sample, obtaining a first singular spectrum entropy predicted value through the long-time memory neural network, combining the first singular spectrum entropy predicted value with the last t-1 singular spectrum entropy values in the last training sample to construct a new training sample, inputting the new training sample into the long-time memory neural network corresponding to the trend state of the new training sample to obtain a second singular spectrum entropy predicted value, and so on to obtain t singular spectrum entropy predicted values, and constructing the t singular spectrum entropy predicted values into a first prediction sample;
after the trend state of the first prediction sample is judged by the monotonic trend recognizer, the first prediction sample is input into a long-time memory neural network corresponding to the trend state of the first prediction sample to obtain a second prediction sample, V prediction samples constructed by singular spectrum entropy prediction values are obtained by analogy, and a curve graph of the singular spectrum entropy prediction values is obtained through the prediction samples;
the long-short time memory neural network system comprises a long-short time memory neural network, a reinforcement learning unit and a monotonic trend recognizer, wherein the long-short time memory neural network comprises an input gate, an output gate, a forgetting gate, a memory unit, a candidate memory unit and a unit output (i.e., a hidden layer state);
the trend states include an ascending trend state, a descending trend state and a steady trend state;
the monotonic trend identifier combines the time series x of the input t =[x 1 ,x 2 ,…,x t ] T Corresponding point coordinates (1,x) are constructed in a time domain coordinate system 1 ),(2,x 2 ),…,(t,x t ) And obtaining a linear fitting linear equation x = ht + b of the point coordinates by linear fitting the point coordinates, and solving a slope h and an intercept b of the linear fitting equation, then:
1) If it is
Figure FDA0003714357340000021
The state is a descending trend state;
2) If it is
Figure FDA0003714357340000022
The state is an ascending trend state;
3) If lambda is less than arctan h and less than mu, the state is a steady trend state;
wherein λ is a first threshold, μ is a second threshold, λ < 0 and μ > 0;
the reinforcement learning unit comprises an action set of a long-time memory neural network corresponding to the number of hidden layers and the number of hidden nodes, and a Q value corresponding to the trend state and the action thereof; the reinforcement learning unit selects an action from the action set according to the trend state of the input time sequence, obtains the action in the trend state according to a Q value set and an optimal strategy in the trend state, obtains the long-time memory neural network corresponding to the time sequence in the trend state through the number of hidden layers and hidden nodes corresponding to the action in the action set in the trend state, and calculates the final output of the long-time memory neural network;
the training method for the reinforcement learning tri-state combination long-time memory neural network system comprises the following steps:
calculating the final output of the long-time memory neural network according to the trend state corresponding to the current time sequence, the action executed in the trend state and the long-time memory neural network corresponding to the current time sequence;
calculating the error between the final output and the ideal output, and updating the Q value of the action executed under the trend state in the Q value set according to the error;
and updating the weight and the activity value of each hidden layer of the long-time and short-time memory neural network corresponding to the current time sequence by a random gradient descent method.
CN201811393984.9A 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method Active CN109766745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811393984.9A CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811393984.9A CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Publications (2)

Publication Number Publication Date
CN109766745A CN109766745A (en) 2019-05-17
CN109766745B true CN109766745B (en) 2022-12-13

Family

ID=66450163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811393984.9A Active CN109766745B (en) 2018-11-22 2018-11-22 Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method

Country Status (1)

Country Link
CN (1) CN109766745B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110220725B (en) * 2019-05-30 2021-03-23 河海大学 Subway wheel health state prediction method based on deep learning and BP integration
CN110807510B (en) * 2019-09-24 2023-05-09 中国矿业大学 Parallel learning soft measurement modeling method for industrial big data
CN111665718B (en) * 2020-06-05 2022-05-10 长春工业大学 Diagonal recurrent neural network control method based on Q learning algorithm
CN112783138B (en) * 2020-12-30 2022-03-22 上海交通大学 Intelligent monitoring and abnormity diagnosis method and device for processing stability of production line equipment
CN112758097B (en) * 2020-12-30 2022-06-03 北京理工大学 State prediction and estimation method for unmanned vehicle
CN113486731A (en) * 2021-06-17 2021-10-08 国网山东省电力公司汶上县供电公司 Abnormal state monitoring method for power transmission equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734220A (en) * 2018-05-23 2018-11-02 山东师范大学 Adaptive Financial Time Series Forecasting method based on k lines cluster and intensified learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
US20180284741A1 (en) * 2016-05-09 2018-10-04 StrongForce IoT Portfolio 2016, LLC Methods and systems for industrial internet of things data collection for a chemical production process
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US10565318B2 (en) * 2017-04-14 2020-02-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734220A (en) * 2018-05-23 2018-11-02 山东师范大学 Adaptive Financial Time Series Forecasting method based on k lines cluster and intensified learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Prediction of Bearing Performance Degradation with Bottleneck Feature based on LSTM Network";Gang Tang等;《2018 IEEE International Instrumentation and Measurement Technology Conference》;20180712;第1-7页 *
"基于多尺度形态分解谱熵的电机轴承预测特征提取及退化状态评估";王冰等;《振动与冲击》;20131130;第32卷(第22期);第124-128,139页 *
基于量子加权长短时记忆神经网络的状态退化趋势预测;李锋等;《仪器仪表学报》;20180731;第39卷(第07期);第217-225页 *

Also Published As

Publication number Publication date
CN109766745A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN109766745B (en) Reinforced learning tri-state combined long-time and short-time memory neural network system and training and predicting method
CN109753872B (en) Reinforced learning unit matching cyclic neural network system and training and predicting method thereof
Liao et al. Uncertainty prediction of remaining useful life using long short-term memory network based on bootstrap method
Yang et al. A comparison between extreme learning machine and artificial neural network for remaining useful life prediction
CN110059867B (en) Wind speed prediction method combining SWLSTM and GPR
CN111813084A (en) Mechanical equipment fault diagnosis method based on deep learning
CN110309537B (en) Intelligent health prediction method and system for aircraft
Sari et al. Enabling external factors for inflation rate forecasting using fuzzy neural system
CN111079926B (en) Equipment fault diagnosis method with self-adaptive learning rate based on deep learning
CN111523727B (en) Method for predicting remaining life of battery by considering recovery effect based on uncertain process
Liu et al. Multiple sensors based prognostics with prediction interval optimization via echo state Gaussian process
González-Carrasco et al. SEffEst: Effort estimation in software projects using fuzzy logic and neural networks
CN113344288A (en) Method and device for predicting water level of cascade hydropower station group and computer readable storage medium
CN115062528A (en) Prediction method for industrial process time sequence data
CN109447305B (en) Trend prediction method based on quantum weighted long-time and short-time memory neural network
Kampouropoulos et al. An energy prediction method using adaptive neuro-fuzzy inference system and genetic algorithms
CN110059871B (en) Photovoltaic power generation power prediction method
CN116663419A (en) Sensorless equipment fault prediction method based on optimized Elman neural network
CN113837443B (en) Substation line load prediction method based on depth BiLSTM
CN111414927A (en) Method for evaluating seawater quality
CN115860232A (en) Steam load prediction method, system, electronic device and medium
Pan et al. Bearing condition prediction using enhanced online learning fuzzy neural networks
Zhang et al. Short-term wind power interval prediction based on gd-lstm and bootstrap techniques
Doudkin et al. Spacecraft Telemetry Time Series Forecasting With Ensembles of Neural Networks
CN112365022A (en) Engine bearing fault prediction method based on multiple stages

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant