CN108008627B - Parallel optimization reinforcement learning self-adaptive PID control method - Google Patents
Parallel optimization reinforcement learning self-adaptive PID control method
- Publication number: CN108008627B (application CN201711325553.4A)
- Authority: CN (China)
- Prior art keywords: pid, control, output, parameters, network
- Prior art date: 2017-12-13
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G05B11/42 — Automatic controllers, electric, with provision for obtaining a characteristic which is both proportional and time-dependent, e.g. P.I., P.I.D.
- G05B13/027 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, the criterion being a learning criterion using neural networks only
- G05B13/042 — Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Abstract
The invention discloses a parallel-optimized reinforcement learning self-adaptive PID control method, characterized by comprising the following steps. Step S1: discretize a transfer function in MATLAB by the zero-order hold method, and initialize the controller parameters and M control threads for parallel learning. Step S2: pass the input signal into the transfer function defined in S1, calculate the output value, and take the difference between the input signal and the output signal as the input vector of the control algorithm. Step S3: pass the input vector to the improved self-adaptive PID controller for training, and iterate N times to obtain a trained model. Step S4: carry out a control test with the trained model, recording the input and output signals and the change in the PID parameters. Step S5: visualize the test data and compare the control effect. The invention better solves the problems of existing self-adaptive PID control, and improves the stability and learning efficiency of the algorithm by exploiting the multi-thread parallel learning of A3C.
Description
Technical Field
The invention relates to a self-adaptive PID control method, belongs to the technical field of control, and particularly relates to an improved self-adaptive PID (proportional-integral-derivative) control algorithm based on parallel-optimized actor-critic (A3C) learning.
Background
A PID (Proportional-Integral-Derivative) control system is a linear controller that acts on the deviation between the setpoint and the output. It has the advantages of a simple principle, strong robustness, easy tuning and no need for an accurate mathematical model of the controlled object, which makes it the most commonly used control scheme in industrial control. In the engineering practice of PID parameter tuning, particularly for linear, time-invariant and weakly time-lagged systems, the traditional tuning methods have accumulated rich experience and are widely applied. However, in practical industrial process control, many controlled objects exhibit time-varying uncertainty, pure hysteresis and other characteristics, and the control mechanism is complex; under the influence of noise, load disturbances and other factors, the process parameters and even the model structure can change. The PID parameters therefore need to be adjusted online to meet the requirements of real-time control. Under such conditions, the traditional tuning methods struggle to meet the needs of engineering practice and show great limitations.
Adaptive PID control techniques are an effective way to address such problems. The adaptive PID control model combines the ideas of adaptive control with the conventional PID controller. First, it is an adaptive controller, able to automatically identify the controlled process, automatically tune the controller parameters and adapt to changes in the process parameters; second, it retains the advantages of the conventional PID controller, such as a simple structure, good robustness and high reliability. Because of these advantages, it is an ideal choice for industrial process control in engineering practice. Since adaptive PID control was proposed, the fuzzy adaptive PID controller, the neural network adaptive PID controller and the Actor-Critic adaptive PID controller have been successively proposed by many researchers.
For example, Document 1 (Liu Guorong, Yang Xianhui. Fuzzy adaptive PID controller [J]. Control and Decision, 1995(6)) proposed a fuzzy-rule-based adaptive PID controller. Its main idea is: when the system is subjected to a setpoint step, state disturbance or structural disturbance, the transient response can be divided into 9 cases; after the system response is obtained at each sampling instant, a fuzzy control method is used, according to the setpoint, the current trend of the system response and the existing control knowledge, to appropriately increase or decrease the control strength so that the response is driven toward the setpoint and the output approaches it as quickly as possible. However, this control method requires professional experience and parameter optimization to control a complex system, and inaccurately set fuzzy rules cannot achieve a satisfactory control effect.
Document 3 (Chen Xuesong et al. Adaptive PID control based on actor-critic learning [J]. Control Theory & Applications, 2011) proposes adaptive PID control with an Actor-Critic (AC) structure. The control idea is as follows: the PID parameters are adjusted adaptively using the model-free online learning capability of AC learning, and a single RBF network is used to learn the policy function of the Actor and the value function of the Critic simultaneously; this overcomes the difficulty the traditional PID controller has in adjusting its parameters online in real time, and offers fast response and strong adaptive capability. However, the instability of the AC learning structure itself often makes the algorithm difficult to converge.
Patent CN201510492758 discloses an actuator adaptive PID control method that combines an expert PID controller and a fuzzy PID controller, each connected to the actuator; the actuator selects the expert PID controller or the fuzzy PID controller according to the current state information and the expected information. Although this controller can reduce overshoot and offers high control precision, it still requires a great deal of prior professional knowledge to decide which controller to use.
Disclosure of Invention
The aim of the invention is: in view of the characteristics of adaptive PID control, to provide an adaptive PID control method based on parallel-optimized actor-critic learning (A3C) for controlling industrial systems. The invention better solves the problems of existing self-adaptive PID control, and improves the stability and learning efficiency of the algorithm by exploiting the multi-thread parallel learning of A3C. The A3C-based adaptive PID controller has the advantages of fast response, strong adaptive capability and strong disturbance rejection.
The self-adaptive PID control method based on parallel-optimized actor-critic learning comprises the following steps:
step S1: using MATLAB (commercial mathematical software from MathWorks, USA), define a continuous transfer function of arbitrary order for the controlled system and discretize it by the zero-order hold method to obtain a discretized transfer function with a user-defined time interval; initialize the controller parameters and M control threads for parallel learning, where the parameters mainly comprise the BP neural network parameters and the PID control environment parameters, and each thread is an independent control Agent;
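A minimal Python sketch of step S1, assuming a third-order plant G(s) = 1/(s³ + 2s² + 3s + 1) purely for illustration (the patent defines the plant in MATLAB and does not fix its coefficients here); scipy.signal.cont2discrete with the 'zoh' method plays the role of MATLAB's zero-order-hold discretization:

```python
import numpy as np
from scipy.signal import cont2discrete

num = [1.0]                 # assumed numerator of the continuous transfer function
den = [1.0, 2.0, 3.0, 1.0]  # assumed third-order denominator
Ts = 0.001                  # discretization interval used in the embodiment (0.001 s)

# Zero-order-hold discretization of the continuous transfer function
num_d, den_d, _ = cont2discrete((num, den), Ts, method='zoh')
num_d = np.squeeze(num_d)   # coefficients of the discrete numerator
print(num_d, den_d)         # den_d has 4 entries, matching the difference equation used later
```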
step S2: after the BP neural network weight parameters of the PID controller and the control object have been initialized in step S1, define a discrete input signal RIN, feed it sample by sample into the discrete transfer function at the defined time interval, calculate the output value of the transfer function, and take the difference between the input signal and the output signal as the input vector x(t) of the A3C adaptive PID control algorithm;
step S3: feed the input vector x(t) obtained in step S2 into the constructed A3C self-adaptive PID control system for iterative training; after N iterations a trained model is obtained;
step S31: calculate the current error e(t), the first-order error difference Δe(t) and the second-order error difference Δ²e(t) to form the input vector x(t) = [e(t), Δe(t), Δ²e(t)]ᵀ, and normalize it with a sigmoid function;
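A minimal sketch of step S31, assuming e_hist holds the three most recent errors [e(t), e(t−1), e(t−2)] and that the sigmoid is the standard logistic function:

```python
import numpy as np

def state_vector(e_hist):
    e, e1, e2 = e_hist
    de = e - e1               # first-order error difference  Δe(t)
    d2e = e - 2.0 * e1 + e2   # second-order error difference Δ²e(t)
    x = np.array([e, de, d2e])
    return 1.0 / (1.0 + np.exp(-x))   # sigmoid normalization

print(state_vector([1.0, 0.0, 0.0]))  # ≈ [0.73, 0.73, 0.73], as in the embodiment
```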
step S32: transmit the input vector to the Actor network of each thread to obtain new PID parameters. Instead of outputting the PID parameter values directly, the Actor network outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, and the three parameter values are estimated from these Gaussian distributions; for output nodes o = 1, 2, 3 the output layer gives the means of the PID parameters, and for o = 4, 5, 6 it gives the variances. The Actor network is a 3-layer BP neural network: layer 1 is the input layer, layer 2 is the hidden layer whose output is ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, …, 20, and layer 3 is the output layer;
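A hedged sketch of step S32: a 3-20-6 Actor head that outputs means and variances of a Gaussian over (Kp, Ki, Kd) and samples concrete gains from it. The softplus used to keep the variance positive and the random initial weights are illustrative assumptions, not the patent's exact output-layer formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

def actor_forward(x, W1, W2):
    h = np.minimum(np.maximum(W1 @ x, 0.0), 6.0)   # hidden layer: ho_k = min(max(hi_k, 0), 6)
    o = W2 @ h                                      # 6 outputs: o = 1..3 means, o = 4..6 variances
    mu = o[:3]
    sigma = np.log1p(np.exp(o[3:])) + 1e-3          # assumed softplus to keep variances positive
    return mu, sigma

def sample_pid(mu, sigma):
    return rng.normal(mu, sigma)                    # Gaussian sampling of the three PID gains

W1 = rng.normal(scale=0.1, size=(20, 3))            # 3 inputs -> 20 hidden units (from the text)
W2 = rng.normal(scale=0.1, size=(6, 20))            # 20 hidden units -> 6 outputs
mu, sigma = actor_forward(np.array([0.73, 0.73, 0.73]), W1, W2)
print(sample_pid(mu, sigma))                        # one sampled (Kp, Ki, Kd) triple
```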
Step S33: and the new PID parameters are given to the controller to obtain control output, the control error is calculated, and the reward value is calculated according to the environment reward function R (t). R (t) = alpha 1 r 1 (t)+α 2 r 2 (t) Vector value x' (t) to the next state;
step S34: pass the reward R(t), the current state vector x(t) and the next state vector x′(t) to the Critic network, whose structure is similar to that of the Actor network except that it has only one output node; the Critic network outputs the state values, and the TD error is calculated as δ_TD = r(t) + γV(S_{t+1}, W_v′) − V(S_t, W_v′);
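A minimal sketch of step S34, assuming the Critic is a one-output head on the same clipped hidden layer and that γ = 0.9; both choices are illustrative:

```python
import numpy as np

def critic_value(x, W1, w_out):
    h = np.minimum(np.maximum(W1 @ x, 0.0), 6.0)   # shared clipped hidden layer (assumption)
    return float(w_out @ h)                        # single state-value output V(s)

def td_error(r, x, x_next, W1, w_out, gamma=0.9):
    # δ_TD = r(t) + γ V(s', Wv') - V(s, Wv')
    return r + gamma * critic_value(x_next, W1, w_out) - critic_value(x, W1, w_out)
```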
Step S35: after the TD error is calculated, each Actor-critical network in the A3C structure does not directly update the own network weight, but updates Actor-critical network parameters stored in a central brain (Global-net) by using the own gradient, wherein the updating mode is thatW v =W v +α c dW v Wherein W is a Actor network weight, W 'stored for central brain' a Weight of Actor network for each AC fabric, W v Critic network weight, W 'stored by the Central brain' v Critic network weight, α, representing each AC structure a Alpha is the learning rate of Actor c For Critic learning rate, the central brain will pass an up-to-date parameter to each AC structure after update;
step S36: loop over steps S31 to S35 for N iterations to complete the training process, then exit training and save the model;
step S4: carry out a control test with the trained model, and record the input signal, the output signal and the change in the PID parameters;
step S41: feed the input signal defined in step S1 into the control model of the thread with the highest reward after training;
step S42: from the signal in S41, calculate the current, first-order and second-order errors as the input vector and feed it into the selected control model; the difference from the training process is that only the PID parameter adjustments output by the Actor network are needed, and the adjusted PID parameters are passed to the controller to obtain the controller output;
step S43: save the input signal, the output signal and the PID parameter change values obtained in step S42.
Step S5: visualize the experimental data obtained in step S4 with MATLAB, including the input signal, the output signal and the change in the controller's PID parameters, and compare the control effect with fuzzy adaptive PID control and AC-PID adaptive control.
Drawings
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a block diagram of an improved adaptive PID controller
FIG. 3 is an output signal of the improved controller using a step signal as an input signal
FIG. 4 shows the control quantity of the controller after improvement
FIG. 5 is a control error of the improved adaptive PID controller
FIG. 6 is a parameter adjustment curve of the A3C adaptive PID controller
FIG. 7 is a comparison of an improved controller with a fuzzy, AC architecture adaptive PID controller
FIG. 8 comparison and analysis of control experiments of different controllers
Detailed Description
The invention is further described below using MATLAB software in conjunction with FIGS. 1-5. The specific implementation of the adaptive PID control based on parallel-optimized actor-critic learning comprises the following steps, as shown in FIG. 1:
(1) Initialize parameters. The controlled system is chosen as a third-order transfer function, the discrete time step is set to 0.001 s, and the transfer function discretized via the Z-transform is: yout(k) = −den(2)·yout(k−1) − den(3)·yout(k−2) − den(4)·yout(k−3) + num(2)·u(k−1) + num(3)·u(k−2) + num(4)·u(k−3). The input signal is a step signal with value 1.0, a single training run is 1000 steps (1.0 s), and 4 threads are initialized, representing 4 independent adaptive PID controllers, for training. A difference-equation sketch of this plant update follows.
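A Python sketch of the plant update above, shifting the 1-based MATLAB coefficient indices den(2)…den(4), num(2)…num(4) to 0-based indexing; den and num are assumed to be the coefficient arrays produced by the discretization sketched in step S1:

```python
def plant_step(yout, u, den, num, k):
    # yout(k) = -den(2)*yout(k-1) - den(3)*yout(k-2) - den(4)*yout(k-3)
    #           + num(2)*u(k-1) + num(3)*u(k-2) + num(4)*u(k-3)   (MATLAB 1-based indices)
    return (-den[1] * yout[k-1] - den[2] * yout[k-2] - den[3] * yout[k-3]
            + num[1] * u[k-1] + num[2] * u[k-2] + num[3] * u[k-3])
```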
(2) Calculate the input vector. At t = 0, e(t) = rin(0) − yout(0) = 1.0, e(t−1) = 0 and e(t−2) = 0. The input vector is x(t) = [e(t), Δe(t), Δ²e(t)]ᵀ, where e(t) = rin − yout = 1.0, Δe(t) = e(t) − e(t−1) = 1.0 and Δ²e(t) = e(t) − 2e(t−1) + e(t−2) = 1.0; the calculated x(t) = [1.0, 1.0, 1.0]ᵀ is normalized by the sigmoid function to give the final input vector x(t) = [0.73, 0.73, 0.73]ᵀ.
(3) Train the model. The structure of the improved adaptive PID controller is shown in FIG. 2. After the state vector is calculated, it is first passed to the Actor network; the Actor network outputs the mean μ and variance σ for each of the three parameters P, I and D, the actual P, I and D values are obtained by Gaussian sampling, the new parameter values are assigned to the incremental PID controller, and the controller calculates the control quantity u(t) from the error and the new PID parameters:
u(t) = u(t−1) + Δu(t) = u(t−1) + K_I(t)·e(t) + K_P(t)·Δe(t) + K_D(t)·Δ²e(t)
The control quantity is applied to the discrete transfer function, which yields the output signal value yout(t+1), the error value and the state vector at the next time t+1 according to the process in (1). In addition, the environment reward function computes the reward value of the control Agent from the error; the reward function has the form R(t) = α₁r₁(t) + α₂r₂(t), where α₁ = 0.6, α₂ = 0.4 and e(t) = 0.001.
The reward function is an important component of reinforcement learning. After the reward value is obtained, the reward value and the state vector of the next moment are passed to the Critic network; the Critic network outputs the state values at time t and time t+1, and the TD error is calculated as δ_TD = r(t) + γV(S_{t+1}, W_v′) − V(S_t, W_v′), where W_v′ is the Critic network weight. Because the threads do not run synchronously, the controllers update the Actor and Critic network parameters stored in the Global Net of FIG. 2 in no fixed order; the update formula is as in step S35, where W_a is the Actor network weight stored in the central brain, W_a′ is the Actor network weight of each AC structure, W_v is the Critic network weight stored in the central brain, W_v′ is the Critic network weight of each AC structure, α_a = 0.001 is the learning rate of the Actor and α_c = 0.01 is the learning rate of the Critic. The algorithm reaches a steady state after 3000 iterations, at which point one training run is complete.
(4) Collect experimental data. The trained controller model is used; since 4 threads were set up for control training, the thread with the highest accumulated reward is selected as the test controller. The control test is carried out with the control parameters set in (1); the control time is 1 s, i.e. 1000 control steps are performed. The state vector is calculated as in (2) and passed into the trained model; during the control test the Critic network is not used, the Actor outputs the P, I and D parameter values, and the values of yout, rin, u, P, I and D are saved during the test for visual analysis. A test-phase sketch follows.
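A hedged sketch of this test phase that wires together the earlier sketches (state_vector, actor_forward, incremental_pid, plant_step); taking the Gaussian mean as the PID gains at test time and the variable names are assumptions made purely for illustration:

```python
import numpy as np

def run_test(W1, W2, num_d, den_d, n_steps=1000, rin=1.0):
    yout = np.zeros(n_steps + 1)
    u = np.zeros(n_steps + 1)
    e_hist = [rin, 0.0, 0.0]                    # [e(t), e(t-1), e(t-2)]
    log = []
    for k in range(3, n_steps):
        x = state_vector(e_hist)                # normalized error state
        mu, _ = actor_forward(x, W1, W2)        # Critic unused; take the mean as the gains
        kp, ki, kd = mu
        e, e1, e2 = e_hist
        u[k] = incremental_pid(u[k-1], kp, ki, kd, e, e - e1, e - 2*e1 + e2)
        yout[k+1] = plant_step(yout, u, den_d, num_d, k + 1)
        e_hist = [rin - yout[k+1], e_hist[0], e_hist[1]]
        log.append((rin, yout[k+1], u[k], kp, ki, kd))   # saved for visualization
    return log
```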
(5) Visualize the data. The data saved in (4) are visualized with the MATLAB visualization tools. FIG. 3 shows the output value yout: the controller reaches a steady state in less than 0.2 s and has a fast regulation capability. FIG. 4 shows the control quantity output by the controller, from which it can be seen that the controller reaches a stable state very quickly. FIG. 5 shows the control error of the controller, where the control error equals the input signal minus the output signal. FIG. 6 shows the variation of the P, I and D parameters; the three parameters are adjusted to different degrees before the system stabilizes and no longer change once it is stable. The fuzzy adaptive PID controller and the Actor-Critic adaptive PID controller are compared experimentally using the same controlled object and input signals; the output comparison of the three controllers is shown in FIG. 7 and the detailed control analysis in FIG. 8. As FIG. 8 shows, the controller of the invention requires little prior professional knowledge, has an overshoot as small as that of the fuzzy PID controller but a faster response speed, and learns faster than the AC-PID controller; it has clear advantages in both overshoot and response speed.
The invention aims to solve the problems of existing adaptive PID controllers: the fuzzy adaptive PID and expert adaptive PID controllers require extensive professional domain knowledge, and the teacher signals of the neural network adaptive PID controller are difficult to obtain. The A3C learning algorithm performs parallel learning over multiple CPU threads, which greatly improves the learning rate compared with the AC-PID controller and achieves a better control effect. The control effect comparison can be seen in FIG. 7, which compares the three selected controllers under the same parameters: the fuzzy PID controller, the AC-PID controller and the A3C-PID controller of the invention; the detailed control analysis is given in FIG. 8. The controller of the invention requires little prior professional knowledge, has an overshoot as small as that of the fuzzy PID controller but a faster response speed, and learns faster than the AC-PID controller; it has clear advantages in both overshoot and response speed.
The present invention is not limited to the above embodiments, and various other equivalent modifications, substitutions and alterations can be made without departing from the basic technical concept of the invention according to the common technical knowledge and conventional means in the field.
Claims (2)
1. A reinforcement learning self-adaptive PID control method for parallel optimization is characterized by comprising the following steps:
step S1: using MATLAB software, define a continuous transfer function of arbitrary order for the controlled system and discretize it by the zero-order hold method to obtain a discretized transfer function with a custom time interval; initialize the controller parameters and M control threads for parallel learning, where the parameters mainly comprise BP neural network parameters and PID control environment parameters, and each thread is an independent control Agent;
step S2: after initializing a BP neural network weight parameter and a control object of a PID controller, defining a discrete input signal RIN, sequentially transmitting the discrete input signal into a discrete transfer function according to a defined time interval, calculating an output value of the transfer function, and taking a difference value of the input signal and the output signal as an input vector x (t) of an A3C self-adaptive PID control algorithm;
step S3: feed the input vector x(t) obtained in step S2 into the constructed A3C self-adaptive PID control system for iterative training; after N iterations a trained model is obtained;
step S31: calculate the current error e(t), the first-order error difference Δe(t) and the second-order error difference Δ²e(t) to form the input vector x(t) = [e(t), Δe(t), Δ²e(t)]ᵀ, and normalize it with a sigmoid function;
step S32: transmit the input vector to the Actor network of each thread and obtain new PID parameters; the Actor network does not output the PID parameter values directly but outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, and the three parameter values are estimated from these Gaussian distributions; for output nodes o = 1, 2, 3 the output layer gives the means of the PID parameters and for o = 4, 5, 6 the variances; the Actor network is a 3-layer BP neural network, with layer 1 as the input layer, layer 2 as the hidden layer whose output is ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, …, 20, and layer 3 as the output layer;
Step S33: giving a new PID parameter to a controller to obtain control output, calculating a control error, and calculating a reward value according to an environment reward function R (t), wherein R (t) = alpha 1R1 (t) + alpha 2R2 (t) until a vector value x' (t) of a next state;
step S34: pass the reward R(t), the current state vector x(t) and the next state vector x′(t) to the Critic network, whose structure is similar to that of the Actor network except that it has only one output node; the Critic network mainly outputs the state value, and the TD error is calculated as δ_TD = r(t) + γV(S_{t+1}, W_v′) − V(S_t, W_v′);
step S35: after the TD error is calculated, each Actor-Critic network in the A3C structure does not directly update its own network weights; instead it uses its own gradients to update the Actor-Critic network parameters stored in the central brain (Global-Net), according to W_a^{t+1} = W_a^t + α_a·dW_a^t and W_v^{t+1} = W_v^t + α_c·dW_v^t, where t and t+1 represent different times, W_a is the Actor network weight stored in the central brain, W_a′ is the Actor network weight of each AC structure, W_v is the Critic network weight stored in the central brain, W_v′ is the Critic network weight of each AC structure, α_a is the learning rate of the Actor and α_c is the learning rate of the Critic; after the update the central brain passes the latest parameters to each AC structure;
step S36: loop over steps S31 to S35 for N iterations to complete the training process, then exit training and save the model;
step S4: carry out a control test with the trained model, and record the input signal, the output signal and the change in the PID parameters;
step S5: visualize the experimental data obtained in step S4 with MATLAB, including the input signal, the output signal and the change in the controller's PID parameters, and compare the control effect with fuzzy adaptive PID control and AC-PID adaptive PID control.
2. The parallel-optimized reinforcement learning self-adaptive PID control method according to claim 1, wherein step S4 comprises the following steps:
step S41: feed the input signal defined in step S1 into the control model of the thread with the highest reward after training;
step S42: from the signal in S41, calculate the current, first-order and second-order errors as the input vector and feed it into the selected control model; the difference from the training process is that only the PID parameter adjustments output by the Actor network are needed, and the adjusted PID parameters are passed to the controller to obtain the controller output;
step S43: save the input signal, the output signal and the PID parameter change values obtained in step S42.
Priority Applications (1)
- CN201711325553.4A (priority date 2017-12-13, filing date 2017-12-13): CN108008627B (en) — Parallel optimization reinforcement learning self-adaptive PID control method
Applications Claiming Priority (1)
- CN201711325553.4A (priority date 2017-12-13, filing date 2017-12-13): CN108008627B (en) — Parallel optimization reinforcement learning self-adaptive PID control method
Publications (2)
- CN108008627A (en) — 2018-05-08
- CN108008627B (en) — 2022-10-28
Family
- ID: 62058629
Family Applications (1)
- CN201711325553.4A (priority date 2017-12-13, filing date 2017-12-13): CN108008627B — Active
Country Status (1)
- CN: CN108008627B (en)
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346138B (en) * | 2017-06-16 | 2020-05-05 | 武汉理工大学 | Unmanned ship lateral control method based on reinforcement learning algorithm |
CN109063823B (en) * | 2018-07-24 | 2022-06-07 | 北京工业大学 | Batch A3C reinforcement learning method for exploring 3D maze by intelligent agent |
CN108803348B (en) * | 2018-08-03 | 2021-07-13 | 北京深度奇点科技有限公司 | PID parameter optimization method and PID parameter optimization device |
CN109521669A (en) * | 2018-11-12 | 2019-03-26 | 中国航空工业集团公司北京航空精密机械研究所 | A kind of turning table control methods of self-tuning based on intensified learning |
CN109696830B (en) * | 2019-01-31 | 2021-12-03 | 天津大学 | Reinforced learning self-adaptive control method of small unmanned helicopter |
CN110308655B (en) * | 2019-07-02 | 2020-10-23 | 西安交通大学 | Servo system compensation method based on A3C algorithm |
CN110376879B (en) * | 2019-08-16 | 2022-05-10 | 哈尔滨工业大学(深圳) | PID type iterative learning control method based on neural network |
CN112631120B (en) * | 2019-10-09 | 2022-05-17 | Oppo广东移动通信有限公司 | PID control method, device and video coding and decoding system |
CN111079936B (en) * | 2019-11-06 | 2023-03-14 | 中国科学院自动化研究所 | Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning |
CN111856920A (en) * | 2020-07-24 | 2020-10-30 | 重庆红江机械有限责任公司 | A3C-PID-based self-adaptive rail pressure adjusting method and storage medium |
CN112162861B (en) * | 2020-09-29 | 2024-04-19 | 广州虎牙科技有限公司 | Thread allocation method, thread allocation device, computer equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102588129A (en) * | 2012-02-07 | 2012-07-18 | 上海艾铭思汽车控制系统有限公司 | Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel |
- 2017-12-13: CN application CN201711325553.4A granted as patent CN108008627B (en), status Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102588129A (en) * | 2012-02-07 | 2012-07-18 | 上海艾铭思汽车控制系统有限公司 | Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel |
Non-Patent Citations (5)
Title |
---|
A Proposal of Adaptive PID Controller Based on Reinforcement Learning; WANG Xue-song et al.; Journal of China University of Mining & Technology; 2007-03-31; full text *
Simulation of a welding robot based on an AC-PID controller; Zhang Chao et al.; Welding Technology; 2013-07-28 (No. 07); full text *
Adaptive PID control based on actor-critic learning; Chen Xuesong et al.; Control Theory & Applications; 2011-08-15 (No. 08); full text *
Multi-objective action-dependent heuristic dynamic programming excitation control; Lin Xiaofeng et al.; Journal of Electric Power Systems and Automation; 2012-06-15 (No. 03); full text *
Research on reinforcement learning and its application in robot systems; Chen Xuesong; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2011-10-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN108008627A (en) | 2018-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108008627B (en) | Parallel optimization reinforcement learning self-adaptive PID control method | |
CN111474965B (en) | Fuzzy neural network-based method for predicting and controlling water level of series water delivery channel | |
Chen et al. | Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning | |
Yang et al. | Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks | |
Perrusquia et al. | Discrete-time H 2 neural control using reinforcement learning | |
CN115167102A (en) | Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation | |
Ferrari et al. | Adaptive feedback control by constrained approximate dynamic programming | |
CN109062040B (en) | PID (proportion integration differentiation) predicting method based on system nesting optimization | |
Nguyen et al. | On-policy and off-policy Q-learning strategies for spacecraft systems: An approach for time-varying discrete-time without controllability assumption of augmented system | |
CN105676645A (en) | Double-loop water tank liquid level prediction control method based on function type weight RBF-ARX model | |
Xia et al. | Adaptive quantized output feedback DSC of uncertain systems with output constraints and unmodeled dynamics based on reduced-order K-filters | |
Han et al. | Symmetric actor–critic deep reinforcement learning for cascade quadrotor flight control | |
Scheurenberg et al. | Data Enhanced Model Predictive Control of a Coupled Tank System | |
Hager et al. | Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design | |
CN106033189A (en) | Flight robot pose nerve network prediction controller | |
CN111240201B (en) | Disturbance suppression control method | |
Kamalapurkar et al. | State following (StaF) kernel functions for function approximation part II: Adaptive dynamic programming | |
CN116880191A (en) | Intelligent control method of process industrial production system based on time sequence prediction | |
Kosmatopoulos | Control of unknown nonlinear systems with efficient transient performance using concurrent exploitation and exploration | |
CN116594288A (en) | Control method and system based on longhorn beetle whisker fuzzy PID | |
Park et al. | Linear quadratic tracker with integrator using integral reinforcement learning | |
EP2778947B1 (en) | Sequential deterministic optimization based control system and method | |
CN114193458A (en) | Robot control method based on Gaussian process online learning | |
Abouheaf et al. | Neurofuzzy reinforcement learning control schemes for optimized dynamical performance | |
Kang et al. | Adaptive fuzzy finite‐time prescribed performance control for uncertain nonlinear systems with actuator saturation and unmodeled dynamics |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant