CN108008627B - Parallel optimization reinforcement learning self-adaptive PID control method - Google Patents

Parallel optimization reinforcement learning self-adaptive PID control method

Info

Publication number
CN108008627B
CN108008627B (application CN201711325553.4A)
Authority
CN
China
Prior art keywords
pid
control
output
parameters
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711325553.4A
Other languages
Chinese (zh)
Other versions
CN108008627A (en)
Inventor
孙歧峰
任辉
段友祥
李洪强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN201711325553.4A priority Critical patent/CN108008627B/en
Publication of CN108008627A publication Critical patent/CN108008627A/en
Application granted granted Critical
Publication of CN108008627B publication Critical patent/CN108008627B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 - Automatic controllers
    • G05B11/01 - Automatic controllers electric
    • G05B11/36 - Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G05B11/42 - Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential, for obtaining a characteristic which is both proportional and time-dependent, e.g. P.I., P.I.D.
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/0265 - Adaptive control systems, electric, the criterion being a learning criterion
    • G05B13/027 - Adaptive control systems, electric, the criterion being a learning criterion, using neural networks only
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04 - Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems, electric, involving the use of models or simulators, in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a parallel-optimized reinforcement learning adaptive PID control method, characterized by comprising the following steps. Step S1: discretize a transfer function in MATLAB using the zero-order hold method, and initialize the controller parameters and M control threads for parallel learning. Step S2: feed the input signal into the transfer function defined in S1, calculate the output value, and take the difference between the input signal and the output signal as the input vector of the control algorithm. Step S3: pass the input vector to the improved adaptive PID controller for training, and iterate N times to obtain a trained model. Step S4: carry out a control test with the trained model, and record the input and output signals and the variation of the PID parameters. Step S5: visualize the test data and compare the control effect. The invention better solves the problems of existing adaptive PID methods, and improves the stability and learning efficiency of the algorithm by exploiting the multi-threaded parallel learning of A3C.

Description

Parallel optimization reinforcement learning self-adaptive PID control method
Technical Field
The invention relates to an adaptive PID control method, belongs to the technical field of control, and particularly relates to an improved adaptive PID (proportional-integral-derivative) control algorithm based on parallel-optimized actor-critic learning.
Background
A PID (proportional-integral-derivative) control system is a linear controller that acts on the control deviation. Because of its simple principle, strong robustness, easy tuning and the fact that it does not require an accurate mathematical model of the controlled object, it is the most commonly used control scheme in industrial control. In the engineering practice of PID parameter tuning, particularly for linear, time-invariant, weakly time-lagged systems, the traditional tuning methods have accumulated rich experience and are widely applied. However, in practical industrial process control many controlled objects are time-varying, uncertain and purely lagging, and the control process mechanism is complex; under the influence of noise, load disturbance and other factors, the process parameters and even the model structure can change. Online adjustment of the PID parameters is therefore required to meet the demands of real-time control. Under such conditions the traditional tuning methods struggle to meet the requirements of engineering practice and show great limitations.
Adaptive PID control is an effective way to address such problems. An adaptive PID control model combines the advantages of adaptive control with those of the conventional PID controller. First, as an adaptive controller it can automatically identify the controlled process, automatically tune the controller parameters and adapt to changes in the process parameters; second, it retains the simple structure, good robustness and high reliability of the conventional PID controller. These advantages make it attractive in industrial process control practice. Since adaptive PID control was proposed, the fuzzy adaptive PID controller, the neural network adaptive PID controller and the Actor-Critic adaptive PID controller have successively been put forward by many researchers.
For example, Document 1 (Liu Guorong, Yang Xianhui. Fuzzy adaptive PID controller [J]. Control and Decision, 1995(6)) proposed an adaptive PID controller based on fuzzy rules. Its main idea is: when the system is subject to a setpoint change, state disturbance or structural disturbance, the transient response can be divided into 9 cases; after the system response is obtained at each sampling moment, the control strength is appropriately increased or decreased by fuzzy control, according to the setpoint, the current trend of the response and the existing control knowledge, so as to prevent the response from drifting away from the setpoint and to drive the output toward the setpoint as quickly as possible. However, this control method relies on expert experience and parameter optimization to control a complex system, and inaccurately set fuzzy rules prevent it from achieving a satisfactory control effect.
Document 2 (Study on PID parameter self-tuning based on a BP neural network [J]. Journal of System Simulation, 2005) proposed adaptive PID control based on a BP neural network. Its control idea is: the neural network identifier propagates the control deviation back to the network neurons to correct the network weights; the deviation between the set input and the actual output of the object is passed through the identifier and back-propagated to the neural network controller, which corrects its weights using the error signal, and after many rounds of learning the identifier can gradually follow the changes of the system. This method generally optimizes the parameters by supervised learning, but the teacher signal is difficult to obtain.
Document 3 (Chen Xuesong et al. Adaptive PID control based on actor-critic learning [J]. Control Theory & Applications, 2011) proposed adaptive PID control with an Actor-Critic structure. Its control idea is: the PID parameters are adaptively adjusted using the model-free online learning capability of AC learning, and a single RBF network simultaneously learns the policy function of the Actor and the value function of the Critic. This overcomes the difficulty the traditional PID controller has in adjusting its parameters online in real time, and offers fast response and strong adaptive capability. However, the instability of the AC learning structure itself often makes the algorithm difficult to converge.
Patent CN201510492758 discloses an actuator adaptive PID control method that combines an expert PID controller and a fuzzy PID controller, each connected to the actuator; the actuator selects the expert PID controller or the fuzzy PID controller according to the current state information and the expected information. Although this controller can reduce overshoot and offers high control precision, it still requires a great deal of expert prior knowledge to decide which controller to use.
Disclosure of Invention
The aim of the invention is: in view of the characteristics of adaptive PID control, to provide an adaptive PID control method based on parallel-optimized actor-critic learning (A3C) for controlling industrial systems. The invention better solves the problems of existing adaptive PID methods, and improves the stability and learning efficiency of the algorithm by exploiting the multi-threaded parallel learning of A3C. The A3C-based adaptive PID controller has the advantages of fast response, strong adaptive capability and strong disturbance rejection.
The adaptive PID control method based on parallel-optimized actor-critic learning comprises the following steps:
Step S1: using MATLAB (commercial mathematical software from MathWorks) to define a continuous transfer function of arbitrary order for the controlled system, discretizing the continuous transfer function by the zero-order hold method to obtain a discretized transfer function with a user-defined time interval, and initializing the controller parameters and M control threads for parallel learning; the parameters mainly comprise the BP neural network parameters and the PID control environment parameters, and each thread is an independent control Agent;
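The zero-order-hold discretization of step S1 can be sketched as follows (a minimal illustration assuming a Python/SciPy environment rather than MATLAB; the example numerator and denominator are placeholders, since the patent's own plant coefficients are not reproduced in the text):

```python
import numpy as np
from scipy.signal import cont2discrete

# Illustrative third-order continuous transfer function (placeholder coefficients).
num = [1.0]
den = [1.0, 6.0, 11.0, 6.0]

dt = 0.001  # user-defined sampling interval, as in the embodiment

# Zero-order hold discretization, analogous to MATLAB's c2d(sys, dt, 'zoh')
num_d, den_d, _ = cont2discrete((num, den), dt, method='zoh')
num_d = np.squeeze(num_d)  # discretized numerator coefficients
print("discrete num:", num_d)
print("discrete den:", den_d)
```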
Step S2: after initializing the BP neural network weight parameters and the controlled object of the PID controller in step S1, defining a discrete input signal RIN, feeding it into the discrete transfer function sample by sample at the defined time interval, calculating the output value of the transfer function, and taking the difference between the input signal and the output signal as the input vector x(t) of the A3C adaptive PID control algorithm;
Step S3: passing the input vector x(t) obtained in step S2 into the constructed A3C adaptive PID control system for iterative training, and obtaining a trained model after N iterations;
Step S31: calculating the current error e(t), the first-order error difference Δe(t) and the second-order error difference Δ²e(t) to form the input vector x(t) = [e(t), Δe(t), Δ²e(t)]^T, and normalizing it with a sigmoid function;
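A small sketch of step S31 (a hypothetical helper; the element-wise application of the sigmoid is an assumption, since the patent does not spell out how the normalization is applied):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def state_vector(e_t, e_t1, e_t2):
    """Build the normalized input vector x(t) from the last three errors."""
    de = e_t - e_t1                  # first-order error difference
    d2e = e_t - 2.0 * e_t1 + e_t2    # second-order error difference
    x = np.array([e_t, de, d2e])
    return sigmoid(x)                # element-wise sigmoid normalization (assumed)

# Values from the embodiment: e(t)=1.0, e(t-1)=0, e(t-2)=0 -> approx [0.73, 0.73, 0.73]
print(state_vector(1.0, 0.0, 0.0))
```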
Step S32: transmitting the input vector to the Actor network of each thread and obtaining new PID parameters. The Actor network does not output the PID parameter values directly; instead it outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, and the three parameter values are estimated by sampling from those Gaussian distributions. When o = 1, 2, 3 the output layer outputs the means of the PID parameters, and when o = 4, 5, 6 it outputs the variances. The Actor network is a 3-layer BP neural network: layer 1 is the input layer; layer 2 is the hidden layer, with input hi_k(t) (the exact expression appears only as an equation image) and output ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, 3, ..., 20; layer 3 is the output layer, whose input and output expressions likewise appear only as equation images.
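The following sketch illustrates the Actor forward pass described in step S32 (an assumed minimal NumPy implementation: the hidden-layer activation min(max(·, 0), 6) and the six output nodes follow the text, while the input- and output-layer expressions, shown only as images, are replaced by ordinary weighted sums, and the softplus keeping the variance positive is an added assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 20))   # input layer -> hidden layer (20 nodes), assumed weights
W2 = rng.normal(scale=0.1, size=(20, 6))   # hidden layer -> 6 output nodes, assumed weights

def actor_forward(x):
    """Sample (Kp, Ki, Kd) from the Gaussian outputs of the Actor network."""
    hi = x @ W1                                   # hidden-layer input (assumed weighted sum)
    ho = np.minimum(np.maximum(hi, 0.0), 6.0)     # ho_k(t) = min(max(hi_k(t), 0), 6)
    out = ho @ W2                                 # output-layer input (assumed weighted sum)
    mu = out[:3]                                  # nodes o = 1..3: means of the PID parameters
    var = np.log1p(np.exp(out[3:])) + 1e-4        # nodes o = 4..6: variances (softplus, assumed)
    return rng.normal(mu, np.sqrt(var))           # Gaussian sampling of Kp, Ki, Kd

x = np.array([0.73, 0.73, 0.73])
print(actor_forward(x))
```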
Step S33: the new PID parameters are given to the controller to obtain the control output, the control error is calculated, and the reward value is calculated according to the environment reward function R(t) = α1·r1(t) + α2·r2(t), where the expressions for r1(t) and r2(t) appear only as equation images; the system then advances to the next-state vector x'(t);
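Because the r1(t) and r2(t) terms appear only as images, the sketch below uses a commonly assumed form for this kind of reward (a penalty when the absolute error exceeds a tolerance, plus a term rewarding error reduction); the weights follow the embodiment (α1 = 0.6, α2 = 0.4), and everything else is a labeled assumption:

```python
def reward(e_t, e_prev, alpha1=0.6, alpha2=0.4, eps=0.001):
    """Hypothetical reward R(t) = alpha1*r1(t) + alpha2*r2(t).

    r1 penalizes the error when it exceeds the tolerance eps (assumed form),
    r2 rewards a decrease of the absolute error between steps (assumed form);
    the patent's exact r1/r2 expressions are given only as images.
    """
    r1 = 0.0 if abs(e_t) <= eps else -abs(e_t)
    r2 = 1.0 if abs(e_t) < abs(e_prev) else -1.0
    return alpha1 * r1 + alpha2 * r2
```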
step S34: the reward function R (t), the current state vector x (t), and the next state vector x' (t) are passed to the Critic network, which is similar to the Actor network except that there is only one output node. Critic network outputs the state values and calculates the TD error, δ TD =r(t)+γV(S t+1 ,W v ′)-V(S t ,W v ′);
Step S35: after the TD error is calculated, each Actor-Critic network in the A3C structure does not directly update its own network weights; instead it uses its own gradients to update the Actor-Critic network parameters stored in the central brain (Global-Net), in the manner Wa = Wa + αa·dWa and Wv = Wv + αc·dWv, where Wa is the Actor network weight stored by the central brain, W'a is the Actor network weight of each AC structure, Wv is the Critic network weight stored by the central brain, W'v is the Critic network weight of each AC structure, αa is the learning rate of the Actor and αc is the learning rate of the Critic; after each update the central brain passes the latest parameters back to each AC structure;
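A sketch of the asynchronous update in step S35 (assuming each worker thread has already accumulated its gradients dWa and dWv; the exact gradient expressions, shown only as images, and the thread-safety details are simplified here):

```python
import threading

class GlobalNet:
    """Central brain (Global-Net) holding the shared Actor/Critic weights."""

    def __init__(self, Wa, Wv, alpha_a=0.001, alpha_c=0.01):
        self.Wa, self.Wv = Wa, Wv
        self.alpha_a, self.alpha_c = alpha_a, alpha_c
        self._lock = threading.Lock()

    def apply_gradients(self, dWa, dWv):
        # Wa <- Wa + alpha_a * dWa ;  Wv <- Wv + alpha_c * dWv
        with self._lock:
            self.Wa = self.Wa + self.alpha_a * dWa
            self.Wv = self.Wv + self.alpha_c * dWv
            return self.Wa, self.Wv  # latest parameters handed back to the worker
```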
step S36: in order to complete a training process, the loop is iterated for N times, the training is quitted, and the model is saved.
Step S4: carrying out a control test with the trained model, and recording the input signal, the output signal and the variation of the PID parameters;
Step S41: transmitting the input signal defined in step S1 to the control model of the thread that achieved the highest reward during training;
Step S42: using the signal from step S41, calculating the current, first-order and second-order errors as the input vector and feeding it into the selected control model; unlike during training, only the PID parameter adjustment output by the Actor network is needed, and the adjusted PID parameters are passed to the controller to obtain the controller output;
step S43: the input signal, the output signal, and the PID parameter variation value obtained in step S42 are saved.
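A sketch of the test procedure in steps S41-S43 (hypothetical helper names; using only the Actor's mean outputs at test time is an assumption, the embodiment stating only that the Critic is inactive and the Actor outputs the P, I, D values):

```python
def control_test(actor_mean, plant_step, state_fn, rin, n_steps=1000):
    """Run the trained controller and log signals for later visualization.

    actor_mean(x) -> (Kp, Ki, Kd): trained Actor, mean outputs only (assumed).
    plant_step(u) -> yout: one step of the discretized controlled object.
    state_fn(e, e_prev, e_prev2) -> x: normalized state vector (as in step S31).
    """
    log = {"rin": [], "yout": [], "u": [], "K": []}
    u = yout = 0.0
    e_prev = e_prev2 = 0.0
    for k in range(n_steps):
        e = rin[k] - yout
        Kp, Ki, Kd = actor_mean(state_fn(e, e_prev, e_prev2))   # Critic unused at test time
        u += Ki * e + Kp * (e - e_prev) + Kd * (e - 2 * e_prev + e_prev2)  # incremental PID
        yout = plant_step(u)
        log["rin"].append(rin[k]); log["yout"].append(yout)
        log["u"].append(u); log["K"].append((Kp, Ki, Kd))
        e_prev2, e_prev = e_prev, e
    return log
```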
Step S5: visualizing the experimental data obtained in step S4 with MATLAB, including the input signal, the output signal and the variation of the controller's PID parameters, and comparing the control effect with that of fuzzy adaptive PID control and AC-PID adaptive control.
Drawings
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a block diagram of the improved adaptive PID controller.
FIG. 3 is the output signal of the improved controller with a step input signal.
FIG. 4 is the control quantity of the improved controller.
FIG. 5 is the control error of the improved adaptive PID controller.
FIG. 6 is the parameter adjustment curve of the A3C adaptive PID controller.
FIG. 7 is a comparison of the improved controller with fuzzy and AC-structure adaptive PID controllers.
FIG. 8 is a comparison and analysis of the control experiments of the different controllers.
Detailed Description
The invention is further described below using MATLAB software in conjunction with FIGS. 1-5. The specific implementation of adaptive PID control based on parallel-optimized actor-critic learning comprises the following steps, as shown in FIG. 1:
(1) Initialize parameters. The controlled system is selected as a third-order transfer function (given only as an equation image), the discrete time is set to 0.001 s, and the transfer function discretized via the Z-transform is: yout(k) = -den(2)·yout(k-1) - den(3)·yout(k-2) - den(4)·yout(k-3) + num(2)·u(k-1) + num(3)·u(k-2) + num(4)·u(k-3). The input signal is a step signal equal to 1.0, a single training run is 1000 steps (1.0 s), and 4 threads are initialized, representing 4 independent adaptive PID controllers, for training.
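A sketch of the discretized plant recursion used in step (1), with MATLAB's 1-based den(2)...den(4) and num(2)...num(4) mapped to 0-based indices; the coefficient vectors themselves come from the discretization of step S1:

```python
import numpy as np

class DiscretePlant:
    """Third-order discrete plant:
    yout(k) = -den[1]*yout(k-1) - den[2]*yout(k-2) - den[3]*yout(k-3)
              + num[1]*u(k-1) + num[2]*u(k-2) + num[3]*u(k-3)
    (MATLAB den(2..4)/num(2..4) correspond to Python den[1..3]/num[1..3]).
    """

    def __init__(self, num, den):
        self.num, self.den = np.asarray(num), np.asarray(den)
        self.y_hist = [0.0, 0.0, 0.0]   # yout(k-1), yout(k-2), yout(k-3)
        self.u_hist = [0.0, 0.0, 0.0]   # u(k-1), u(k-2), u(k-3)

    def step(self, u_k):
        # Output depends only on past outputs and past inputs.
        y = (-self.den[1] * self.y_hist[0] - self.den[2] * self.y_hist[1]
             - self.den[3] * self.y_hist[2]
             + self.num[1] * self.u_hist[0] + self.num[2] * self.u_hist[1]
             + self.num[3] * self.u_hist[2])
        self.y_hist = [y] + self.y_hist[:2]
        self.u_hist = [u_k] + self.u_hist[:2]
        return y
```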
(2) Calculate the input vector. At t = 0, e(t) = rin(0) - yout(0) = 1.0, e(t-1) = 0 and e(t-2) = 0. The input vector is x(t) = [e(t), Δe(t), Δ²e(t)]^T, where e(t) = rin - yout = 1.0, Δe(t) = e(t) - e(t-1) = 1.0 and Δ²e(t) = e(t) - 2·e(t-1) + e(t-2) = 1.0, so x(t) = [1.0, 1.0, 1.0]^T; normalizing with the sigmoid function gives the final input vector x(t) = [0.73, 0.73, 0.73]^T.
(3) Train the model. The structure of the improved adaptive PID controller is shown in FIG. 2. After the state vector is calculated it is first passed to the Actor network, which outputs the mean μ and variance σ of the three parameters P, I and D; the actual P, I and D values are obtained by Gaussian sampling and assigned to the incremental PID controller, which calculates the control quantity u(t) from the error and the new PID parameters:
u(t) = u(t-1) + Δu(t) = u(t-1) + K_I(t)·e(t) + K_P(t)·Δe(t) + K_D(t)·Δ²e(t)
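The incremental PID law above as a small sketch; the gains K_P, K_I, K_D are whatever the Actor produced at the current step:

```python
class IncrementalPID:
    """u(t) = u(t-1) + K_I*e(t) + K_P*Δe(t) + K_D*Δ²e(t)."""

    def __init__(self):
        self.u = 0.0
        self.e_prev = 0.0
        self.e_prev2 = 0.0

    def step(self, e, Kp, Ki, Kd):
        de = e - self.e_prev
        d2e = e - 2.0 * self.e_prev + self.e_prev2
        self.u += Ki * e + Kp * de + Kd * d2e   # incremental update of the control quantity
        self.e_prev2, self.e_prev = self.e_prev, e
        return self.u
```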
The discrete transfer function, driven by the control quantity, produces the output signal value yout(t+1), and the error value and the state vector at the next time t+1 are calculated according to the procedure of (1). In addition, the environment reward function calculates the reward value of the control Agent from the error; the reward function is:
R(t) = α1·r1(t) + α2·r2(t), where α1 = 0.6, α2 = 0.4 and the error tolerance ε = 0.001 (the expressions for r1(t) and r2(t) appear only as equation images).
The reward function is a key component of reinforcement learning. After the reward value is obtained, it is passed together with the state vector of the next moment to the Critic network, which outputs the state values at times t and t+1; the TD error is then calculated as δTD = r(t) + γ·V(S(t+1), Wv') - V(S(t), Wv'), where Wv' is the Critic network weight. Because the threads run asynchronously, the controllers update the Actor and Critic network parameters stored in the Global Net of FIG. 2 in no fixed order, with the update formulas Wa = Wa + αa·dWa and Wv = Wv + αc·dWv, where Wa is the Actor network weight stored by the central brain, W'a is the Actor network weight of each AC structure, Wv is the Critic network weight stored by the central brain, W'v is the Critic network weight of each AC structure, αa = 0.001 is the learning rate of the Actor and αc = 0.01 is the learning rate of the Critic. After one training run is completed, the algorithm reaches a steady state after about 3000 iterations.
(4) Collect experimental data. The trained controller model is used; since 4 threads were set up for control training, the thread with the highest accumulated reward is selected as the test controller. The control test is carried out with the control parameters set in (1); the control time is 1 s, i.e. 1000 control steps. The state vector is calculated as in (2) and fed into the trained model; during the control test the Critic network is inactive, the Actor outputs the P, I and D parameter values, and the values of yout, rin, u, P, I and D are saved for visual analysis.
(5) Visualize the data. The data saved in (4) are analyzed with the MATLAB visualization tools. As shown in FIG. 3, which plots the output value yout, the controller reaches a steady state in less than 0.2 s and has fast regulation capability. FIG. 4 shows the control quantity output by the controller, from which it can be seen that the controller reaches a stable state very quickly. FIG. 5 shows the control error, which equals the input signal minus the output signal. FIG. 6 shows the variation of the P, I and D parameters; all three parameters are adjusted to different degrees before the system stabilizes and remain unchanged afterwards. The fuzzy adaptive PID controller and the Actor-Critic adaptive PID controller were tested with the same controlled object and input signals for comparison; the output signals of the three controllers are compared in FIG. 7 and the detailed control analysis is given in FIG. 8. As FIG. 8 shows, the controller of the invention requires little expert prior knowledge, achieves smaller overshoot and faster response than the fuzzy PID controller, and learns faster than the AC-PID controller, with clear advantages in both overshoot and response speed.
The invention addresses the problems of existing adaptive PID controllers: fuzzy adaptive PID and expert adaptive PID controllers require extensive expert knowledge, and the teacher signal of a neural network adaptive PID controller is difficult to obtain. By running the learning algorithm in parallel across multiple CPU threads, the learning rate of the AC-PID controller is greatly improved and a better control effect is achieved. The control effect comparison is shown in FIG. 7, which compares the three selected controllers (the fuzzy PID controller, the AC-PID controller and the A3C-PID controller of the invention) under the same parameters; the detailed control analysis in FIG. 8 shows that the controller of the invention requires little expert prior knowledge, achieves smaller overshoot and faster response than the fuzzy PID controller, and learns faster than the AC-PID controller, with clear advantages in both overshoot and response speed.
The present invention is not limited to the above embodiments, and various other equivalent modifications, substitutions and alterations can be made without departing from the basic technical concept of the invention according to the common technical knowledge and conventional means in the field.

Claims (2)

1. A parallel-optimized reinforcement learning adaptive PID control method, characterized by comprising the following steps:
step S1: using MATLAB software to define a continuous transfer function of arbitrary order for a controlled system, discretizing the transfer function by the zero-order hold method to obtain a discretized transfer function with a custom time interval, initializing controller parameters and M control threads for parallel learning, wherein the parameters mainly comprise BP neural network parameters and PID control environment parameters, and each thread is an independent control Agent;
step S2: after initializing a BP neural network weight parameter and a control object of a PID controller, defining a discrete input signal RIN, sequentially transmitting the discrete input signal into a discrete transfer function according to a defined time interval, calculating an output value of the transfer function, and taking a difference value of the input signal and the output signal as an input vector x (t) of an A3C self-adaptive PID control algorithm;
and step S3: transmitting the input vector x (t) obtained in the step S2 into a built A3C self-adaptive PID control system for iterative training, and obtaining a trained model after iterating for N times;
step S31: calculating the current error e(t), the first-order error difference Δe(t) and the second-order error difference Δ²e(t) to form the input vector x(t) = [e(t), Δe(t), Δ²e(t)]^T, and normalizing it with a sigmoid function;
step S32: transmitting the input vector to the Actor network of each thread and obtaining new PID parameters, wherein the Actor network does not directly output the PID parameter values but outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, and the three parameter values are estimated from those Gaussian distributions; when o = 1, 2, 3 the output layer outputs the means of the PID parameters, and when o = 4, 5, 6 it outputs the variances; the Actor network is a 3-layer BP neural network: layer 1 is the input layer; layer 2 is the hidden layer, with input hi_k(t) (given only as an equation image) and output ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, 3, ..., 20; layer 3 is the output layer, whose input and output are likewise given only as equation images;
Step S33: giving a new PID parameter to a controller to obtain control output, calculating a control error, and calculating a reward value according to an environment reward function R (t), wherein R (t) = alpha 1R1 (t) + alpha 2R2 (t) until a vector value x' (t) of a next state;
Figure FDA0003848595440000021
Figure FDA0003848595440000022
step S34: passing the reward function R(t), the current state vector x(t) and the next state vector x'(t) to a Critic network, which has the same structure as the Actor network except that it has only one output node; the Critic network outputs the state value and calculates the TD error δTD = r(t) + γ·V(S(t+1), Wv') - V(S(t), Wv');
step S35: after the TD error is calculated, each Actor-Critic network in the A3C structure does not directly update its own network weights, but uses its own gradients to update the Actor-Critic network parameters stored in the central brain (Global-Net), in the manner Wa(t+1) = Wa(t) + αa·dWa(t) and Wv(t+1) = Wv(t) + αc·dWv(t), where t and t+1 denote different times, Wa is the Actor network weight stored by the central brain, W'a is the Actor network weight of each AC structure, Wv is the Critic network weight stored by the central brain, W'v is the Critic network weight of each AC structure, αa is the learning rate of the Actor and αc is the learning rate of the Critic, and the gradient expressions dWa and dWv are given only as equation images; after updating, the central brain passes the latest parameters to each AC structure;
step S36: iterating the loop N times to complete one training process, after which training terminates and the model is saved;
step S4: carrying out a control test with the trained model, and recording the input signal, the output signal and the variation of the PID parameters;
step S5: visualizing the experimental data obtained in step S4 with MATLAB, including the input signal, the output signal and the variation of the controller's PID parameters, and comparing the control effect with that of fuzzy adaptive PID control and AC-PID adaptive control.
2. The parallel-optimized reinforcement learning adaptive PID control method according to claim 1, wherein step S4 comprises the following steps:
step S41: transmitting the input signal defined in step S1 to the control model of the thread that achieved the highest reward during training;
step S42: using the signal from step S41, calculating the current, first-order and second-order errors as the input vector and feeding it into the selected control model, wherein, unlike during training, only the PID parameter adjustment output by the Actor network is needed, and the adjusted PID parameters are transmitted to the controller to obtain the controller output;
step S43: the input signal, the output signal, and the PID parameter variation value obtained in step S42 are saved.
CN201711325553.4A 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method Active CN108008627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711325553.4A CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711325553.4A CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Publications (2)

Publication Number Publication Date
CN108008627A CN108008627A (en) 2018-05-08
CN108008627B true CN108008627B (en) 2022-10-28

Family

ID=62058629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711325553.4A Active CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Country Status (1)

Country Link
CN (1) CN108008627B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107346138B (en) * 2017-06-16 2020-05-05 武汉理工大学 Unmanned ship lateral control method based on reinforcement learning algorithm
CN109063823B (en) * 2018-07-24 2022-06-07 北京工业大学 Batch A3C reinforcement learning method for exploring 3D maze by intelligent agent
CN108803348B (en) * 2018-08-03 2021-07-13 北京深度奇点科技有限公司 PID parameter optimization method and PID parameter optimization device
CN109521669A (en) * 2018-11-12 2019-03-26 中国航空工业集团公司北京航空精密机械研究所 A kind of turning table control methods of self-tuning based on intensified learning
CN109696830B (en) * 2019-01-31 2021-12-03 天津大学 Reinforced learning self-adaptive control method of small unmanned helicopter
CN110308655B (en) * 2019-07-02 2020-10-23 西安交通大学 Servo system compensation method based on A3C algorithm
CN110376879B (en) * 2019-08-16 2022-05-10 哈尔滨工业大学(深圳) PID type iterative learning control method based on neural network
CN112631120B (en) * 2019-10-09 2022-05-17 Oppo广东移动通信有限公司 PID control method, device and video coding and decoding system
CN111079936B (en) * 2019-11-06 2023-03-14 中国科学院自动化研究所 Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning
CN111856920A (en) * 2020-07-24 2020-10-30 重庆红江机械有限责任公司 A3C-PID-based self-adaptive rail pressure adjusting method and storage medium
CN112162861B (en) * 2020-09-29 2024-04-19 广州虎牙科技有限公司 Thread allocation method, thread allocation device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102588129A (en) * 2012-02-07 2012-07-18 上海艾铭思汽车控制系统有限公司 Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102588129A (en) * 2012-02-07 2012-07-18 上海艾铭思汽车控制系统有限公司 Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Proposal of Adaptive PID Controller Based on Reinforcement Learning; WANG Xue-song et al.; Journal of China University of Mining & Technology; 2007-03-31; full text *
Simulation of a welding robot based on an AC-PID controller; Zhang Chao et al.; Welding Technology; 2013-07-28 (No. 07); full text *
Adaptive PID control based on actor-critic learning; Chen Xuesong et al.; Control Theory & Applications; 2011-08-15 (No. 08); full text *
Multi-objective action-dependent heuristic dynamic programming excitation control; Lin Xiaofeng et al.; Journal of Electric Power Systems and Automation; 2012-06-15 (No. 03); full text *
Reinforcement learning and its application in robot systems; Chen Xuesong; Chinese Doctoral Dissertations Full-text Database, Information Science and Technology series; 2011-10-15; full text *

Also Published As

Publication number Publication date
CN108008627A (en) 2018-05-08

Similar Documents

Publication Publication Date Title
CN108008627B (en) Parallel optimization reinforcement learning self-adaptive PID control method
CN111474965B (en) Fuzzy neural network-based method for predicting and controlling water level of series water delivery channel
Chen et al. Adaptive optimal tracking control of an underactuated surface vessel using actor–critic reinforcement learning
Yang et al. Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks
Perrusquia et al. Discrete-time H 2 neural control using reinforcement learning
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
Ferrari et al. Adaptive feedback control by constrained approximate dynamic programming
CN109062040B (en) PID (proportion integration differentiation) predicting method based on system nesting optimization
Nguyen et al. On-policy and off-policy Q-learning strategies for spacecraft systems: An approach for time-varying discrete-time without controllability assumption of augmented system
CN105676645A (en) Double-loop water tank liquid level prediction control method based on function type weight RBF-ARX model
Xia et al. Adaptive quantized output feedback DSC of uncertain systems with output constraints and unmodeled dynamics based on reduced-order K-filters
Han et al. Symmetric actor–critic deep reinforcement learning for cascade quadrotor flight control
Scheurenberg et al. Data Enhanced Model Predictive Control of a Coupled Tank System
Hager et al. Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design
CN106033189A (en) Flight robot pose nerve network prediction controller
CN111240201B (en) Disturbance suppression control method
Kamalapurkar et al. State following (StaF) kernel functions for function approximation part II: Adaptive dynamic programming
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
Kosmatopoulos Control of unknown nonlinear systems with efficient transient performance using concurrent exploitation and exploration
CN116594288A (en) Control method and system based on longhorn beetle whisker fuzzy PID
Park et al. Linear quadratic tracker with integrator using integral reinforcement learning
EP2778947B1 (en) Sequential deterministic optimization based control system and method
CN114193458A (en) Robot control method based on Gaussian process online learning
Abouheaf et al. Neurofuzzy reinforcement learning control schemes for optimized dynamical performance
Kang et al. Adaptive fuzzy finite‐time prescribed performance control for uncertain nonlinear systems with actuator saturation and unmodeled dynamics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant