CN113325804B - Q learning extended state observer design method of motion control system - Google Patents


Info

Publication number
CN113325804B
Authority
CN
China
Prior art keywords
learning
state
time
action
extended state
Prior art date
Legal status
Active
Application number
CN202110637860.6A
Other languages
Chinese (zh)
Other versions
CN113325804A (en
Inventor
薛文超 (Xue Wenchao)
汤国杰 (Tang Guojie)
方海涛 (Fang Haitao)
Current Assignee
Academy of Mathematics and Systems Science of CAS
Original Assignee
Academy of Mathematics and Systems Science of CAS
Priority date
Filing date
Publication date
Application filed by Academy of Mathematics and Systems Science of CAS
Priority to CN202110637860.6A
Publication of CN113325804A
Application granted
Publication of CN113325804B

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00: Programme-control systems
    • G05B19/02: Programme-control systems electric
    • G05B19/18: Numerical control [NC], i.e. automatically operating machines, in particular machine tools, e.g. in a manufacturing environment, so as to execute positioning, movement or co-ordinated operations by means of programme data in numerical form
    • G05B19/414: Structure of the control system, e.g. common controller or multiprocessor systems, interface to servo, programmable interface controller
    • G05B19/4142: Structure of the control system characterised by the use of a microprocessor
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00: Program-control systems
    • G05B2219/30: Nc systems
    • G05B2219/34: Director, elements to supervisory
    • G05B2219/34013: Servocontroller

Abstract

The invention provides a Q learning extended state observer design method for a motion control system, comprising the following steps: 1. design an extended state observer according to a discrete form of the mathematical model of the motion control system; 2. design a Q learning algorithm; 3. adjust the parameters of the extended state observer through Q learning. The system may be the hybrid system common in practice, in which the plant is continuous and the output is sampled, and a corresponding discrete ESO structure and parameter design method is provided directly. No noise or disturbance model is needed to optimize the parameters of the extended state observer; the parameters are adjusted in real time in a data-driven manner, so that, compared with the traditional constant-gain ESO, internal uncertain dynamics and external disturbances are tracked more accurately. The adjustment ranges of the four parameters of the Q learning part are given quantitatively and explicitly; the parameter-adjustment theory guarantees the stability of the extended state observer and reduces the cost of tuning parameters in actual engineering.

Description

Q learning extended state observer design method of motion control system
Technical Field
The invention belongs to the technical field of extended-observer design methods for motion control systems, and particularly relates to an extended state observer (ESO) and a Q learning parameter-adjustment technique for a motion control system.
Background
In the past decades, estimating the state of a motion control system by designing an observer and implementing state feedback in the control law has proved to be an effective control method. However, most observers still have limitations: they rely on system models, can only estimate states and a single external disturbance, and cannot handle complex uncertainties comprising internal unknown dynamics and external disturbances, as described in reference [1]. In view of these problems, the Chinese scholar Han Jingqing proposed the extended state observer, which, without depending on a model, estimates the internal uncertain dynamics and the external disturbance together as an extended state, the "total disturbance", as described in reference [2]. For the parameter-adjustment problem of the ESO, reference [3] proposed the widely used "bandwidth method" for the linear extended state observer; references [4] and [5] respectively analyzed the linear and the nonlinear ESO with constant parameters and gave parameter-adjustment methods that guarantee the stability of the ESO; and reference [6] gave a Kalman-gain optimization adjustment method for the ESO using the statistical characteristics of the noise and the range of the uncertainty variation.
At present, ESO parameters are mainly adjusted with constant gains or with optimization methods that rely on model information. The characteristics of an actual motion control system and of its external environment change over time, so the noise characteristics and the uncertain dynamics also change, and the assumption that model information is known is difficult to satisfy. Therefore, the design of the extended state observer of a motion control system needs to be independent of model information, realize a gain-adjustment method that learns and optimizes online from data, and achieve fast and accurate estimation of the state and the uncertainty.
Disclosure of Invention
The technical problem solved by the invention is as follows: for a motion control system, an extended state observer whose parameters are adjusted by a Q learning algorithm is designed to achieve effective real-time estimation of the system state and the total disturbance; real-time measurement data of the system drive the real-time optimization of the gain of the extended state observer, thereby enhancing its estimation capability and steady-state performance.
Consider the following mathematical model of a motion control system:
dx1(t)/dt = x2(t)
dx2(t)/dt = b·u(t) + d(x(t), t)
dd(x(t), t)/dt = c(t)
y_i(kh) = x_i(kh) + v_i(kh),  i = 1, 2,  k = 1, 2, ...   (1.1)
where t denotes time; x1(t) ∈ R denotes the position of the moving object at time t; x2(t) ∈ R denotes the velocity of the moving object at time t; b denotes the input gain; u(t) ∈ R denotes the input of the system at time t; d(x(t), t) ∈ R denotes the total disturbance, composed of the uncertain dynamics inside the system and the external disturbance, at time t; and c(t) ∈ R denotes the derivative of the total disturbance at time t. x1(kh) and x2(kh) denote the position and velocity of the moving object at time t = kh, where h is the sampling period and k indexes the k-th sample; y_i(kh) denotes the measured value of x_i(kh), and v_i(kh) is the measurement noise of the corresponding channel (i = 1, 2).
Considering that the state of the motion control system can be measured, the design goal is to enable the extended state observer to adjust its parameters according to real-time data, so that the total disturbance d(x(t), t) is tracked quickly and accurately while the sensitivity to noise is reduced.
The technical solution of the invention comprises the following three steps:
step (I): the extended state observer is designed according to the discrete form of (1.1):
since the sampling and control inputs are discrete in a real system, a discrete approximation form of (1.1) needs to be considered:
x1((k+1)h) = x1(kh) + h·x2(kh)
x2((k+1)h) = x2(kh) + h·(b·u(kh) + d(x(kh), kh))
d(x((k+1)h), (k+1)h) = d(x(kh), kh) + h·c(kh)
y_i((k+1)h) = x_i((k+1)h) + v_i((k+1)h),  i = 1, 2   (1.2)
x1((k+1)h) and x2((k+1)h) denote the position and velocity of the moving object at time t = (k+1)h, where h is the sampling period and k+1 indexes the (k+1)-th sample; y_i((k+1)h) represents the measured value of x_i((k+1)h), and v_i((k+1)h) is the measurement noise of the corresponding channel (i = 1, 2). b represents the input gain, u(kh) ∈ R represents the input of the system at time t = kh, d(x(kh), kh) ∈ R represents the total disturbance composed of the uncertain dynamics inside the system and the external disturbance at time t = kh, and c(kh) ∈ R represents the derivative of the total disturbance at time t = kh.
According to (1.2), the linear extended state observer is designed as follows:
x̂2((k+1)h) = x̂2(kh) + h·(b·u(kh) + d̂(kh) + β1(kh)·(y2(kh) - x̂2(kh)))
d̂((k+1)h) = d̂(kh) + h·(ĉ(kh) + β2(kh)·(y2(kh) - x̂2(kh)))
ĉ((k+1)h) = ĉ(kh) + h·β3(kh)·(y2(kh) - x̂2(kh))   (1.3)
where β1(kh), β2(kh), β3(kh) are the observer gains, designed as
β1(kh) = 3ω(kh),  β2(kh) = 3ω²(kh),  β3(kh) = ω³(kh),  ω(kh) > 0.   (1.4)
x̂2(kh), d̂(kh) and ĉ(kh) respectively represent the estimated values of x2(kh), d(kh) and c(kh), and ω(kh) is referred to as the "observer bandwidth" at time t = kh, which is the parameter to be adjusted.
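As a concrete illustration of step (I), the following minimal Python sketch performs one update of the discrete linear extended state observer, assuming the forward-Euler form of (1.3) reconstructed above with the velocity measurement y2 driving the innovation; the function and variable names are illustrative, not part of the patent.

```python
import numpy as np

def eso_step(z, y2, u, b, omega, h):
    """One step of the discrete linear ESO of step (I).

    z     : current estimates [x2_hat, d_hat, c_hat]
    y2    : velocity measurement y2(kh)
    u     : control input u(kh)
    b     : input gain
    omega : observer bandwidth omega(kh)
    h     : sampling period
    """
    # Bandwidth parameterization of the observer gains, eq. (1.4)
    beta1, beta2, beta3 = 3.0 * omega, 3.0 * omega ** 2, omega ** 3
    x2_hat, d_hat, c_hat = z
    e = y2 - x2_hat  # innovation: measured velocity minus estimated velocity
    x2_next = x2_hat + h * (b * u + d_hat + beta1 * e)
    d_next = d_hat + h * (c_hat + beta2 * e)
    c_next = c_hat + h * (beta3 * e)
    return np.array([x2_next, d_next, c_next])
```

A larger ω makes the estimates converge faster but amplifies the measurement noise, which is exactly the trade-off the Q learning part is designed to manage.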
Step (II): designing a Q learning algorithm:
The Q learning algorithm comprises four main components: state, action, reward, and state-action value function. Based on the current state of the system, the corresponding action is selected according to the value of the state-action value function, the corresponding reward is obtained, and the value of the state-action value function is updated according to the reward.
A state space S and an action space Λ are designed according to the actual situation, where Λ is a bounded set of real numbers whose maximum and minimum values are denoted a_max and a_min, respectively. For all states s ∈ S and actions a ∈ Λ, the corresponding state-action value function Q(s, a) is initialized to 0, and the discount factor γ ∈ (0, 1) and a learning-rate sequence {α_n} satisfying the following conditions are selected:
α_n ∈ (0, 1],  Σ_n α_n = ∞,  Σ_n α_n² < ∞.   (1.5)
For the state-action value function Q, the following update criterion is employed:
Q_{n+1}(s_j, a_j) = Q_n(s_j, a_j) + α_n·[ r_j + γ·max_{a′ ∈ Λ} Q_n(s′, a′) - Q_n(s_j, a_j) ],   (1.6)
where the subscript n denotes that action a_j is selected in state s_j for the n-th time, s′ represents the next state reached after selecting action a, and the subscript j indicates the j-th Q learning. While the system is running, Q learning is performed once every q samples, and the bandwidth from t = jqh to t = (j+1)qh is a constant, denoted
ω_j = ω(t),  t ∈ [jqh, (j+1)qh),  j = 1, 2, ...   (1.7)
At the j-th Q learning, the state of the system is calculated by the following formulas, an action is selected, and the reward is calculated:
State: the state at the j-th Q learning is defined as s_j = [s_{j,1}, s_{j,2}], where s_{j,1} and s_{j,2} are defined as:
[Formula image (1.8) not reproduced: definitions of s_{j,1} and s_{j,2}]
Action: the action a_j at the j-th Q learning is selected according to the following rule:
[Formula image (1.9) not reproduced: action-selection rule]
Reward: the reward obtained at the j-th Q learning is calculated as:
[Formula image (1.10) not reproduced: reward calculation]
where λ ∈ (0, 1) is the reward parameter; the normalized values of s_{j+1,2} and r_{j,2} enter the reward, with s_{j+1,2} as defined in (1.8) and r_{j,2} representing the variance of the total-disturbance estimate d̂.
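To make the update criterion (1.6) and the learning-rate condition (1.5) concrete, the sketch below maintains a tabular Q function over a discretized state space and the bounded action set Λ. The discretization, the visit-count learning rate α_n = 1/n, and the greedy selection are illustrative assumptions rather than the exact rules (1.8)-(1.9) of the patent, which are given only as formula images.

```python
from collections import defaultdict

class QTable:
    def __init__(self, actions, gamma=0.9):
        self.actions = list(actions)      # bounded action set Lambda
        self.gamma = gamma                # discount factor in (0, 1)
        self.q = defaultdict(float)       # Q(s, a), initialized to 0
        self.visits = defaultdict(int)    # n: times action a was taken in state s

    def best_action(self, s):
        # Greedy action with respect to the current Q values
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # Learning rate alpha_n = 1/n satisfies (1.5):
        # the sum of alpha_n diverges while the sum of alpha_n^2 converges.
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]
        target = r + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += alpha * (target - self.q[(s, a)])
```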
Step (III): adjusting the parameters of the extended state observer through Q learning
When the system runs to time t = jqh, j = 1, 2, ..., the state s_j is calculated according to (1.8) and the action a_j is selected according to (1.9); the observer bandwidth is then adjusted by the following rule:
[Formula image (1.12) not reproduced: bandwidth-adjustment rule]
where ω_max and ω_min are the bandwidth upper and lower limits, set in advance.
After the bandwidth is adjusted, the reward r_j and the next state s_{j+1} are calculated according to (1.10), and the Q value function is updated according to formula (1.6).
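Since the adjustment rule (1.12) is available only as a formula image, the following sketch assumes the natural form: the selected action is added to the current bandwidth and the result is saturated to the preset limits ω_min and ω_max. This is an assumption consistent with the bounded action set and the preset bandwidth limits, not a verbatim reproduction of (1.12).

```python
def adjust_bandwidth(omega_j, a_j, omega_min, omega_max):
    """Assumed form of rule (1.12): apply the action additively, then clip
    the bandwidth to the preset limits [omega_min, omega_max]."""
    omega_next = omega_j + a_j
    return min(max(omega_next, omega_min), omega_max)
```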
In order to ensure the stability of the extended state observer, ω_max, ω_min, a_max and a_min must satisfy the following conditions:
[Formula image (1.13) not reproduced: stability conditions on ω_max, ω_min, a_max and a_min]
where ε, M and m are intermediate variables, calculated as follows:
[Formula image not reproduced: definitions of the intermediate variables ε, M and m]
compared with the prior art, the invention has the advantages that:
1. The system may be the hybrid system common in practice, in which the plant is continuous and the output is sampled, and a corresponding discrete ESO structure and parameter design method is provided directly;
2. No noise or disturbance model is needed to optimize the parameters of the extended state observer; the parameters are adjusted in real time in a data-driven manner, so that, compared with the traditional constant-gain ESO, internal uncertain dynamics and external disturbances are tracked more accurately;
3. The adjustment ranges of the four parameters of the Q learning part are given quantitatively and explicitly. The parameter-adjustment theory guarantees the stability of the extended state observer and reduces the cost of tuning parameters in actual engineering.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is the disturbance-estimation error curve of extended state observers with different bandwidths (first class of total disturbance).
FIG. 3 is the disturbance-estimation error curve of extended state observers with different bandwidths (second class of total disturbance).
FIG. 4 is the disturbance-estimation error curve of extended state observers with different bandwidths (third class of total disturbance).
Description of the symbols
t: running time of the motion control system, t ∈ [0, ∞);
h: sampling period of the motion control system, h ∈ R;
x1(t): position of the moving object at time t, x1(t) ∈ R;
x2(t): velocity of the moving object at time t, x2(t) ∈ R;
u(t): control input of the motion control system at time t, u(t) ∈ R;
y_i(kh): output of the motion control system at time t = kh, y_i(kh) ∈ R, i = 1, 2, k = 1, 2, ...;
v_i(kh): measurement noise of the motion control system at time t = kh, v_i(kh) ∈ R, i = 1, 2, k = 1, 2, ...;
d(x(t), t): sum of the uncertain dynamics inside the motion control system and the external disturbance at time t;
c(t): derivative of d(x(t), t);
x̂2: estimate of x2;
d̂: estimate of d;
ĉ: estimate of c;
ω: bandwidth of the extended state observer, the parameter to be adjusted;
ω_max, ω_min: upper and lower limits of the bandwidth;
s_j: state at the j-th Q learning;
a_j: action selected at the j-th Q learning;
r_j: reward obtained after the action is selected at the j-th Q learning;
a_max, a_min: upper and lower limits of the action a_j.
Detailed Description
To test the applicability of the Q-learning extended state observer for motion control systems, we performed simulation experiments. Consider the following motion control system:
[Formula image not reproduced: the motion control system used in the simulation, an instance of model (1.1)]
and three classes of "total disturbance":
[Formula image not reproduced: the three classes of total disturbance d(x(t), t) used in the simulation]
The first class is a constant disturbance, independent of both time and the system state; the second class is a piecewise-linear disturbance that depends on time only; the third class is a composite of nonlinear dynamics and a periodic external disturbance.
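For reference, the three classes of disturbance can be emulated as in the sketch below. Only the constant value 0.15 of the first class is stated later in the text; the breakpoints of the piecewise-linear class and the specific nonlinearity and frequency of the composite class are illustrative stand-ins, since the disturbance formulas are available only as an image.

```python
import numpy as np

def disturbance_constant(x, t):
    # Class 1: constant, independent of time and state (0.15 per the text below)
    return 0.15

def disturbance_piecewise_linear(x, t):
    # Class 2: piecewise linear in time only (breakpoints are illustrative)
    return 0.5 * t if t < 5.0 else 2.5 - 0.2 * (t - 5.0)

def disturbance_composite(x, t):
    # Class 3: nonlinear state dynamics plus a periodic external disturbance
    # (the particular nonlinearity and frequency are illustrative)
    return -0.1 * np.sin(x[0]) * x[1] + 0.2 * np.sin(2.0 * np.pi * t)
```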
Designing an extended state observer according to the step (I):
[Formula image not reproduced: extended state observer of the form (1.3) used in the simulation]
where
β1 = 3ω,  β2 = 3ω²,  β3 = ω³,  ω > 0.
[Formula image not reproduced]
According to formula (1.13) in step (III), the parameters of the Q learning extended state observer are designed as
ω_min = 0.5,  a_min = -0.5,  h = 0.001,  q = 100,   (1.19)
with the upper limits ω_max and a_max given in formula images that are not reproduced here.
The action set in the simulation is selected as {-0.5, 0, 0.5}, and the other parameters are set as
h = 0.001,  q = 100,  λ = 0.4,  γ = 0.9,   (1.20)
with one further parameter of (1.20) given in a formula image that is not reproduced here.
Every 100 samples, the current state s_j is calculated according to the formulas in step (II), the action a_j is selected, and the observer bandwidth is then adjusted according to (1.12); at all other sampling instants the observer bandwidth remains unchanged. When the bandwidth is adjusted, the reward of the last action is calculated from (1.10) and the Q value is updated according to (1.6) in step (III); when calculating the reward function, normalization is performed by dividing the current value by its order of magnitude.
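Putting the pieces together, a minimal simulation loop under the reported settings (h = 0.001, q = 100, γ = 0.9, initial bandwidth 5, action set {-0.5, 0, 0.5}) could look as follows; it reuses the eso_step, QTable and adjust_bandwidth sketches given earlier. The plant integration, the noise level, the upper bandwidth limit, the state and reward construction, and the ε-greedy exploration are illustrative assumptions, since formulas (1.8)-(1.10) and the exact simulated system are available only as images.

```python
import numpy as np

def simulate(disturbance, T=20.0, h=1e-3, q=100, b=1.0,
             omega0=5.0, omega_min=0.5, omega_max=10.0, noise_std=0.01, seed=0):
    rng = np.random.default_rng(seed)
    actions = (-0.5, 0.0, 0.5)
    table = QTable(actions, gamma=0.9)

    x = np.zeros(2)              # true position and velocity
    z = np.zeros(3)              # ESO estimates [x2_hat, d_hat, c_hat]
    omega, s, a = omega0, None, None
    d_hist, errors = [], []

    for k in range(int(T / h)):
        t = k * h
        u = 0.0                  # open-loop input for the estimation test
        d = disturbance(x, t)
        # plant: forward-Euler integration of the continuous model (1.1)
        x = x + h * np.array([x[1], b * u + d])
        y2 = x[1] + noise_std * rng.standard_normal()

        z = eso_step(z, y2, u, b, omega, h)
        d_hist.append(z[1])
        errors.append(z[1] - d)  # disturbance-estimation error

        if (k + 1) % q == 0:     # one Q learning step every q samples
            window = np.array(errors[-q:])
            s_next = (round(float(np.mean(window)), 2),
                      round(float(np.var(d_hist[-q:])), 2))   # assumed state features
            r = -abs(s_next[0]) - 0.4 * s_next[1]             # assumed reward shape
            if s is not None:
                table.update(s, a, r, s_next)                 # reward of the last action
            s = s_next
            a = table.best_action(s) if rng.random() > 0.1 else actions[rng.integers(3)]
            omega = adjust_bandwidth(omega, a, omega_min, omega_max)
    return np.array(errors)
```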
FIGS. 2 to 4 show the simulation results of the Q learning extended state observer under the three classes of "total disturbance", where the initial bandwidth of the Q learning extended state observer is set to 5 and the results are compared with extended state observers of constant bandwidth.
In the first case, d(x(t), t) is a constant disturbance of 0.15. When the bandwidth is 10, the observer is strongly affected by noise, so the estimation error keeps oscillating within [-0.5, 0.5]; when the bandwidth is 2, the observer reaches steady state after about 10 s, with a steady-state estimation error within [-0.01, 0.01]; when the bandwidth is adjusted through Q learning, the observer reaches steady state within 1 s, with a steady-state estimation error within [-0.04, 0.04]. The Q learning ESO adjusts poorly in the initial stage because of insufficient data, but finally obtains good performance through the Q learning adjustment in the later stage.
In the second case, d(x(t), t) is a piecewise-linear disturbance. When the bandwidth is 10, the observer tracks the disturbance quickly where it changes fast, but where the disturbance changes slowly the influence of noise is obvious and the estimation error stays within [-0.5, 0.5]; when the bandwidth is 2, the observer estimates well where the disturbance changes slowly, but its tracking capability is insufficient where the disturbance changes fast, giving a large tracking error, with an overall tracking error within [-0.5, 0.5]. When the bandwidth is adjusted through Q learning, the observer achieves a better balance between fast disturbance tracking and noise suppression; the poor adjustment caused by insufficient data in the initial stage is eliminated, the estimation error stays within [-0.2, 0.2] after steady state is reached at 10 s, and the estimation effect is the best.
In the third case, the uncertain dynamics in the system are a composite of the nonlinear system dynamics and a periodic external disturbance. When the bandwidth is 10, the influence of noise is obvious and the estimation error stays within [-0.5, 0.5]; when the bandwidth is 2, the tracking error is large because the disturbance changes quickly, with an overall tracking error within [-0.5, 0.5]; when the bandwidth is adjusted through Q learning, the poor adjustment caused by insufficient data in the initial stage is eliminated, the estimation error stays within [-0.2, 0.2] after steady state is reached at 1.5 s, and the estimation effect is the best.
Reference to the literature
[1] Chi-Tsong Chen. Linear System Theory and Design [M]. Holt, Rinehart and Winston, 1984.
[2] Han Jingqing. A class of extended state observer for uncertain objects [J]. Control and Decision, 1995(1): 85-88.
[3] Gao Z. Scaling and bandwidth-parameterization based controller tuning [C]// IEEE, 2003.
[4] Xue W, Yi H. Performance analysis of active disturbance rejection tracking control for a class of uncertain LTI systems [J]. ISA Transactions, 2015, 58: 133-154.
[5] Guo B Z, Zhao Z L. On the convergence of an extended state observer for nonlinear systems with uncertainty [J]. Systems & Control Letters, 2011, 60(6): 420-430.
[6] Bai W, Xue W, Huang Y, et al. On extended state based Kalman filter design for a class of nonlinear time-varying uncertain systems [J]. Science China Information Sciences, 2018, 61(4): 1-16.

Claims (2)

1. A Q learning extended state observer design method of a motion control system is based on the following motion control system mathematical model:
dx1(t)/dt = x2(t)
dx2(t)/dt = b·u(t) + d(x(t), t)
dd(x(t), t)/dt = c(t)
y_i(kh) = x_i(kh) + v_i(kh),  i = 1, 2,  k = 1, 2, ...   (1.1)
where t denotes time; x1(t) ∈ R denotes the position of the moving object at time t; x2(t) ∈ R denotes the velocity of the moving object at time t; b denotes the input gain; u(t) ∈ R denotes the input of the system at time t; d(x(t), t) ∈ R denotes the total disturbance, composed of the uncertain dynamics inside the system and the external disturbance, at time t; and c(t) ∈ R denotes the derivative of the total disturbance at time t; x1(kh) and x2(kh) denote the position and velocity of the moving object at time t = kh, where h is the sampling period and k indexes the k-th sample; y_i(kh) denotes the measured value of x_i(kh), and v_i(kh) is the measurement noise of the corresponding channel, i = 1, 2;
the method is characterized by comprising the following three steps:
step one: the extended state observer is designed according to the discrete form of equation (1.1):
since the sampling and control inputs are discrete in a real system, a discrete approximation of equation (1.1) needs to be considered:
x1((k+1)h) = x1(kh) + h·x2(kh)
x2((k+1)h) = x2(kh) + h·(b·u(kh) + d(x(kh), kh))
d(x((k+1)h), (k+1)h) = d(x(kh), kh) + h·c(kh)
y_i((k+1)h) = x_i((k+1)h) + v_i((k+1)h),  i = 1, 2   (1.2)
x1((k+1)h) and x2((k+1)h) denote the position and velocity of the moving object at time t = (k+1)h, where h is the sampling period and k+1 indexes the (k+1)-th sample; y_i((k+1)h) represents the measured value of x_i((k+1)h), and v_i((k+1)h) is the measurement noise of the corresponding channel; b represents the input gain, u(kh) ∈ R represents the input of the system at time t = kh, d(x(kh), kh) ∈ R represents the total disturbance composed of the uncertain dynamics inside the system and the external disturbance at time t = kh, and c(kh) ∈ R represents the derivative of the total disturbance at time t = kh;
according to equation (1.2), the linear extended state observer is designed as follows:
x̂2((k+1)h) = x̂2(kh) + h·(b·u(kh) + d̂(kh) + β1(kh)·(y2(kh) - x̂2(kh)))
d̂((k+1)h) = d̂(kh) + h·(ĉ(kh) + β2(kh)·(y2(kh) - x̂2(kh)))
ĉ((k+1)h) = ĉ(kh) + h·β3(kh)·(y2(kh) - x̂2(kh))   (1.3)
where β1(kh), β2(kh), β3(kh) are the observer gains, designed as
β1(kh) = 3ω(kh),  β2(kh) = 3ω²(kh),  β3(kh) = ω³(kh),  ω(kh) > 0.   (1.4)
x̂2(kh), d̂(kh) and ĉ(kh) respectively represent the estimated values of x2(kh), d(kh) and c(kh), and ω(kh) is referred to as the observer bandwidth at time t = kh, which is the parameter to be adjusted;
step two: designing a Q learning algorithm:
the Q learning algorithm comprises four components: state, action, reward and state-action value function;
based on the current state of the system, the corresponding action is selected according to the value of the state-action value function, the corresponding reward is obtained, and the value of the state-action value function is updated according to the reward;
a state space S and an action space Λ are designed according to the actual situation, where Λ is a bounded set of real numbers whose maximum and minimum values are denoted a_max and a_min, respectively; for all states s ∈ S and actions a ∈ Λ, the corresponding state-action value function Q(s, a) is initialized to 0, and the discount factor γ ∈ (0, 1) and a learning-rate sequence {α_n} satisfying the following conditions are selected:
α_n ∈ (0, 1],  Σ_n α_n = ∞,  Σ_n α_n² < ∞.   (1.5)
For the state-action value function Q, the following update criterion is employed:
Q_{n+1}(s_j, a_j) = Q_n(s_j, a_j) + α_n·[ r_j + γ·max_{a′ ∈ Λ} Q_n(s′, a′) - Q_n(s_j, a_j) ],   (1.6)
where the subscript n denotes that action a_j is selected in state s_j for the n-th time; s′ represents the next state reached after selecting action a; the subscript j indicates that Q learning is performed for the j-th time; while the system is running, Q learning is performed once every q samples, and the bandwidth from t = jqh to t = (j+1)qh is a constant, denoted
ω_j = ω(t),  t ∈ [jqh, (j+1)qh),  j = 1, 2, ...   (1.7)
at the j-th Q learning, the state of the system is calculated by the following formulas, an action is selected, and the reward is calculated:
state: the state at the j-th Q learning is defined as s_j = [s_{j,1}, s_{j,2}], where s_{j,1} and s_{j,2} are defined as:
[Formula image (1.8) not reproduced: definitions of s_{j,1} and s_{j,2}]
action: the action a_j at the j-th Q learning is selected according to the following rule:
[Formula image (1.9) not reproduced: action-selection rule]
reward: the reward obtained at the j-th Q learning is calculated as:
[Formula image (1.10) not reproduced: reward calculation]
where λ ∈ (0, 1) is the reward parameter; the normalized values of s_{j+1,2} and r_{j,2} enter the reward, with s_{j+1,2} as defined in formula (1.8) and r_{j,2} representing the variance of the total-disturbance estimate d̂;
step three: adjusting the parameters of the extended state observer through Q learning
when the system runs to time t = jqh, j = 1, 2, ..., the state s_j is calculated according to formula (1.8) and the action a_j is selected according to formula (1.9); the observer bandwidth is then adjusted by the following rule:
[Formula image (1.12) not reproduced: bandwidth-adjustment rule]
where ω_max and ω_min are the bandwidth upper and lower limits, set in advance;
after the bandwidth is adjusted, the reward r_j and the next state s_{j+1} are calculated according to formula (1.10), and the Q value function is updated according to formula (1.6).
2. The method of claim 1, wherein, in order to ensure the stability of the extended state observer, ω_max, ω_min, a_max and a_min satisfy the following conditions:
[Formula image (1.13) not reproduced: stability conditions on ω_max, ω_min, a_max and a_min]
where ε, M and m are intermediate variables, calculated as follows:
[Formula image not reproduced: definitions of the intermediate variables ε, M and m]
CN202110637860.6A 2021-06-08 2021-06-08 Q learning extended state observer design method of motion control system Active CN113325804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110637860.6A CN113325804B (en) 2021-06-08 2021-06-08 Q learning extended state observer design method of motion control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110637860.6A CN113325804B (en) 2021-06-08 2021-06-08 Q learning extended state observer design method of motion control system

Publications (2)

Publication Number Publication Date
CN113325804A (en) 2021-08-31
CN113325804B (en) 2022-03-29

Family

ID=77420322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110637860.6A Active CN113325804B (en) 2021-06-08 2021-06-08 Q learning extended state observer design method of motion control system

Country Status (1)

Country Link
CN (1) CN113325804B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109932905A (en) * 2019-03-08 2019-06-25 辽宁石油化工大学 A kind of optimal control method of the Observer State Feedback based on non-strategy
CN111278704A (en) * 2018-03-20 2020-06-12 御眼视觉技术有限公司 System and method for navigating a vehicle
CN112000009A (en) * 2020-07-27 2020-11-27 南京理工大学 Material transfer device reinforcement learning control method based on state and disturbance estimation
CN112508172A (en) * 2020-11-23 2021-03-16 北京邮电大学 Space flight measurement and control adaptive modulation method based on Q learning and SRNN model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6538766B2 (en) * 2017-07-18 2019-07-03 ファナック株式会社 Machine learning apparatus, servo motor control apparatus, servo motor control system, and machine learning method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111278704A (en) * 2018-03-20 2020-06-12 御眼视觉技术有限公司 System and method for navigating a vehicle
CN109932905A (en) * 2019-03-08 2019-06-25 辽宁石油化工大学 A kind of optimal control method of the Observer State Feedback based on non-strategy
CN112000009A (en) * 2020-07-27 2020-11-27 南京理工大学 Material transfer device reinforcement learning control method based on state and disturbance estimation
CN112508172A (en) * 2020-11-23 2021-03-16 北京邮电大学 Space flight measurement and control adaptive modulation method based on Q learning and SRNN model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Q-learning-based optimal state estimation and control for discrete-time systems with unknown parameters; Li Jinna et al.; Control and Decision; 2020-12-31; Vol. 35, No. 12: 2889-2897 *

Also Published As

Publication number Publication date
CN113325804A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN108900346B (en) Wireless network flow prediction method based on LSTM network
CN111413872B (en) Air cavity pressure rapid active disturbance rejection method based on extended state observer
CN111812975A (en) Generalized predictive control method for pumped storage unit speed regulation system based on fuzzy model identification
CN114742278A (en) Building energy consumption prediction method and system based on improved LSTM
CN113325804B (en) Q learning extended state observer design method of motion control system
CN115759322A (en) Urban rail transit passenger flow prediction and influence analysis method
CN111240201B (en) Disturbance suppression control method
CN109946979A (en) A kind of self-adapting regulation method of servo-system sensitivity function
CN114355848A (en) Tension detection and intelligent control system
Mao et al. Auxiliary model-based iterative estimation algorithms for nonlinear systems using the covariance matrix adaptation strategy
CN105677936B (en) The adaptive recurrence multistep forecasting method of electromechanical combined transmission system demand torque
CN116300440A (en) DC-DC converter control method based on TD3 reinforcement learning algorithm
Xu et al. Data-driven plant-model mismatch quantification for MIMO MPC systems with feedforward control path
CN113988415A (en) Medium-and-long-term power load prediction method
CN112733372B (en) Fuzzy logic strong tracking method for load modeling
Song et al. Real-time adjustment way of reservoir schedule forecasting projects based on improved variable oblivion factor least square arithmetic coupling Kalman filters
Yao et al. Approaches to model and control nonlinear systems by RBF neural networks
CN114567288B (en) Distribution collaborative nonlinear system state estimation method based on variable decibels
CN116316590A (en) Confidence domain policy optimized power system stabilizer active disturbance rejection control method
CN114666230B (en) Correction factor-based network flow gray prediction method
Sun et al. Model Free Adaptive Control Algorithm based on GRU network
Zheng et al. Green Simulation Based Policy Optimization with Partial Historical Trajectory Reuse
CN114822025B (en) Traffic flow combined prediction method
CN112859588B (en) Control device and method for rapidly reducing lead bismuth fast reactor waste heat discharge temperature
Carloni A norm-optimal Kalman iterative learning control for precise UAV trajectory tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant