CN108181816A - An optimal control method with synchronous policy update based on online data - Google Patents

Info

Publication number
CN108181816A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810010374.XA
Other languages
Chinese (zh)
Inventor
魏阿龙
刘春生
孙景亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
2018-01-05
Filing date
2018-01-05
Publication date
2018-06-19
Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to an optimal control method with synchronous policy update based on online data, and belongs to the field of intelligent control and optimal control. The method comprises the following steps: 1. initialize the system state, choose the activation functions of three neural networks (NNs), and assign arbitrary initial values to their weights; set the data acquisition length and the iteration stopping criterion; 2. select an arbitrary control input and disturbance noise and apply them to the system; 3. sample the current system state and the control and noise inputs at a fixed rate, and compute the intermediate variables required by the algorithm; 4. judge whether the data are valid; if so, proceed to the next step; otherwise jump to step 1; 5. update the weights of the three NNs; 6. judge whether the iteration stopping criterion is met; if so, output the result; otherwise jump to step 5. The proposed method removes the dependence of traditional control methods on a model of the controlled plant, eases the heavy burden of solving for the controller, and increases the robustness of the system.

Description

An optimal control method with synchronous policy update based on online data
Technical field
The invention relates to an optimal control method with synchronous policy update based on online data, and belongs to the field of intelligent control and optimal control.
Background
Dynamic programming (DP) is a systematic approach to dynamic optimization and optimal control problems. It relies on the principle of optimality and finds the optimal control policy through a cost function. This cost function must satisfy the Hamilton-Jacobi-Bellman (HJB) equation (for optimal control) or the Hamilton-Jacobi-Isaacs (HJI) equation (for differential games). In many control applications the system is subject to disturbances that degrade control performance. $H_\infty$ control provides a powerful tool for reducing the influence of disturbances on the system. According to differential game theory, finding an $H_\infty$ controller is equivalent to solving a two-player zero-sum game (ZSG), in which the controller tries to minimize the performance index under the largest possible disturbance. However, because the HJB and HJI equations are inherently nonlinear, obtaining their analytical solutions is nearly impossible.
Recently, a newly proposed algorithm called adaptive/approximate dynamic programming (ADP) has been used to solve various optimal control problems, including ZSG problems. Its basic idea is to estimate the cost function with a function approximation structure and to solve the DP problem forward in time, thereby avoiding the "curse of dimensionality" and providing a convenient, effective solution for nonlinear optimal control and differential games. In practical engineering applications, an accurate dynamic model of the controlled plant is usually unknown. Some researchers use a neural network (NN) to identify the unknown dynamics and then apply ADP to the identification network to find the optimal solution. However, identification errors in the network harm the optimality of the final controller. Training the identification network also increases the computational cost and the learning time. An optimal control method that does not depend on the system model at all is therefore highly desirable.
It should also be noted that online (on-policy) iteration is a popular approach to control design as well. The advantages of offline (off-policy) updating over online iteration are discussed below.
When computing the value function of a target control policy μ, an online method needs data generated by applying the policy μ to the system. In practice, however, an online learning algorithm generates data with an approximation of the target control policy (rather than the actual target policy μ) to learn its value function and iteratively compute the optimal control, which can seriously distort the learning direction of the policy. The estimated value function may be highly inaccurate for states that are not adequately represented (that is, states not reached under the optimal control). In other words, online policy learning learns its value function from "inaccurate" data, which increases the accumulated error. This is known as "insufficient exploration" and is an especially acute problem in online iterative methods.
In industry, scientific and technical innovation and progress show two prominent features. First, production systems are growing ever larger in scale and more complex in operation, so for more and more real systems it is difficult to establish an industrial process model accurate enough to meet control design requirements. Second, large amounts of data are stored during industrial production but are not used effectively. Studying data-based offline iterative optimal control of nonlinear systems is therefore both important and challenging.
Summary of the invention
To address the difficulty of establishing an accurate industrial process model that meets control design requirements, and the fact that the large amounts of data generated in industrial processes are not used effectively, the invention proposes an optimal control method with synchronous policy update based on online data.
The invention adopts the following technical solution to solve its technical problem:
An optimal control method with synchronous policy update based on online data, comprising the following steps:
Step 1: Initialize the system state, choose the activation functions of three NNs, and assign arbitrary initial values to their weights; set the data acquisition length and the iteration stopping criterion;
Step 2: Select an arbitrary control input and disturbance noise and apply them to the system;
Step 3: Sample the current system state and the control and noise inputs at a fixed rate, and compute the intermediate variables required by the algorithm;
Step 4: Judge whether the data are valid; if so, proceed to the next step; otherwise jump to step 1;
Step 5: Update the weights of the three NNs;
Step 6: Judge whether the iteration stopping criterion is met; if so, output the result; otherwise jump to step 5.
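The control flow of the six steps can be sketched as a small driver. This is only an illustrative skeleton, not the patented implementation: the callbacks `collect`, `is_valid`, and `update`, the retry limit, and the use of a Euclidean norm in the stopping test are all assumptions introduced here.

```python
from typing import Callable, Tuple
import numpy as np

Data = Tuple[np.ndarray, np.ndarray]

def run_method(
    collect: Callable[[int], Data],                    # steps 2-3: excite, sample, build data
    is_valid: Callable[[Data], bool],                  # step 4: data validity test
    update: Callable[[Data, np.ndarray], np.ndarray],  # step 5: one synchronous weight update
    w0: np.ndarray,                                    # step 1: arbitrary initial weights
    eps: float = 1e-8,                                 # step 1: stopping threshold
    max_attempts: int = 10,                            # hypothetical safety limit
    max_iters: int = 1000,                             # hypothetical safety limit
) -> np.ndarray:
    # Online stage: repeat data collection until a valid data set is obtained.
    for attempt in range(max_attempts):
        data = collect(attempt)
        if is_valid(data):
            break
    else:
        raise RuntimeError("no valid data set was collected")

    # Offline stage: iterate on the stored data only.
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        w_next = update(data, w)
        if np.linalg.norm(w_next - w) <= eps:          # step 6: stopping criterion
            return w_next
        w = w_next
    raise RuntimeError("iteration did not converge")
```

Passing the three stages in as callbacks keeps the sketch independent of how the data matrices and the update formula are actually built.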
The beneficial effects of the invention are as follows:
1. The proposed method removes the dependence of traditional control methods on a model of the controlled plant, eases the heavy burden of solving for the controller (mainly the solution of partial differential equations), and increases the robustness of the system.
2. The proposed method is easy to implement. It runs in two stages: online data acquisition and offline synchronous update. Any admissible policy can be applied to the system, which makes the system safer.
3. The proposed offline policy update algorithm can be carried out with data generated by behavior policies other than the target optimal policy, not necessarily by the target policy itself, which increases the "exploration" capability of the learning process. In addition, because no error is introduced while the data are generated, the accumulated error is reduced.
4. The solution finally obtained by the invention is the globally optimal solution.
Brief description of the drawings
Fig. 1 is a flow chart of the optimal control method with synchronous policy update based on online data according to the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawing.
The invention discloses an optimal control method with synchronous policy update based on online data that does not depend on model information of the controlled plant and increases the robustness of the system. The invention also takes the suppression of disturbance noise acting on the system into account.
The system model studied by the invention can be expressed as $\dot{x} = f(x) + g(x)u + k(x)v$, where $x \in \mathbb{R}^{n}$ is the n-dimensional state variable, $u \in \mathbb{R}^{m}$ is the m-dimensional control input, $v \in \mathbb{R}^{q}$ is the q-dimensional external noise disturbance, $f(x) \in \mathbb{R}^{n}$ is the system dynamics, $g(x) \in \mathbb{R}^{n \times m}$ is the control input matrix, $k(x) \in \mathbb{R}^{n \times q}$ is the noise input matrix, and the initial state of the system is $x = x_0$.
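To make the roles of the listed quantities concrete, the sketch below integrates an input-affine system of the form x' = f(x) + g(x)u + k(x)v with a forward-Euler step, which is the form suggested by the dimensions given above. The toy plant, the feedback law, and the disturbance signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate(f, g, k, u, v, x0, dt, steps):
    """Forward-Euler integration of x' = f(x) + g(x) u + k(x) v."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for i in range(steps):
        t = i * dt
        x = x + dt * (f(x) + g(x) @ u(x, t) + k(x) @ v(x, t))
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical toy plant: n = 2 states, m = 1 control, q = 1 disturbance.
f = lambda x: np.array([x[1], -x[0] - x[1]])   # drift dynamics, shape (n,)
g = lambda x: np.array([[0.0], [1.0]])         # control input matrix, shape (n, m)
k = lambda x: np.array([[0.0], [0.5]])         # noise input matrix, shape (n, q)
u = lambda x, t: np.array([-x[1]])             # some admissible feedback (assumed)
v = lambda x, t: np.array([0.1 * np.sin(t)])   # some bounded disturbance (assumed)

traj = simulate(f, g, k, u, v, x0=[1.0, 0.0], dt=0.01, steps=500)
```

The method itself never evaluates f, g, or k; a simulator like this only stands in for the real plant when generating data.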
The purpose of $H_\infty$ control is to find a controller that minimizes the performance index function $J(x_0, u, v) = \int_{0}^{\infty} r(x, u, v)\,dt$, where $r(x, u, v) = Q(x) + u^{T} R u - \gamma^{2} v^{T} v$, $Q(x) \ge 0$ is positive semidefinite, $R > 0$ is a positive definite matrix, and $\gamma \ge \gamma^{*} \ge 0$. The scalar $\gamma$ is the noise attenuation gain, and $\gamma^{*}$ denotes the minimum value of $\gamma$ for which an optimal solution exists.
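As a small worked example of the utility r(x, u, v) = Q(x) + uᵀRu − γ²vᵀv, the sketch below evaluates it at one point and approximates its accumulated value over sampled data with a rectangle rule. The quadratic choice Q(x) = xᵀx, the numerical values, and the function names are assumptions made for illustration.

```python
import numpy as np

def utility(x, u, v, Q, R, gamma):
    """r(x, u, v) = Q(x) + u^T R u - gamma^2 v^T v."""
    return Q(x) + u @ R @ u - gamma**2 * (v @ v)

def accumulated_cost(xs, us, vs, Q, R, gamma, dt):
    """Rectangle-rule approximation of the integral of r over sampled data."""
    return dt * sum(utility(x, u, v, Q, R, gamma) for x, u, v in zip(xs, us, vs))

Q = lambda x: float(x @ x)          # assumed state penalty, Q(x) >= 0
R = np.array([[2.0]])               # positive definite control weight
x, u, v = np.array([1.0, 2.0]), np.array([1.0]), np.array([1.0])

r = utility(x, u, v, Q, R, gamma=2.0)   # 5 + 2 - 4 = 3
J = accumulated_cost([x, x], [u, u], [v, v], Q, R, gamma=2.0, dt=0.1)
```

The disturbance term enters with a negative sign, so a larger γ makes it cheaper for the controller to tolerate the disturbance.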
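The invention aims to solve for the optimal state feedback controller. Given a control $u = u(x(t))$ and a noise disturbance $v = v(x(t))$, the corresponding value function is $V(x(t)) = \int_{t}^{\infty} r(x(\tau), u(\tau), v(\tau))\,d\tau$.
Based on differential game theory, finding an $H_\infty$ controller is equivalent to solving a two-player ZSG, in which the control signal tries to minimize the performance index while the noise disturbance tries to maximize it. The optimal value function corresponding to the optimal state feedback control $u^{*}$ and disturbance $v^{*}$ is $V^{*}(x(t)) = \min_{u} \max_{v} \int_{t}^{\infty} r(x, u, v)\,d\tau$, and the saddle-point solution $(u^{*}, v^{*})$ satisfies the Nash condition $\min_{u} \max_{v} J(x_0, u, v) = \max_{v} \min_{u} J(x_0, u, v)$, where $J$ denotes the performance index.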
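For orientation, the standard HJI equation that such a saddle point satisfies can be derived in a few lines. The derivation below assumes the input-affine model $\dot{x} = f(x) + g(x)u + k(x)v$, the utility $r = Q(x) + u^{T}Ru - \gamma^{2}v^{T}v$ with symmetric $R$, and a differentiable $V^{*}$; it is a textbook sketch, not a formula taken from the patent text.

```latex
% Hamiltonian of the zero-sum game
H(x, \nabla V, u, v) = Q(x) + u^{T} R u - \gamma^{2} v^{T} v
                     + \nabla V^{T} \bigl( f(x) + g(x) u + k(x) v \bigr)

% Stationarity with respect to the minimizing player u and the maximizing player v
\frac{\partial H}{\partial u} = 2 R u + g(x)^{T} \nabla V = 0
  \;\Rightarrow\; u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{T} \nabla V^{*}
\frac{\partial H}{\partial v} = -2 \gamma^{2} v + k(x)^{T} \nabla V = 0
  \;\Rightarrow\; v^{*} = \tfrac{1}{2 \gamma^{2}} k(x)^{T} \nabla V^{*}

% Substituting u^* and v^* into H = 0 gives the HJI equation
0 = Q(x) + \nabla V^{*T} f(x)
    - \tfrac{1}{4} \nabla V^{*T} g(x) R^{-1} g(x)^{T} \nabla V^{*}
    + \tfrac{1}{4 \gamma^{2}} \nabla V^{*T} k(x) k(x)^{T} \nabla V^{*}
```

Both optimal policies depend on the gradient of $V^{*}$ and on $g$ and $k$, which is why a method that avoids the model has to estimate the policies directly from data.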
Fig. 1 shows the flow chart of the optimal control method with synchronous policy update based on online data according to the invention. The offline policy synchronous update algorithm first collects system operating data online and then carries out offline iterative learning. The specific steps are detailed as follows:
Step 1: Initialization. Select an arbitrary admissible control policy $u'$ and disturbance $v'$, together with their corresponding exploration noises $e_u$ and $e_v$, to be applied to the system. Set the data acquisition length $L$ and the sampling period $\Delta t$, so that the online data sampling time is $T = L \cdot \Delta t$. Specify the iteration stopping criterion $\varepsilon$ (the error threshold between two successive iterations). Let the value function be expressed as $\hat{V}(x) = W_1^{T} \varphi_1(x)$, the control law as $\hat{u}(x) = W_2^{T} \varphi_2(x)$, and the disturbance as $\hat{v}(x) = W_3^{T} \varphi_3(x)$, where the NN activation functions $\varphi_1$, $\varphi_2$, $\varphi_3$ can be chosen freely and are typically hyperbolic tangent functions tanh(·), polynomial functions, and the like; $N_1$, $N_2$, $N_3$ denote the numbers of the corresponding NN activation functions; and $W_1 \in \mathbb{R}^{N_1}$, $W_2 \in \mathbb{R}^{N_2 \times m}$, $W_3 \in \mathbb{R}^{N_3 \times q}$ are the weight matrices of the value function, control policy, and disturbance policy NNs, respectively. The NN initial weights $W_{1,0}$, $W_{2,0}$, $W_{3,0}$ are given arbitrarily; $m$ is the dimension of the control input and $q$ is the dimension of the noise disturbance input.
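A minimal sketch of the three approximators in step 1 follows, assuming each NN is linear in its weights (output = Wᵀ·features). The class name, the quadratic feature set, and the element-wise tanh features are illustrative choices, not the patent's; only tanh and polynomial activations are named in the text.

```python
import numpy as np

def poly_features(x):
    """Quadratic monomials x_i * x_j (i <= j) of the state."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def tanh_features(x):
    """Element-wise hyperbolic tangent of the state."""
    return np.tanh(x)

class LinearInWeightsNN:
    """Approximator of the assumed form W^T * features(x)."""
    def __init__(self, features, n_features, n_out, rng):
        self.features = features
        # Arbitrary initial weights, as step 1 allows.
        self.W = rng.standard_normal((n_features, n_out))

    def __call__(self, x):
        return self.W.T @ self.features(np.asarray(x, dtype=float))

rng = np.random.default_rng(0)
n, m, q = 2, 1, 1                                   # state, control, noise dimensions
critic = LinearInWeightsNN(poly_features, 3, 1, rng)     # value function, N1 = 3
actor = LinearInWeightsNN(tanh_features, n, m, rng)      # control policy, N2 = 2
disturber = LinearInWeightsNN(tanh_features, n, q, rng)  # disturbance policy, N3 = 2
```

With features that vanish at the origin, all three outputs are zero at x = 0 regardless of the weights, which is a convenient property for a regulation problem.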
Step 2: Apply the control $u = u' + e_u$ and the disturbance $v = v' + e_v$ selected in the previous step to the system.
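The text does not say how the exploration noises are generated. One common choice in the ADP literature is a sum of sinusoids; the sketch below uses it purely as an assumption, with hypothetical frequencies, amplitude, and function names.

```python
import numpy as np

def exploration_noise(t, freqs, amplitude):
    """Sum of sinusoids evaluated at time t (an assumed choice of e_u or e_v)."""
    return amplitude * np.sum(np.sin(np.asarray(freqs) * t))

def excited_inputs(x, t, u_prime, v_prime, freqs_u, freqs_v, amplitude=0.1):
    """Return u = u' + e_u and v = v' + e_v for the current state and time."""
    u = u_prime(x) + exploration_noise(t, freqs_u, amplitude)
    v = v_prime(x) + exploration_noise(t, freqs_v, amplitude)
    return u, v

# Hypothetical admissible policies for a 2-state, single-input example.
u_prime = lambda x: np.array([-x[1]])
v_prime = lambda x: np.array([0.0])

u, v = excited_inputs(np.array([1.0, 2.0]), 0.7, u_prime, v_prime,
                      freqs_u=[1.0, 3.0, 7.0], freqs_v=[2.0, 5.0])
```

Using different frequency sets for the two channels keeps the control and disturbance excitations from being identical, which helps the collected data stay informative.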
Step 3: Collect data and compute the related intermediate variables.
Collect $\{\delta_1, \ldots, \delta_6\}$ online in real time; they take the following form: [equations not reproduced in this text]
where $t_0$ is the time at which sampling starts, $t_1 = t_0 + \Delta t$, $t_{L-1} = t_0 + (L-1)\Delta t$, and $t_L = t_0 + L\Delta t$.
When the data collection time reaches $T$, stop sampling and compute the following two variables: [equations not reproduced in this text]
where $\otimes$ denotes the Kronecker product operator, vec(·) is the matrix column-stacking operator, $I_{K2}$ is the K2-dimensional identity matrix, $I_{K3}$ is the K3-dimensional identity matrix, $W_{2,i}$ is the weight matrix of the control policy NN at the i-th iteration, $W_{3,i}$ is the weight matrix of the disturbance policy NN at the i-th iteration, and the three weight matrices are combined into a single new weight vector.
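The Kronecker product and the column-stacking operator matter here because of the identity vec(AXB) = (Bᵀ ⊗ A) vec(X), which turns an expression that is linear in an unknown matrix into one that is linear in a single stacked vector. The check below uses random matrices; the variable names are illustrative and the special case at the end assumes a policy of the form Wᵀφ.

```python
import numpy as np

def vec(M):
    """Stack the columns of a matrix into one vector."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

# General identity: vec(A X B) = (B^T kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)

# Special case: W^T phi = (I_m kron phi^T) vec(W) for W of shape (N, m).
W = rng.standard_normal((4, 2))
phi = rng.standard_normal(4)
direct = W.T @ phi
stacked = np.kron(np.eye(2), phi[None, :]) @ vec(W)
```

The special case shows how an identity matrix naturally appears next to the Kronecker product once a weight matrix is stacked into a vector.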
Step 4: Judge whether the pseudoinverse of the matrix obtained in step 3 exists, that is, whether the associated matrix is invertible. If it exists, proceed to the next step. Otherwise, reset the exploration noises $e_u$, $e_v$ and jump to step 2.
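A numerical version of this test can be sketched as follows, assuming the matrix in question is a tall data matrix Θ and the condition is that ΘᵀΘ is invertible, i.e. Θ has full column rank. The name `theta` and the conditioning threshold are assumptions introduced here.

```python
import numpy as np

def data_is_valid(theta, cond_max=1e10):
    """True if theta has full column rank and theta^T theta is well conditioned."""
    theta = np.atleast_2d(theta)
    full_rank = np.linalg.matrix_rank(theta) == theta.shape[1]
    # The condition-number test guards against matrices that are invertible
    # in exact arithmetic but too ill conditioned to be useful numerically.
    return bool(full_rank and np.linalg.cond(theta.T @ theta) < cond_max)

rng = np.random.default_rng(2)
good = rng.standard_normal((50, 4))          # rich, independent columns
bad = np.hstack([good[:, :3], good[:, :1]])  # last column repeats the first
```

A failed test typically means the excitation was not rich enough, which is consistent with resetting the exploration noise before collecting data again.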
Step 5: Use the iterative formula [equation not reproduced in this text] to synchronously update the weights of the value function NN, the control policy NN, and the disturbance policy NN;
Step 6: Use the stopping criterion [formula not reproduced in this text] to judge whether the iteration has converged. If it has converged, output the result: the optimal controller (together with the maximum disturbance) is given by the control policy and disturbance policy NNs with the weights $W_{2,*}$ and $W_{3,*}$, which denote the final values reached at convergence. If it has not converged, jump to step 5 and continue updating.
It is worth noting that the value function NN weights $W_{1,i}$ play only an auxiliary role in the iteration: they are not used in the iterative update itself and appear only as part of the least-squares solution.
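Since the update is described as a least-squares solution, one weight update and the stopping test might look like the sketch below. It assumes the update solves Θw ≈ Ξ for the stacked weight vector w and that convergence is measured by the Euclidean norm of the change in w; Θ, Ξ, and both function names are placeholders, because the patent's own formulas are not reproduced in this text.

```python
import numpy as np

def least_squares_update(theta, xi):
    """Solve theta @ w ~= xi in the least-squares sense."""
    w, *_ = np.linalg.lstsq(theta, xi, rcond=None)
    return w

def has_converged(w_new, w_old, eps):
    """Compare the change between two successive iterates with the threshold eps."""
    return bool(np.linalg.norm(w_new - w_old) <= eps)

rng = np.random.default_rng(3)
theta = rng.standard_normal((30, 5))   # stand-in for the collected data matrix
w_true = rng.standard_normal(5)
xi = theta @ w_true                    # consistent right-hand side for the demo

w_hat = least_squares_update(theta, xi)
```

`np.linalg.lstsq` is used instead of forming the inverse of ΘᵀΘ explicitly, which gives the same solution for full-rank data with better numerical behavior.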
The method steps described above explain the purpose, technical solution, and beneficial effects of the invention in further detail. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (5)

1. An optimal control method with synchronous policy update based on online data, characterized by comprising the following steps:
Step 1: Initialize the system state, choose the activation functions of three NNs, and assign arbitrary initial values to their weights; set the data acquisition length and the iteration stopping criterion;
Step 2: Select an arbitrary control input and disturbance noise and apply them to the system;
Step 3: Sample the current system state and the control and noise inputs at a fixed rate, and compute the intermediate variables required by the algorithm;
Step 4: Judge whether the data are valid; if so, proceed to the next step; otherwise jump to step 1;
Step 5: Update the weights of the three NNs;
Step 6: Judge whether the iteration stopping criterion is met; if so, output the result; otherwise jump to step 5.
2. The optimal control method with synchronous policy update based on online data according to claim 1, characterized in that the three NN activation functions in step 1 are [expressions not reproduced in this text], where N1, N2, N3 denote the numbers of the corresponding NN activation functions, and R>0 is a positive definite matrix.
3. The optimal control method with synchronous policy update based on online data according to claim 2, characterized in that the activation function is the hyperbolic tangent function tanh(·).
4. The optimal control method with synchronous policy update based on online data according to claim 2, characterized in that the activation function is a polynomial function.
5. The optimal control method with synchronous policy update based on online data according to claim 1, characterized in that the data acquisition length in step 1 is L.
Application CN201810010374.XA, priority date 2018-01-05, filing date 2018-01-05: An optimal control method with synchronous policy update based on online data. Status: Pending. Publication: CN108181816A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109375514A (en) * | 2018-11-30 | 2019-02-22 | Shenyang Aerospace University (沈阳航空航天大学) | Design method for an optimal tracking controller in the presence of false data injection attacks
CN111273543A (en) * | 2020-02-15 | 2020-06-12 | Northwestern Polytechnical University (西北工业大学) | PID optimization control method based on policy iteration
CN112947078A (en) * | 2021-02-03 | 2021-06-11 | Zhejiang University of Technology (浙江工业大学) | Servo motor intelligent optimization control method based on value iteration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6532454B1 (en) * | 1998-09-24 | 2003-03-11 | Paul J. Werbos | Stable adaptive control using critic designs
CN103324085A (en) * | 2013-06-09 | 2013-09-25 | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) | Optimal control method based on supervised reinforcement learning
CN106354010A (en) * | 2016-09-29 | 2017-01-25 | Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所) | Adaptive optimal control method and adaptive optimal control system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
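Yuanheng Zhu et al.: "Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data", IEEE Transactions on Neural Networks and Learning Systems *
Liu Nian (刘念) et al.: "Application of adaptive dynamic programming algorithms to aircraft pursuit-evasion" (自适应动态规划算法在飞行器追逃中的应用), Flight Dynamics (《飞行力学》) *
Sun Jingliang (孙景亮) et al.: "A survey of missile guidance laws based on adaptive dynamic programming" (基于自适应动态规划的导弹制导律研究综述), Acta Automatica Sinica (《自动化学报》) *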

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180619