CN110083063B - Multi-agent optimal control method based on off-policy Q-learning - Google Patents
Multi-agent optimal control method based on off-policy Q-learning
- Publication number
- CN110083063B (application CN201910352788.5A)
- Authority
- CN
- China
- Prior art keywords
- learning
- strategy
- game
- algorithm
- zero
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a multi-agent optimal control method based on off-policy Q-learning, relates to an optimal control method, and proposes an off-policy Q-learning algorithm for the discrete-time linear non-zero-sum game problem. First, the non-zero-sum game optimization problem is formulated, and the value functions defined from the individual performance indices are rigorously proven to be linear quadratic. Then, based on dynamic programming and Q-learning, an off-policy Q-learning algorithm is given, which obtains the approximate optimal solution of the non-zero-sum game and achieves global Nash equilibrium of the system. Finally, the effectiveness of the method is verified by numerical simulation. The invention solves the multi-agent non-zero-sum game problem for linear discrete-time systems, with the algorithm's effectiveness verified by simulation; it integrates game theory with off-policy Q-learning and, within the non-zero-sum game framework, proposes an off-policy Q-learning algorithm that learns the optimal control policies and achieves global Nash equilibrium of the entire system.
Description
Technical Field
The invention relates to an optimal control method, in particular to a multi-agent optimal control method based on off-policy Q-learning.
Background
Adaptive Dynamic Programming (ADP) is a method for obtaining approximate optimal solutions and is widely applied in optimal control. It iteratively approximates the solution of the Hamilton-Jacobi-Bellman equation to obtain the approximate optimal solution of the system. A large body of literature studies the optimal control of model-free systems by adaptive dynamic programming: adaptive optimal control of linear continuous-time systems with completely unknown dynamics; H-infinity control of data-driven nonlinear distributed-parameter systems; controlled adaptive dynamic programming algorithms and their stability; data-driven policy-gradient adaptive dynamic programming optimal control; and adaptive dynamic programming for the optimal tracking control of unknown nonlinear systems in coal gasification. Learning optimal control policies by reinforcement learning is likewise widely applied to optimize system performance while satisfying control-input constraints and other performance specifications; studies include feedback control under reinforcement learning and approximate dynamic programming; reinforcement-learning-based feedback control in which adaptive optimal controllers are designed by natural decision methods; reinforcement-learning-based linear quadratic tracking control of partially unknown continuous-time systems; and integral-reinforcement-learning-based optimal tracking control of partially unknown nonlinear systems with input constraints.
Off-policy learning is a learning method that can learn the optimal control policy without a model, using only collected system data in the iterative updates. It has three significant advantages over on-policy learning: 1) it overcomes insufficient exploration of the system; 2) it does not interfere with the operation of the system during learning and does not require the disturbance input to be updated in a prescribed manner; 3) when the persistent-excitation condition is satisfied, the exact solution is obtained without bias even if probing noise is added to the system input. It should be noted that much of the literature studies the optimal control of systems with on-policy learning. Work that adopts off-policy learning includes: the H-infinity control problem of continuous-time systems by model-free off-policy reinforcement learning; optimal operational control of dual-time-scale industrial processes by off-policy reinforcement learning; the H-infinity control problem of linear discrete-time systems under off-policy reinforcement learning; H-infinity control design by off-policy reinforcement learning; and optimal control of affine nonlinear discrete-time systems by off-policy interleaved Q-learning. Multi-agent cooperative control systems, i.e., dynamic systems with multiple decision variables and multiple control inputs, are widespread in modern industrial production. In a non-zero-sum game, each agent must adopt an optimal control policy to optimize its own performance index. Work that solves game problems by off-policy learning includes: off-policy reinforcement learning in multi-agent graphical synchronization games; and the optimal control problem of a two-player unknown system with disturbance under off-policy learning.
Researchers have already used off-policy Q-learning to study the optimal adaptive control policy of a single system. Whether off-policy Q-learning can address the optimal control problem of games among multiple systems, and how to design an off-policy Q-learning method that achieves Nash equilibrium of multiple systems when the system model is unknown, are the questions addressed by the present invention; no related work has been reported.
The invention aims to provide a multi-agent optimal control method based on off-policy Q-learning; the proposed off-policy Q-learning method solves the multi-agent non-zero-sum game problem for linear discrete-time systems, and its effectiveness is verified by simulation. The invention integrates game theory with off-policy Q-learning and, within the non-zero-sum game framework, proposes an off-policy Q-learning algorithm that learns the optimal control policies and achieves global Nash equilibrium of the entire system.
The purpose of the invention is realized by the following technical scheme:
A multi-agent optimal control method based on off-policy Q-learning first formulates the non-zero-sum game optimization problem and rigorously proves that the value function defined from each individual's performance index is linear quadratic; then, based on dynamic programming and Q-learning, it gives an off-policy Q-learning algorithm that obtains the approximate optimal solution of the non-zero-sum game and achieves global Nash equilibrium of the system; the algorithm does not require the parameters of the system model to be known and learns the Nash equilibrium solution entirely from measurable data; finally, the effectiveness of the method is verified by numerical simulation;
the method comprises the following specific steps:
1) first, the discrete-time linear non-zero-sum game problem is formulated, and each individual's value function is proven to be linear quadratic;
2) the non-zero-sum game is solved, and an off-policy Q-learning algorithm is given;
3) an example simulation is performed: the newly proposed algorithm is run in a program simulation to demonstrate the effectiveness of the algorithm and the convergence of the data.
According to the multi-agent optimal control method based on off-policy Q-learning, to prove that each individual's value function is linear quadratic, the following linear discrete-time system is considered:

$$x(k+1) = A x(k) + \sum_{i=1}^{n} B_i u_i(k)$$

where $x(k)$ is the system state, $u_i(k)$ is the control input of the $i$-th individual, and $A$ and $B_i$ are matrices of appropriate dimensions. State-feedback controllers $u_i(k) = K_i x(k)$ are designed so that each individual minimizes its own performance index

$$J_i = \sum_{k=0}^{\infty} \Big( x^T(k) Q_i x(k) + \sum_{j=1}^{n} u_j^T(k) R_{ij} u_j(k) \Big)$$

where $Q_i \ge 0$ and $R_{ij} > 0$ are weighting matrices.
The multi-agent optimal control method based on off-policy Q-learning provides a model-free off-policy Q-learning algorithm that learns the Q-function matrices $H_i$ in equation (19) to obtain the optimal control gains of the multiple individuals.
According to the multi-agent optimal control method based on off-policy Q-learning, the effectiveness of the off-policy Q-learning algorithm is demonstrated by the example simulation.
The invention has the advantages and effects that:
the invention integrates game theory and non-strategy Q learning method, provides the non-strategy Q learning method under the framework of non-zero and game, learns the optimal control strategy, and realizes the global Nash of the whole systemAnd (4) equalizing. Firstly, defining controllers of a plurality of intelligent agents through dynamic programming, and then obtaining a game Bellman equation based on a non-policy Q function Obtaining a non-strategy Q learning method, and finally verifying the effectiveness of the method by an algorithm.
Drawings
FIG. 1 shows the convergence of H under the on-policy Q-learning method;
FIG. 2 shows the convergence of K under the on-policy Q-learning method;
FIG. 3 shows the system state x for the first noise scheme under the on-policy Q-learning method;
FIG. 4 shows the system state x for the second noise scheme under the on-policy Q-learning method;
FIG. 5 shows the system state x for the third noise scheme under the on-policy Q-learning method;
FIG. 6 shows the convergence of H under the off-policy Q-learning method;
FIG. 7 shows the convergence of K under the off-policy Q-learning method;
FIG. 8 shows the system state x under the off-policy Q-learning method.
Detailed Description
The present invention will be described in detail with reference to examples.
1. Problem formulation
The discrete-time linear non-zero-sum game problem is first formulated, and the individual value functions are then proven to be linear quadratic.
Consider the linear discrete-time system

$$x(k+1) = A x(k) + \sum_{i=1}^{n} B_i u_i(k) \qquad (1)$$

where $x(k)$ is the system state, $u_i(k)$ is the control input of the $i$-th individual, and $A$ and $B_i$ are matrices of appropriate dimensions. State-feedback controllers $u_i(k) = K_i x(k)$ are designed so that each individual minimizes its own performance index.
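To make the closed-loop structure concrete, the following minimal Python sketch rolls out such a two-player linear game under state-feedback policies. The system matrices, gains, and horizon are illustrative assumptions, not the patent's simulation example.

```python
import numpy as np

# Illustrative two-player instance of x(k+1) = A x(k) + B1 u1(k) + B2 u2(k).
# All numbers here are assumptions for demonstration only.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])

def simulate(K1, K2, x0, steps=50):
    """Roll out the closed loop under state-feedback policies u_i(k) = K_i x(k)."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for _ in range(steps):
        u1, u2 = K1 @ x, K2 @ x          # each player's state-feedback input
        x = A @ x + B1 @ u1 + B2 @ u2    # linear game dynamics
        traj.append(x)
    return np.array(traj)

traj = simulate(K1=np.array([[-0.2, 0.0]]),
                K2=np.array([[0.0, -0.3]]),
                x0=[1.0, -1.0])
print(traj[-1])  # near the origin when the joint gains are stabilizing
```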
Problem 1: for each individual $i$, minimize the performance index

$$J_i = \sum_{k=0}^{\infty} \Big( x^T(k) Q_i x(k) + \sum_{j=1}^{n} u_j^T(k) R_{ij} u_j(k) \Big) \qquad (3)$$

subject to the system dynamics (1), where $Q_i \ge 0$ and $R_{ij} > 0$ are weighting matrices.
According to the performance index (3), the optimal value function and the optimal Q-function of each individual are defined respectively as

$$V_i^*(x(k)) = \min_{u_i} \sum_{l=k}^{\infty} \Big( x^T(l) Q_i x(l) + \sum_{j=1}^{n} u_j^T(l) R_{ij} u_j(l) \Big) \qquad (5)$$

and

$$Q_i^*(x(k), u_1(k), \dots, u_n(k)) = x^T(k) Q_i x(k) + \sum_{j=1}^{n} u_j^T(k) R_{ij} u_j(k) + V_i^*(x(k+1)). \qquad (6)$$
Theorem 1: For game Problem 1, if the control inputs $u_i$ are admissible, the optimal value function and the optimal Q-function can be expressed in the quadratic forms

$$V_i^*(x(k)) = x^T(k) P_i x(k)$$

and

$$Q_i^*(z(k)) = z^T(k) H_i z(k), \qquad z(k) = \big[x^T(k),\, u_1^T(k),\, \dots,\, u_n^T(k)\big]^T$$

where $P_i$ and $H_i$ are symmetric positive semidefinite matrices determined by the system matrices, the weighting matrices, and the control gains.
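For readability, the following LaTeX sketch shows the block partition commonly assumed for such a quadratic Q-function matrix over $z(k)$ (the indexing is an assumption for illustration; the patent's own partition of $H_i$ may differ):

```latex
Q_i^*(z(k)) = z^T(k)\, H_i\, z(k), \qquad
H_i =
\begin{bmatrix}
H^{i}_{xx}    & H^{i}_{x u_1}   & \cdots & H^{i}_{x u_n} \\
H^{i}_{u_1 x} & H^{i}_{u_1 u_1} & \cdots & H^{i}_{u_1 u_n} \\
\vdots        & \vdots          & \ddots & \vdots \\
H^{i}_{u_n x} & H^{i}_{u_n u_1} & \cdots & H^{i}_{u_n u_n}
\end{bmatrix}
```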
2. Solving the non-zero-sum game
The invention mainly provides an off-policy Q-learning method. It is well known that the foundation of game solving is Nash equilibrium.
Definition 1 (Nash equilibrium): An n-tuple of policies $(u_1^*, u_2^*, \dots, u_n^*)$ constitutes a Nash equilibrium of the n-player finite game in general form if, for every individual $i$ and every admissible policy $u_i$,

$$J_i(u_1^*, \dots, u_i^*, \dots, u_n^*) \le J_i(u_1^*, \dots, u_i, \dots, u_n^*).$$

The n-tuple of outcomes $(J_1^*, J_2^*, \dots, J_n^*)$ is then the Nash equilibrium outcome of the n-player game.
From equations (5) and (6), dynamic programming yields the following game Bellman equation based on the Q-function:

$$Q_i^*(z(k)) = x^T(k) Q_i x(k) + \sum_{j=1}^{n} u_j^T(k) R_{ij} u_j(k) + Q_i^*(z(k+1)). \qquad (18)$$
Then, setting the partial derivative of the game Bellman equation for the optimal Q-function with respect to $u_i(k)$ to zero yields the optimal control gain $K_i^*$ of each individual.
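As a hedged illustration of this differentiation step, using the block partition sketched above (the patent's own coupled expression is not reproduced here), the first-order condition for individual $i$ reads:

```latex
\frac{\partial Q_i^*}{\partial u_i(k)} = 0
\;\Longrightarrow\;
H^{i}_{u_i u_i}\, u_i(k)
  = -\,H^{i}_{u_i x}\, x(k) \;-\; \sum_{j \neq i} H^{i}_{u_i u_j}\, u_j(k)
```

Substituting $u_j(k) = K_j x(k)$ for all individuals turns these $n$ conditions into coupled linear equations from which the gains $K_1, \dots, K_n$ are obtained.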
Combining equation (18) with the Riccati equation (26) yields the equation for the optimal Q-function matrices $H_i$.
the following have been demonstrated in the literatureEnsuring that system formula (1) realizes Nash equilibrium.
Note 1: It can be seen from equations (18) and (20) that neither the Bellman equation nor the Riccati equation for the optimal Q-function is easy to solve, because the matrices $H_i$ are coupled with one another and the gains $K_i$ are likewise coupled. Therefore, an on-policy Q-learning algorithm is given below.
2.1 On-policy Q-learning algorithm
The following gives a model-free on-policy Q-learning algorithm, which learns the Q-function matrices $H_i$ in equation (19) to obtain the optimal control gains of the multiple individuals.
Algorithm 1: On-policy Q-learning algorithm
1. Initialization: give initial stabilizing control gains $K_i^0$ for the multiple individuals, where $s$ denotes the iteration index;
2. Policy evaluation: solve the Q-function Bellman equation from data generated under the current policies to obtain $H_i^{s+1}$;
3. Policy update: update the control gains from $H_i^{s+1}$, where the control gain of the $i$-th individual is obtained from the partitioned blocks of $H_i^{s+1}$.
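As an illustration of the evaluation and update steps, the following Python sketch shows a least-squares Bellman solve and gain extraction in the single-player case (the function names and data layout are assumptions; the patent's multi-player update couples all the $H_i$ blocks):

```python
import numpy as np

def quad_features(z):
    """Feature map phi(z) with phi(z) @ H.flatten() == z^T H z (row-major vec)."""
    return np.kron(z, z)

def policy_evaluation(Z, Z_next, stage_costs):
    """Least-squares solve of z(k)^T H z(k) - z(k+1)^T H z(k+1) = stage cost
    from on-policy trajectory data. Z, Z_next: (N, m) stacked z(k), z(k+1)."""
    Phi = np.array([quad_features(z) - quad_features(zn)
                    for z, zn in zip(Z, Z_next)])
    h, *_ = np.linalg.lstsq(Phi, stage_costs, rcond=None)
    m = Z.shape[1]
    H = h.reshape(m, m)
    return 0.5 * (H + H.T)  # symmetrize the estimate

def policy_update(H, n_x):
    """Extract K = -H_uu^{-1} H_ux from the partitioned Q-matrix (single player)."""
    H_ux, H_uu = H[n_x:, :n_x], H[n_x:, n_x:]
    return -np.linalg.solve(H_uu, H_ux)
```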
Note 3: The on-policy Q-learning algorithm suffers from bias, whereas the off-policy Q-learning algorithm has several advantages over it and can eliminate this bias. Therefore, the following subsection presents an off-policy Q-learning algorithm.
A Q-function-based off-policy algorithm is provided; it is a model-free, data-driven algorithm for solving the non-zero-sum game problem among multiple individuals.
From equation (21), introduce the behavior and target policies by rewriting the system dynamics (1) as

$$x(k+1) = \Big(A + \sum_{j=1}^{n} B_j K_j^s\Big) x(k) + \sum_{j=1}^{n} B_j \big(u_j(k) - K_j^s x(k)\big) \qquad (27)$$

where $u_j(k)$ is the behavior control policy used to generate the data and $K_j^s x(k)$ is the target control policy of the individual to be learned. Evaluating the Q-function Bellman equation (21) along the state trajectory (27), and using the correspondences (14) and (15) between $H_i$ and the pair $(P_i, K_i)$, one obtains after simplification the off-policy Q-learning equation (31), in which only measured data appear and the unknowns are the matrices $H_i^{s+1}$.
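A hedged single-player sketch of the kind of data-based relation this derivation produces (the patent's multi-player equation (31) couples all individuals' inputs and matrices; the notation below is assumed):

```latex
z^T(k)\, H^{s+1}\, z(k) \;-\; z_s^T(k+1)\, H^{s+1}\, z_s(k+1)
  \;=\; x^T(k)\, Q\, x(k) \;+\; u^T(k)\, R\, u(k),
\qquad
z(k) = \begin{bmatrix} x(k) \\ u(k) \end{bmatrix},\quad
z_s(k+1) = \begin{bmatrix} x(k+1) \\ K^s x(k+1) \end{bmatrix}
```

Only the measured triples $(x(k), u(k), x(k+1))$ enter, so $H^{s+1}$ can be found by least squares without re-running the system.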
and 2, algorithm: non-strategy Q learning algorithm
3. initialization: giving initial values of control gains for a plurality of bodiesAnd the system formula (1) must be stable. Wherein,Is an iteration index;
4. implementing a Q learning algorithm: using the data obtained in the first step, updating by iteratively solving the equation (31) through the algorithmA value of (d);
5. if it is notThe process is stopped and the process is stopped,is a very small value. If not, then,and returns to the third step.
Note 4: The solution of equation (31) is equivalent to the solution of equation (21), and the iteration is confirmed to converge to the optimal solution $H_i^*$.
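To make the data-based structure of Algorithm 2 concrete, the following is a minimal single-player Python sketch (Algorithm 2 itself handles $n$ coupled individuals via equation (31); the system matrices, behavior policy, probing noise, and tolerance below are all illustrative assumptions):

```python
import numpy as np

# Single-player sketch of the off-policy iteration; all numbers are assumptions.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qw, Rw = np.eye(2), np.eye(1)   # illustrative stage-cost weights
n_x, n_u = 2, 1

# Step 1 - data collection: run a behavior policy with probing noise once.
rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])
X, U, Xn = [], [], []
for k in range(200):
    u = np.array([-0.05 * x[0] - 0.05 * x[1]]) + 0.5 * rng.standard_normal(1)
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn
X, U, Xn = np.array(X), np.array(U), np.array(Xn)

# Steps 2-4 - iterate on the target gain, reusing the same data every sweep.
K = np.zeros((n_u, n_x))                      # initial gain (A itself is stable here)
for s in range(50):
    Z = np.hstack([X, U])                     # behavior z(k) = [x(k); u(k)]
    Zn = np.hstack([Xn, Xn @ K.T])            # target z_s(k+1) = [x(k+1); K x(k+1)]
    cost = (np.einsum('ki,ij,kj->k', X, Qw, X)
            + np.einsum('ki,ij,kj->k', U, Rw, U))
    Phi = np.array([np.kron(z, z) - np.kron(zn, zn) for z, zn in zip(Z, Zn)])
    h, *_ = np.linalg.lstsq(Phi, cost, rcond=None)
    H = h.reshape(n_x + n_u, n_x + n_u)
    H = 0.5 * (H + H.T)
    K_new = -np.linalg.solve(H[n_x:, n_x:], H[n_x:, :n_x])
    if np.linalg.norm(K_new - K) < 1e-9:      # stopping test on successive iterates
        K = K_new
        break
    K = K_new
print("learned off-policy gain K =", K)
```

Because the Bellman identity sketched above holds for arbitrary behavior inputs, the probing noise only enters the stored data and introduces no bias into the learned $H$, which is the bias-freedom property claimed for the off-policy scheme.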
4. Example simulation
In this section, an example is given of a program simulation using the newly proposed algorithm to demonstrate the effectiveness of the algorithm and the convergence of the data.
Consider a linear discrete-time system in which two individuals play a non-zero-sum game.
The sampling time is selected, and the weighting matrices $Q_i$ and $R_{ij}$ are chosen. First, the optimal value-function matrices $P_1$ and $P_2$, the corresponding optimal Q-function matrices $H_1$ and $H_2$, and the optimal control gains $K_1$ and $K_2$ of the two agents are obtained as the true solution by an iterative algorithm that depends on the system model.
It is well known that adding probing (detection) noise is necessary to guarantee a sufficient excitation condition so that equation (16) can be solved accurately. The invention adds the following three kinds of probing noise:
Scheme 1: a first probing-noise signal;
Scheme 2: a second probing-noise signal;
Scheme 3: a third probing-noise signal.
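The three noise signals themselves are not reproduced above; as a hedged illustration, probing noises of the following kinds are typical in such experiments (the frequencies and amplitudes are assumptions, not the patent's exact signals):

```python
import numpy as np

rng = np.random.default_rng(1)

def probing_noise_scheme_1(k):
    """Sum-of-sinusoids probing signal (deterministic, persistently exciting)."""
    return 0.5 * (np.sin(0.7 * k) + np.cos(1.3 * k))

def probing_noise_scheme_2(k):
    """Zero-mean random probing signal."""
    return 0.5 * rng.standard_normal()

def probing_noise_scheme_3(k):
    """Mixed sinusoidal-plus-random probing signal."""
    return 0.3 * np.sin(2.1 * k) + 0.2 * rng.standard_normal()
```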
Table 1: Game results under the three probing-noise schemes
It can be seen from Table 1 that the off-policy Q-learning algorithm is not affected by the probing-noise disturbance, whereas the on-policy Q-learning algorithm is considerably affected by it. This demonstrates the effectiveness of the off-policy Q-learning algorithm.
FIGS. 1 and 2 show, under the on-policy Q-learning algorithm, the convergence of the matrices $H_1$ and $H_2$ and of the control gains $K_1$ and $K_2$, respectively; FIGS. 3, 4 and 5 show the convergence of the system state x under the three different probing noises for the on-policy Q-learning algorithm. FIGS. 6 and 7 show, under the off-policy Q-learning algorithm, the convergence of the matrices $H_1$ and $H_2$ and of the control gains $K_1$ and $K_2$, respectively; FIG. 8 shows the convergence of the system state x under the off-policy Q-learning algorithm.
Claims (2)
1. A multi-agent optimal control method based on off-policy Q-learning, characterized in that the method first formulates the non-zero-sum game optimization problem and rigorously proves that the value function defined from each individual's performance index is linear quadratic; then, based on dynamic programming and Q-learning, gives an off-policy Q-learning algorithm that obtains the approximate optimal solution of the non-zero-sum game and achieves global Nash equilibrium of the system; the algorithm does not require the parameters of the system model to be known and learns the Nash equilibrium solution entirely from measurable data; finally, the effectiveness of the method is verified by numerical simulation;
the method comprises the following specific steps:
1) first, formulating the discrete-time linear non-zero-sum game problem, and proving that each individual's value function is linear quadratic;
2) solving the non-zero-sum game and giving an off-policy Q-learning algorithm;
3) performing an example simulation, namely giving an example and running a program simulation with the newly proposed algorithm to demonstrate the effectiveness of the algorithm and the convergence of the data;
wherein a model-free off-policy Q-learning algorithm is given, which learns the Q-function matrices $H_i$ in equation (19), thereby obtaining the optimal control gains of the multiple individuals;
wherein, to prove that each individual's value function is linear quadratic, the following linear discrete-time system is considered:

$$x(k+1) = A x(k) + \sum_{i=1}^{n} B_i u_i(k)$$

where $x(k)$ is the system state, $u_i(k)$ is the control input of the $i$-th individual, and $A$ and $B_i$ are matrices of appropriate dimensions; state-feedback controllers $u_i(k) = K_i x(k)$ are designed so that each individual minimizes its own performance index;
wherein an n-tuple of policies $(u_1^*, u_2^*, \dots, u_n^*)$ constitutes the Nash equilibrium of the n-player finite game in general form;
wherein the off-policy Q-learning algorithm comprises:
1) data collection: applying behavior control policies to system (1) and collecting the measured state and input data;
2) initialization: giving initial control gains $K_i^0$ for the multiple individuals such that system (1) is stable, where $s$ is the iteration index;
3) implementing the Q-learning step: using the data from the first step, updating the value of $H_i^{s+1}$ by iteratively solving equation (31);
4) if $\|H_i^{s+1} - H_i^s\| \le \varepsilon$, where $\varepsilon$ is a very small positive number, stopping; otherwise setting $s \leftarrow s + 1$ and returning to the third step.
2. The method according to claim 1, characterized in that the example simulation verifies the effectiveness of the off-policy Q-learning algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910352788.5A CN110083063B (en) | 2019-04-29 | 2019-04-29 | Multi-agent optimal control method based on off-policy Q-learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910352788.5A CN110083063B (en) | 2019-04-29 | 2019-04-29 | Multi-agent optimal control method based on off-policy Q-learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN110083063A CN110083063A (en) | 2019-08-02 |
CN110083063B true CN110083063B (en) | 2022-08-12 |
Family
ID=67417405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910352788.5A Active CN110083063B (en) | 2019-04-29 | 2019-04-29 | Multi-body optimization control method based on non-strategy Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083063B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110782011B (en) * | 2019-10-21 | 2023-11-24 | 辽宁石油化工大学 | Distributed optimization control method of networked multi-agent system based on reinforcement learning |
CN111882101A (en) * | 2020-05-25 | 2020-11-03 | 北京信息科技大学 | Control method based on supply chain system consistency problem under switching topology |
CN111624882B (en) * | 2020-06-01 | 2023-04-18 | 北京信息科技大学 | Zero and differential game processing method for supply chain system based on reverse-thrust design method |
CN112180730B (en) * | 2020-10-10 | 2022-03-01 | 中国科学技术大学 | Hierarchical optimal consistency control method and device for multi-agent system |
CN112947084B (en) * | 2021-02-08 | 2022-09-23 | 重庆大学 | Model unknown multi-agent consistency control method based on reinforcement learning |
CN113364386B (en) * | 2021-05-26 | 2023-03-21 | 潍柴动力股份有限公司 | H-infinity current control method and system based on reinforcement learning of permanent magnet synchronous motor |
CN114200834B (en) * | 2021-11-30 | 2023-06-30 | 辽宁石油化工大学 | Optimal tracking control method for model-free off-track strategy in batch process in packet loss environment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107949025A (en) * | 2017-11-02 | 2018-04-20 | 南京南瑞集团公司 | A kind of network selecting method based on non-cooperative game |
CN109121105A (en) * | 2018-09-17 | 2019-01-01 | 河海大学 | Operator's competition slice intensified learning method based on Markov Game |
-
2019
- 2019-04-29 CN CN201910352788.5A patent/CN110083063B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107949025A (en) * | 2017-11-02 | 2018-04-20 | 南京南瑞集团公司 | A kind of network selecting method based on non-cooperative game |
CN109121105A (en) * | 2018-09-17 | 2019-01-01 | 河海大学 | Operator's competition slice intensified learning method based on Markov Game |
Non-Patent Citations (4)
Title |
---|
H∞ Control for Discrete-time Linear Systems by Integrating Off-policy Q-learning and Zero-sum Game; Jinna Li et al.; ICCA; 20180823; pp. 817-822 *
Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles; Warren Dixon; IEEE; 20140514; pp. 17-18, 195-235 *
Dynamic policy reinforcement learning algorithm in hybrid multi-Agent environments (混合多Agent环境下动态策略强化学习算法); XIAO Zheng et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 20090731; Vol. 30, No. 7; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110083063A (en) | 2019-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083063B (en) | Multi-agent optimal control method based on off-policy Q-learning | |
Fu et al. | Online solution of two-player zero-sum games for continuous-time nonlinear systems with completely unknown dynamics | |
Chen et al. | Generalized Hamilton–Jacobi–Bellman formulation-based neural network control of affine nonlinear discrete-time systems | |
Wu et al. | Fuzzy adaptive event-triggered control for a class of uncertain nonaffine nonlinear systems with full state constraints | |
CN107272403A (en) | A kind of PID controller parameter setting algorithm based on improvement particle cluster algorithm | |
CN110083064B (en) | Network optimal tracking control method based on non-strategy Q-learning | |
Nikdel et al. | Improved Takagi–Sugeno fuzzy model-based control of flexible joint robot via Hybrid-Taguchi genetic algorithm | |
CN101390024A (en) | Operation control method, operation control device and operation control system | |
Zhao et al. | Neural network-based fixed-time sliding mode control for a class of nonlinear Euler-Lagrange systems | |
Hashemi et al. | Integrated fault estimation and fault tolerant control for systems with generalized sector input nonlinearity | |
Mu et al. | An ADDHP-based Q-learning algorithm for optimal tracking control of linear discrete-time systems with unknown dynamics | |
CN113325717B (en) | Optimal fault-tolerant control method, system, processing equipment and storage medium based on interconnected large-scale system | |
CN116661307A (en) | Nonlinear system actuator fault PPB-SIADP fault-tolerant control method | |
CN114839880A (en) | Self-adaptive control method based on flexible joint mechanical arm | |
Zong et al. | Input-to-state stability-modular command filtered back-stepping control of strict-feedback systems | |
CN111624882B (en) | Zero and differential game processing method for supply chain system based on reverse-thrust design method | |
CN116880191A (en) | Intelligent control method of process industrial production system based on time sequence prediction | |
Vamvoudakis et al. | Non-zero sum games: Online learning solution of coupled Hamilton-Jacobi and coupled Riccati equations | |
Gao et al. | Robust resilient control for parametric strict feedback systems with prescribed output and virtual tracking errors | |
CN113485099B (en) | Online learning control method of nonlinear discrete time system | |
CN112346342B (en) | Single-network self-adaptive evaluation design method of non-affine dynamic system | |
CN108181808B (en) | System error-based parameter self-tuning method for MISO partial-format model-free controller | |
Wakitani et al. | Design of a cmac-based pid controller using operating data | |
WO2019086243A1 (en) | Randomized reinforcement learning for control of complex systems | |
CN108803314A (en) | A kind of NEW TYPE OF COMPOSITE tracking and controlling method of Chemical Batch Process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20190802 Assignee: Liaoning Hengyi special material Co.,Ltd. Assignor: Liaoning Petrochemical University Contract record no.: X2023210000276 Denomination of invention: A Multi-Individual Optimization Control Method Based on Off-Policy Q-Learning Granted publication date: 20220812 License type: Common License Record date: 20231130 |