CN113281999A - Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning - Google Patents
- Publication number
- CN113281999A (application CN202110441572.3A)
- Authority
- CN
- China
- Prior art keywords
- flight
- unmanned aerial
- aerial vehicle
- learning
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012549 training Methods 0.000 title claims abstract description 35
- 238000000034 method Methods 0.000 title claims abstract description 31
- 230000002787 reinforcement Effects 0.000 title claims abstract description 26
- 238000013526 transfer learning Methods 0.000 title claims abstract description 21
- 230000009471 action Effects 0.000 claims abstract description 37
- 238000012546 transfer Methods 0.000 claims abstract description 31
- 238000013507 mapping Methods 0.000 claims abstract description 11
- 238000012937 correction Methods 0.000 claims abstract description 8
- 230000007613 environmental effect Effects 0.000 claims abstract description 7
- 238000013135 deep learning Methods 0.000 claims abstract description 6
- 230000006870 function Effects 0.000 claims description 26
- 238000004088 simulation Methods 0.000 claims description 17
- 238000013528 artificial neural network Methods 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 11
- 230000007704 transition Effects 0.000 claims description 11
- 230000008901 benefit Effects 0.000 claims description 6
- 230000005012 migration Effects 0.000 claims description 5
- 238000013508 migration Methods 0.000 claims description 5
- 238000010276 construction Methods 0.000 claims description 4
- 238000011156 evaluation Methods 0.000 claims description 3
- 238000013461 design Methods 0.000 claims description 2
- 230000006872 improvement Effects 0.000 claims description 2
- 238000005457 optimization Methods 0.000 claims description 2
- 238000005070 sampling Methods 0.000 claims description 2
- 230000000694 effects Effects 0.000 description 5
- 230000002411 adverse Effects 0.000 description 3
- 230000006978 adaptation Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 230000004888 barrier function Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, comprising the steps of: (1) establishing an unmanned aerial vehicle simulator environment; (2) constructing an environment transfer model based on deep learning and randomly initializing its mapping; (3) constructing a reinforcement learning A3C algorithm and randomly initializing its flight strategy; (4) constructing an environment inverse transfer model based on deep learning; (5) collecting the flight data obtained when an unmanned aerial vehicle operator and the current strategy fly the unmanned aerial vehicle in the real environment; (6) updating the environment transfer model from the real flight data; (7) performing transfer learning based on action correction, correcting the flight strategy, and executing the corrected strategy in the simulator to obtain simulated flight data; (8) updating the flight strategy with the A3C algorithm on the simulated flight data, while updating the environment inverse transfer model. Steps (5)-(8) are repeated until the strategy converges; the final strategy serves as the initial flight strategy of the real unmanned aerial vehicle.
Description
Technical Field
The invention relates to an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, and belongs to the technical field of unmanned aerial vehicle autonomous flight control.
Background
Autonomous flight control of unmanned aerial vehicles in diverse, complex and rapidly changing environments has always been a difficulty in the field of unmanned aerial vehicle flight control. Conventional flight control is realized by writing flight control rules manually: all situations that may occur during flight are enumerated in advance and handled one by one through feedback control, rule writing and similar means, drawing on the expertise and experience of experts in the unmanned aerial vehicle field. However, first, rule writing requires a large amount of labor; second, if the mutual influence among situations is not accounted for, the flight control fails; finally, autonomous flight control must consider high-dimensional sensing information such as radar and cameras, and processing such high-dimensional information is a huge challenge for conventional unmanned aerial vehicle flight control.
In recent years, reinforcement learning has achieved success in autonomous control both in simulation environments, such as video games, and in real environments, such as robot arms. Reinforcement learning training needs a large number of samples, and obtaining them in a real environment is risky, slow and expensive in equipment, so a simulator is needed to imitate the real environment. In related research in the unmanned aerial vehicle field, a simulation environment imitating the real one is constructed, the simulated unmanned aerial vehicle performs a large amount of trial and error in it, and an autonomous flight strategy is learned with a reinforcement learning algorithm.
However, the simulator environment and the simulated unmanned aerial vehicle inevitably differ from the real environment and the real unmanned aerial vehicle, so a flight strategy learned in the simulator may not improve the performance of the unmanned aerial vehicle in the real environment. To address both the labor cost of conventional autonomous flight control and the simulator-reality gap in reinforcement learning, one approach is to combine reinforcement learning with transfer learning: the unmanned aerial vehicle performs large amounts of trial-and-error learning in the simulator by reinforcement learning, and transfer learning reduces the adverse effect of the difference between the simulator environment and the real environment. Which transfer learning algorithm to choose for a particular problem is critical. A commonly used transfer learning algorithm is domain adaptation, which requires optimal flight data from the real environment and requires that, when the flight strategy is trained in the simulator, the simulated flight trajectories of the unmanned aerial vehicle have the same distribution as the optimal real trajectories. Domain-adaptation transfer learning cannot make full use of the remaining large amount of non-optimal flight data, and optimal flight data from the real environment are not easy to acquire.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, aimed at two problems: manually written rule-based control cannot handle complex and changeable environments, and the unavoidable difference between the simulation environment used for reinforcement learning training of flight strategies and the real environment prevents the learned strategies from being applied directly in the real environment. By combining reinforcement learning with transfer learning based on action correction, the adverse effect of the simulator-reality difference on the flight strategy is reduced. The flight strategy obtained by the invention differs less from one suited to the real environment, and its flight actions are smooth, so it can serve as a good initial flight strategy or auxiliary flight strategy for the real unmanned aerial vehicle.
The technical scheme is as follows: in the unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, flight data are collected in the real environment and a state transition model of the real environment is learned; the flight strategy of the unmanned aerial vehicle and the inverse transfer model of the simulator environment are trained simultaneously in the simulator, and the flight actions to be executed in the simulator are corrected using the transfer model of the real environment and the inverse transfer model of the simulation environment. The method comprises the following steps:
(1) Create an unmanned aerial vehicle simulator environment. (2) Construct an environment transfer model f_α based on deep learning, i.e., the mapping from a "current state, current action" pair to the next state, and randomly initialize the mapping. (3) Construct a reinforcement learning A3C algorithm and randomly initialize its flight strategy π_θ. (4) Construct an environment inverse transfer model f'_β based on deep learning, i.e., the mapping from a "current state, next state" pair to the current action, and randomly initialize the mapping. (5) Collect the flight data obtained when an unmanned aerial vehicle operator and the strategy π_θ fly the unmanned aerial vehicle in the real environment, i.e., trajectory data formed by consecutive "state-action" pairs. (6) Update the environment transfer model f_α from the real flight data. (7) Use f_α and f'_β to perform transfer learning based on action correction (Grounded Action Transformation), correcting the flight strategy π_θ into a strategy π', and execute π' in the simulator to obtain simulated flight data. (8) Update the flight strategy π_θ with the A3C algorithm on the simulated flight data, while updating the environment inverse transfer model f'_β. Repeat (5)-(8) until the strategy π_θ converges; the final strategy π_θ serves as the initial flight strategy of the real unmanned aerial vehicle.
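Steps (1)-(8) form one training loop. The sketch below is a minimal, hypothetical NumPy outline of that loop: linear least-squares models stand in for the deep transfer models f_α and f'_β, a simple shrink step stands in for the A3C policy update, and all dynamics, dimensions and names are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
S_DIM, A_DIM = 2, 2  # toy dimensions; the method itself uses high-dim sensor states

def real_step(s, a):   # stand-in for the unknown real drone dynamics
    return 0.9 * s + 0.2 * a + 0.01 * rng.standard_normal(S_DIM)

def sim_step(s, a):    # stand-in simulator dynamics, deliberately biased
    return 0.7 * s + 0.4 * a

def lstsq_model(X, Y):  # least-squares regression as a stand-in for a deep net
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return lambda x: x @ W

theta = 0.1 * rng.standard_normal((A_DIM, S_DIM))  # step (3): random policy

def policy(s):
    return theta @ s

# step (4): randomly initialised inverse model f'_beta: (s, s') -> a
f_beta = lstsq_model(rng.standard_normal((10, 2 * S_DIM)),
                     rng.standard_normal((10, A_DIM)))

for it in range(3):  # repeat steps (5)-(8) until the policy converges
    # (5) collect real flight data, (6) fit the forward transfer model f_alpha
    s, D_real = np.ones(S_DIM), []
    for _ in range(100):
        a = policy(s)
        s2 = real_step(s, a)
        D_real.append((s, a, s2))
        s = s2
    f_alpha = lstsq_model(
        np.array([np.concatenate([si, ai]) for si, ai, _ in D_real]),
        np.array([s2i for _, _, s2i in D_real]))
    # (7) grounded action transformation: a' = f'_beta(s, f_alpha(s, a))
    s, D_sim = np.ones(S_DIM), []
    for _ in range(100):
        a = policy(s)
        a_corr = f_beta(np.concatenate([s, f_alpha(np.concatenate([s, a]))]))
        s2 = sim_step(s, a_corr)
        D_sim.append((s, a_corr, s2))
        s = s2
    # (8) update the policy (placeholder for the A3C step) and refit f'_beta
    theta = 0.99 * theta  # a real implementation runs an A3C gradient update here
    f_beta = lstsq_model(
        np.array([np.concatenate([si, s2i]) for si, _, s2i in D_sim]),
        np.array([ai for _, ai, _ in D_sim]))
```

The loop structure, not the stand-in models, is the point: each iteration grounds the simulator with fresh real data before the policy is updated on simulated rollouts.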
A simulation simulator is constructed from an aerodynamic model, an unmanned aerial vehicle model, and the flight scenes and flight missions the unmanned aerial vehicle may encounter, and is visualized with the Unreal4 game engine. The simulator comprises the unmanned aerial vehicle, the flight scene and the flight mission; during flight the state of the unmanned aerial vehicle changes over time, and the simulated environment continuously generates various obstacles. This process can be approximated as a Markov decision process (MDP), represented by the five-tuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the single-step reward obtained from the environment, and γ is the discount factor of the accumulated reward. The observation information provided by the simulator includes the relative orientation and distance of the target, the lane offset, the wireless and radar detection information, and so on.
The unmanned aerial vehicle is controlled by an operator and by the simulator flight strategy π_θ, and its flight data in the real environment are collected. All triples (s, a, s') are extracted, where s is the current state, a is the current action, and s' is the next state, yielding the data set for training the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
Taking the "current state, current action" pair as the feature and the next state as the label, regression learning is performed to train the state transition model f_α of the real environment. The model parameters are updated by minimizing the transfer loss function L(α) = E_{(s,a,s')~D_real}[ ||f_α(s, a) − s'||² ].
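This regression step can be sketched in plain NumPy: gradient descent on the mean squared error above, with a linear model standing in for the deep network and synthetic data in place of D_real (all shapes and values here are illustrative assumptions, not the patent's).

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic D_real: features x = (s, a) concatenated, labels y = s'
n, s_dim, a_dim = 500, 3, 2
X = rng.standard_normal((n, s_dim + a_dim))
true_W = rng.standard_normal((s_dim + a_dim, s_dim))
Y = X @ true_W + 0.01 * rng.standard_normal((n, s_dim))

# linear stand-in for f_alpha; here alpha is just a weight matrix
alpha = np.zeros((s_dim + a_dim, s_dim))

def loss(alpha):
    # L(alpha) = E_{(s,a,s') ~ D_real} || f_alpha(s, a) - s' ||^2
    return np.mean(np.sum((X @ alpha - Y) ** 2, axis=1))

lr = 0.01
for _ in range(500):
    grad = 2.0 * X.T @ (X @ alpha - Y) / n   # gradient of the empirical MSE
    alpha -= lr * grad                        # gradient-descent parameter update
```

After training, `loss(alpha)` is down near the injected noise floor, which is exactly what "minimizing the transfer loss function" asks for.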
Repeatedly, the state s is input into the flight strategy π_θ of A3C to obtain the action a; the corrected action a' = f'_β(s, f_α(s, a)) is obtained by action correction, and a' is executed in the simulator, yielding a new state s' and a reward value r. This continues until enough data have been collected: the data set for training the A3C algorithm model, D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}, and the data set of the inverse transfer model f'_β, D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
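The effect of the correction a' = f'_β(s, f_α(s, a)) is easiest to see with linear dynamics, where the inverse model can be exact: the corrected action makes the simulator reproduce the transition that the real-environment model f_α predicts. A hypothetical sketch (the matrices are invented; the patent's models are learned networks, not closed-form inverses):

```python
import numpy as np

# linear stand-ins: real model s' = A_s s + A_a a, simulator s' = B_s s + B_a a
A_s, A_a = 0.9 * np.eye(2), 0.2 * np.eye(2)
B_s, B_a = 0.7 * np.eye(2), 0.4 * np.eye(2)

def f_alpha(s, a):            # learned real-environment transfer model
    return A_s @ s + A_a @ a

def sim_step(s, a):           # simulator dynamics
    return B_s @ s + B_a @ a

def f_beta_inv(s, s_next):    # exact inverse of the simulator: (s, s') -> a
    return np.linalg.solve(B_a, s_next - B_s @ s)

s = np.array([1.0, -1.0])
a = np.array([0.5, 0.5])                 # action proposed by pi_theta
a_corr = f_beta_inv(s, f_alpha(s, a))    # a' = f'_beta(s, f_alpha(s, a))
```

Executing `a_corr` in the simulator lands exactly on `f_alpha(s, a)`, so the policy trains in the simulator against transitions that look like the real environment's.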
In the flight strategy training of the A3C reinforcement learning algorithm, deep neural networks serve as the flight strategy π_θ (the actor) and the evaluation network (the critic). In the actor-critic framework, the critic is responsible for evaluating the actor, and the actor acts and improves its skill according to that evaluation. The A3C (Asynchronous Advantage Actor-Critic) algorithm improves on actor-critic in two ways: (1) A3C uses an advantage function as the critic, which reduces the critic's variance, thereby reducing the variance of the policy gradient and stabilizing training; (2) multiple simulation environments are used to collect data samples asynchronously and to update the actor and the critic asynchronously, so that more samples are collected per unit time for network updates, speeding up training and further reducing the variance of the actor's gradient. The advantage function of A3C is Advantage(s, a) = Q(s, a) − V_φ(s), where Q is the state-action value function, V is the value network, and φ is the neural network parameter of V.
Based on the collected data set, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm. The loss function of the value network in A3C is defined as J_V(φ) = E_{(s,a,s',r)~D}[ (r(s, a) + V_{φ_old}(s') − V_φ(s))² ], and the loss function of the actor network π_θ is defined as J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s, a) − V_φ(s)) ]. The strategy is optimized by iteratively sampling data and minimizing the loss functions J_π(θ) and J_V(φ), where r(s, a) is the reward function given by the environment, φ_old denotes the stored old value of φ, and π is short for the flight strategy π_θ.
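The advantage and the two A3C losses can be computed directly on a toy batch. The NumPy sketch below uses random numbers in place of network outputs and, matching the single-step bootstrap used here, applies no explicit discount; all variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_actions = 6, 3

# toy batch drawn from D: rewards, value estimates, policy logits, actions taken
r      = rng.uniform(0.0, 1.0, n)        # r(s, a)
V_s    = rng.uniform(0.0, 1.0, n)        # V_phi(s)
V_s2   = rng.uniform(0.0, 1.0, n)        # V_phi_old(s'), the bootstrapped target
logits = rng.standard_normal((n, n_actions))
a_idx  = rng.integers(0, n_actions, n)

# Q(s, a) = r(s, a) + V_phi_old(s');  Advantage(s, a) = Q(s, a) - V_phi(s)
Q = r + V_s2
advantage = Q - V_s

# critic loss J_V(phi): squared TD error against the bootstrapped target
J_V = np.mean((r + V_s2 - V_s) ** 2)

# actor loss J_pi(theta) = -E[ log pi_theta(a|s) * Advantage(s, a) ]
log_pi = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
J_pi = -np.mean(log_pi[np.arange(n), a_idx] * advantage)
```

In a full implementation these scalars would be backpropagated through the actor and critic networks; here they only show how the batch quantities fit together.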
The inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm; the neural network parameters are updated by minimizing the loss function L(β) = E_{(s,s',a)~D_sim}[ ||f'_β(s, s') − a||² ].
This is repeated until the model converges or the maximum number of iterations is reached. The finally obtained flight strategy model is applied to the real unmanned aerial vehicle and its effect is observed.
Compared with the prior art, the invention has the following advantages:
1. The invention uses a deep reinforcement learning algorithm to give the unmanned aerial vehicle the capability of autonomous flight in complex and changeable flight environments, which is more efficient than the conventional approach of manually written control rules.
2. By means of the transfer learning algorithm based on action correction (Grounded Action Transformation), the invention reduces the adverse effect on the flight strategy caused by the difference between the real environment and the simulator environment, so the strategy trained in the simulator serves better as the initial flight strategy of the real unmanned aerial vehicle.
3. The transfer learning algorithm used by the invention can exploit the non-optimal flight data of the unmanned aerial vehicle in the real environment, making the algorithm more robust.
Drawings
FIG. 1 is an overall frame diagram of the present invention;
FIG. 2 is a training flow diagram of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning collects flight data in the real environment and learns a state transition model of the real environment; the flight strategy of the unmanned aerial vehicle and the inverse transfer model of the simulator environment are trained simultaneously in the simulator, and the flight actions to be executed in the simulator are corrected using the transfer model of the real environment and the inverse transfer model of the simulation environment. The method comprises the following steps:
Step one:
A simulation simulator is constructed from an aerodynamic model, an unmanned aerial vehicle model, and the flight scenes and flight missions the unmanned aerial vehicle may encounter, and is visualized with the Unreal4 game engine. In the simulator, the flight state of the unmanned aerial vehicle changes over time during flight, and the simulated environment continuously generates various complicated obstacles. This process can be approximated as a Markov decision process (MDP), represented by the five-tuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the single-step reward obtained from the environment, and γ is the discount factor of the accumulated reward.
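The five-tuple can be represented concretely. The toy sketch below builds a tiny tabular MDP and samples one transition from it; it is purely illustrative (the patent's simulator has continuous, high-dimensional states, not a 3-state table):

```python
import numpy as np
from collections import namedtuple

MDP = namedtuple("MDP", ["S", "A", "P", "R", "gamma"])

# toy 3-state, 2-action MDP standing in for the simulator environment
n_s, n_a = 3, 2
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] is a distribution over s'
R = rng.uniform(-1.0, 1.0, (n_s, n_a))            # single-step reward R(s, a)
mdp = MDP(S=range(n_s), A=range(n_a), P=P, R=R, gamma=0.99)

def step(s, a):
    """Sample the next state from P(s, a) and return it with the reward."""
    s2 = int(rng.choice(n_s, p=mdp.P[s, a]))
    return s2, mdp.R[s, a]

s2, r = step(0, 1)
```

Each `(s, a, s2, r)` produced this way is one entry of the trajectory data that the later steps collect and learn from.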
Step two:
The model of the reinforcement learning A3C algorithm, the inverse transfer model f'_β of the simulator environment, and the transfer model f_α of the real environment are constructed and initialized, where f'_β is the mapping from a "current state, next state" pair to the current action, and f_α is the mapping from a "current state, current action" pair to the next state.
Step three:
The unmanned aerial vehicle is controlled by an operator and by the simulator flight strategy π_θ, and its flight data in the real environment are collected. All triples (s, a, s') are extracted, where s is the current state, a is the current action, and s' is the next state, yielding the data set for training the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
Step four:
According to the data obtained in step three, the "current state, current action" pair is taken as the feature and the next state as the label; regression learning is performed to train the state transition model f_α of the real environment by minimizing the loss function L(α) = E_{(s,a,s')~D_real}[ ||f_α(s, a) − s'||² ].
step five:
Repeatedly, the state s is input into the flight strategy π_θ of A3C to obtain the action a; the corrected action a' = f'_β(s, f_α(s, a)) is obtained by action correction, and a' is executed in the simulator, yielding a new state s' and a reward value r. This continues until enough data have been collected to train the A3C algorithm model:
D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}
and the data set of the inverse transfer model f'_β:
D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
step six:
Based on the data collected in step five, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm by minimizing the loss functions J_π(θ) and J_V(φ):
J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s, a) − V_φ(s)) ]
J_V(φ) = E_{(s,a,s',r)~D}[ (r(s, a) + V_{φ_old}(s') − V_φ(s))² ]
where Q is the state-action value function and V the state value function:
V(s) = E_{(s,a,s',r)~D}[ r(s, a) + V_φ(s') ]
Q(s, a) = r(s, a) + E_{s'~P(s,a)}[ V_φ(s') ]
The inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm by minimizing the loss function L(β) = E_{(s,s',a)~D_sim}[ ||f'_β(s, s') − a||² ].
Steps three, four, five and six are repeated until the model converges or the maximum number of iterations is reached. The finally obtained flight strategy model is applied to the real unmanned aerial vehicle and its effect is observed. The overall algorithm flow is shown in algorithm 1 below.
Claims (8)
1. An unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning is characterized by comprising the following steps:
(1) creating an unmanned aerial vehicle simulator environment;
(2) constructing an environment transfer model f_α based on deep learning, i.e., the mapping from a "current state, current action" pair to the next state, and randomly initializing the mapping;
(3) constructing a reinforcement learning A3C algorithm and randomly initializing its flight strategy π_θ;
(4) constructing an environment inverse transfer model f'_β based on deep learning, i.e., the mapping from a "current state, next state" pair to the current action, and randomly initializing the mapping;
(5) collecting the flight data obtained when an unmanned aerial vehicle operator and the flight strategy π_θ fly the unmanned aerial vehicle in the real environment, i.e., trajectory data formed by consecutive "state-action" pairs;
(6) updating the environment transfer model f_α from the real flight data;
(7) using f_α and f'_β to perform transfer learning based on action correction, correcting the flight strategy π_θ into a flight strategy π', and executing π' in the simulator to obtain simulated flight data;
(8) updating the flight strategy π_θ with the A3C algorithm on the simulated flight data, while updating the environment inverse transfer model f'_β;
repeating (5)-(8) until the strategy π_θ converges; the finally obtained strategy π_θ serves as the initial flight strategy of the real unmanned aerial vehicle.
2. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, wherein a simulation simulator is constructed from an aerodynamic model, an unmanned aerial vehicle model, and the flight scenes and flight missions the unmanned aerial vehicle encounters, and is visualized with the Unreal4 game engine; the simulator comprises the unmanned aerial vehicle, the flight scene and the flight mission; during flight the state of the unmanned aerial vehicle changes over time, and the simulated environment continuously generates various obstacles; this process is expressed as a Markov decision process, represented by the five-tuple <S, A, P, R, γ>, wherein S is the state space, A is the action space, P is the state transition probability, R is the single-step reward obtained from the environment, and γ is the discount factor of the accumulated reward.
3. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, wherein the unmanned aerial vehicle is controlled by an operator and by the simulator flight strategy π_θ, its flight data in the real environment are collected, and all triples (s, a, s') are extracted, wherein s is the current state, a is the current action, and s' is the next state, yielding the data set for training the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
4. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, wherein regression learning is performed with the "current state, current action" pair as the feature and the next state as the label to train the state transition model f_α of the real environment, and the neural network parameter α of the transfer model is updated by minimizing the transfer loss function L(α) = E_{(s,a,s')~D_real}[ ||f_α(s, a) − s'||² ].
5. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, wherein, repeatedly, the state s is input into the flight strategy π_θ of A3C to obtain the action a; the corrected action a' = f'_β(s, f_α(s, a)) is obtained by action correction, and a' is executed in the simulator, yielding a new state s' and a reward value r; this continues until enough data have been collected: the data set for training the A3C algorithm model, D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}, and the data set of the inverse transfer model f'_β, D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
6. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, wherein in the flight strategy training of the A3C reinforcement learning algorithm, deep neural networks are used as the network models of the flight strategy and of the evaluation network; the A3C algorithm improves on actor-critic in two ways: (1) A3C uses the advantage function as the critic; (2) multiple simulation environments are used to collect data samples asynchronously and to update the actor and the critic asynchronously; the advantage function of A3C is Advantage(s, a) = Q(s, a) − V_φ(s), wherein Q is the state-action value function, V is the value network, and φ is the neural network parameter of V.
7. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 5, wherein, based on the collected data set, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm; the loss function of the value network in A3C is defined as J_V(φ) = E_{(s,a,s',r)~D}[ (r(s, a) + V_{φ_old}(s') − V_φ(s))² ], and the loss function of the actor network π_θ is defined as J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s, a) − V_φ(s)) ]; the strategy is optimized by iteratively sampling data and minimizing the loss functions J_π(θ) and J_V(φ), wherein r(s, a) is the reward function given by the environment, φ_old denotes the stored old value of φ, and π is short for the flight strategy π_θ.
8. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 5, wherein the inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm, the neural network parameter β being updated by minimizing the inverse transfer loss function L(β) = E_{(s,s',a)~D_sim}[ ||f'_β(s, s') − a||² ], until the model converges or the maximum number of iterations is reached; the finally obtained flight strategy π_θ is applied to the real unmanned aerial vehicle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110441572.3A CN113281999A (en) | 2021-04-23 | 2021-04-23 | Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110441572.3A CN113281999A (en) | 2021-04-23 | 2021-04-23 | Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113281999A true CN113281999A (en) | 2021-08-20 |
Family
ID=77277239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110441572.3A Pending CN113281999A (en) | 2021-04-23 | 2021-04-23 | Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113281999A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113885549A (en) * | 2021-11-23 | 2022-01-04 | 江苏科技大学 | Four-rotor attitude trajectory control method based on dimension cutting PPO algorithm |
CN114290339A (en) * | 2022-03-09 | 2022-04-08 | 南京大学 | Robot reality migration system and method based on reinforcement learning and residual modeling |
CN114609925A (en) * | 2022-01-14 | 2022-06-10 | 中国科学院自动化研究所 | Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish |
WO2023142316A1 (en) * | 2022-01-25 | 2023-08-03 | 南方科技大学 | Flight decision generation method and apparatus, computer device, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109948781A (en) * | 2019-03-21 | 2019-06-28 | 中国人民解放军国防科技大学 | Continuous action online learning control method and system for automatic driving vehicle |
CN111695690A (en) * | 2020-07-30 | 2020-09-22 | 航天欧华信息技术有限公司 | Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning |
US20200372410A1 (en) * | 2019-05-23 | 2020-11-26 | Uber Technologies, Inc. | Model based reinforcement learning based on generalized hidden parameter markov decision processes |
CN112162564A (en) * | 2020-09-25 | 2021-01-01 | 南京大学 | Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm |
CN112215328A (en) * | 2020-10-29 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Training of intelligent agent, and action control method and device based on intelligent agent |
- 2021-04-23: Application CN202110441572.3A filed in China; published as CN113281999A; status: active, Pending
Non-Patent Citations (2)
Title |
---|
ARMANDO VIEIRA et al.: "A Developer's Guide to Business Applications of Deep Learning: From Conversational Bots to Medical Image Processing", 31 August 2019, Beihang University Press * |
GUO, XIAN: "Motion Gait Control of Biomimetic Robots: A Survey of Reinforcement Learning Methods" * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113885549A (en) * | 2021-11-23 | 2022-01-04 | 江苏科技大学 | Four-rotor attitude trajectory control method based on dimension cutting PPO algorithm |
CN113885549B (en) * | 2021-11-23 | 2023-11-21 | 江苏科技大学 | Four-rotor gesture track control method based on dimension clipping PPO algorithm |
CN114609925A (en) * | 2022-01-14 | 2022-06-10 | 中国科学院自动化研究所 | Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish |
CN114609925B (en) * | 2022-01-14 | 2022-12-06 | 中国科学院自动化研究所 | Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish |
WO2023142316A1 (en) * | 2022-01-25 | 2023-08-03 | 南方科技大学 | Flight decision generation method and apparatus, computer device, and storage medium |
CN114290339A (en) * | 2022-03-09 | 2022-04-08 | 南京大学 | Robot reality migration system and method based on reinforcement learning and residual modeling |
CN114290339B (en) * | 2022-03-09 | 2022-06-21 | 南京大学 | Robot realistic migration method based on reinforcement learning and residual modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chebotar et al. | Closing the sim-to-real loop: Adapting simulation randomization with real world experience | |
CN113281999A (en) | Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning | |
Lopes et al. | Intelligent control of a quadrotor with proximal policy optimization reinforcement learning | |
CN111260027B (en) | Intelligent agent automatic decision-making method based on reinforcement learning | |
Li et al. | Learning unmanned aerial vehicle control for autonomous target following | |
CN110442129B (en) | Control method and system for multi-agent formation | |
Qi et al. | Towards latent space based manipulation of elastic rods using autoencoder models and robust centerline extractions | |
CN111260026B (en) | Navigation migration method based on meta reinforcement learning | |
CN109940614B (en) | Mechanical arm multi-scene rapid motion planning method integrating memory mechanism | |
CN113467515B (en) | Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning | |
CN112605973A (en) | Robot motor skill learning method and system | |
Celik et al. | Specializing versatile skill libraries using local mixture of experts | |
Hussein et al. | Deep reward shaping from demonstrations | |
CN115990891B (en) | Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration | |
Bai et al. | Variational dynamic for self-supervised exploration in deep reinforcement learning | |
Xu et al. | Learning strategy for continuous robot visual control: A multi-objective perspective | |
Chen et al. | An overview of robust reinforcement learning | |
Luo et al. | Balance between efficient and effective learning: Dense2sparse reward shaping for robot manipulation with environment uncertainty | |
Wang et al. | RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback | |
Li et al. | Prudent policy gradient with auxiliary actor in multi-degree-of-freedom robotic tasks | |
CN110990769B (en) | Attitude migration algorithm system suitable for multi-degree-of-freedom robot | |
CN111582299B (en) | Self-adaptive regularization optimization processing method for image deep learning model identification | |
Yu et al. | Adaptively shaping reinforcement learning agents via human reward | |
Hong et al. | Dynamics-aware metric embedding: Metric learning in a latent space for visual planning | |
Zheng et al. | Uncalibrated visual servo system based on Kalman filter optimized by improved STOA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210820 |