CN113281999A - Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning - Google Patents

Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning

Info

Publication number
CN113281999A
CN113281999A (application CN202110441572.3A)
Authority
CN
China
Prior art keywords: flight, unmanned aerial vehicle, learning, strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110441572.3A
Other languages
Chinese (zh)
Inventor
俞扬
詹德川
周志华
黄军富
庞竟成
张云天
管聪
陈雄辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110441572.3A
Publication of CN113281999A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention discloses an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, which comprises the following steps: (1) establishing an unmanned aerial vehicle simulator environment; (2) constructing an environment transfer model based on deep learning and randomly initializing its mapping; (3) constructing a reinforcement learning A3C algorithm and randomly initializing its flight strategy; (4) constructing an environment inverse transfer model based on deep learning; (5) collecting the flight data obtained when an unmanned aerial vehicle operator and the strategy fly the unmanned aerial vehicle in the real environment; (6) updating the environment transfer model based on the real flight data; (7) performing transfer learning based on action correction to correct the flight strategy, and executing the corrected strategy in the simulator to obtain simulated flight data; (8) updating the flight strategy with the A3C algorithm based on the simulated flight data while updating the environment inverse transfer model. Steps (5)-(8) are repeated until the strategy converges, and the final strategy serves as the initial flight strategy of the real unmanned aerial vehicle.

Description

Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning
Technical Field
The invention relates to an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, and belongs to the technical field of unmanned aerial vehicle autonomous flight control.
Background
Autonomous flight control of an unmanned aerial vehicle in diverse, complex and rapidly changing environments has long been a difficulty in the field of unmanned aerial vehicle flight control. Conventional flight control is realized by manually writing flight control rules: all situations that may arise during flight are enumerated in advance and handled one by one through feedback control, hand-written rules and similar means, drawing on the professional knowledge and experience of experts in the unmanned aerial vehicle field. However, rule writing requires a large amount of labor; if the mutual influence among the various situations is not accounted for, flight control fails; and autonomous flight control must also process high-dimensional sensing information such as radar and camera data, which is a huge challenge for conventional flight control of the unmanned aerial vehicle.
In recent years, reinforcement learning has achieved success in autonomous control both in simulated environments such as video games and in real environments such as robotic arms. Reinforcement learning training requires a large number of samples, and obtaining them in a real environment is risky, slow and expensive in equipment, so a simulator is needed to imitate the real environment. In related research in the unmanned aerial vehicle field, a simulation environment imitating the real environment is constructed so that the simulated unmanned aerial vehicle can perform a large amount of trial and error with a reinforcement learning algorithm and thereby learn an autonomous flight strategy.
However, the simulator environment and the simulated drone inevitably differ from the real environment and the real drone, so a flight strategy learned in the simulator may fail to improve the drone's performance in the real environment. To address both the labor cost of conventional autonomous flight control and the gap between the simulator and the real environment in reinforcement learning, one approach is to combine reinforcement learning with transfer learning: the drone performs large-scale trial-and-error learning in the simulator environment by reinforcement learning, while transfer learning reduces the adverse effect of the difference between the simulator environment and the real environment. Which transfer learning algorithm to choose for a particular problem is critical. A commonly used transfer learning algorithm is domain adaptation, which requires optimal flight data from the real environment and requires that, when the flight strategy of the unmanned aerial vehicle is trained in the simulator, the simulated flight trajectories have the same distribution as the optimal flight trajectories in the real environment. The large amount of remaining non-optimal flight data cannot be fully utilized by domain-adaptation transfer learning, and optimal flight data from the real environment is not easy to acquire.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning, aimed at two problems: autonomous flight control based on manually written rules cannot handle complex and changing environments, and the unavoidable difference between the simulation environment used for training flight strategies with a reinforcement learning algorithm and the real environment prevents the learned flight strategy from being applied in the real environment. On the basis of reinforcement learning, transfer learning based on action correction is combined to reduce the adverse effect of the simulator-reality difference on the flight strategy. The optimal flight strategy learned by the invention differs little from what the real environment requires and its flight actions are smooth, so it can serve as a good initial flight strategy or auxiliary flight strategy for the real unmanned aerial vehicle.
The technical scheme is as follows: an unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning collects flight data in the real environment and learns a state transition model of the real environment; the flight strategy of the unmanned aerial vehicle and an inverse transfer model of the simulator environment are trained simultaneously in the simulator, and the transfer model of the real environment together with the inverse transfer model of the simulation environment is used to correct the flight actions to be executed in the simulator. The method comprises the following steps:
(1) Creating an unmanned aerial vehicle simulator environment; (2) constructing an environment transfer model f_α based on deep learning, i.e. the mapping from the "current state, current action" pair to the next state, and randomly initializing the mapping; (3) constructing a reinforcement learning A3C algorithm and randomly initializing its flight strategy π_θ; (4) constructing an environment inverse transfer model f'_β based on deep learning, i.e. the mapping from the "current state, next state" pair to the current action, and randomly initializing the mapping; (5) collecting the flight data obtained when an unmanned aerial vehicle operator and the strategy π_θ fly the unmanned aerial vehicle in the real environment, i.e. trajectory data formed by consecutive "state-action" pairs; (6) updating the environment transfer model f_α based on the real flight data; (7) using f_α and f'_β to perform transfer learning based on action correction (Grounded Action Transformation), correcting the flight strategy π_θ to obtain a flight strategy π', and executing π' in the simulator to obtain simulated flight data; (8) updating the flight strategy π_θ with the A3C algorithm based on the simulated flight data, while updating the environment inverse transfer model f'_β. Steps (5)-(8) are repeated until the strategy π_θ converges. The finally obtained strategy π_θ serves as the initial flight strategy of the real unmanned aerial vehicle.
A simulation simulator is constructed based on the aerodynamic model, the unmanned aerial vehicle model and the flight scenarios and flight missions the unmanned aerial vehicle may encounter, and is visualized with the Unreal Engine 4 game engine. In the simulation simulator, the flight state of the unmanned aerial vehicle changes over time during flight, and the simulated environment continuously generates various obstacles. The process can be approximated as a Markov Decision Process (MDP) represented by a five-tuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the single-step reward given by the environment, and γ is the discount factor of the accumulated reward. The observation information provided by the simulator includes the relative orientation and distance of the target, the lane offset, radio and radar detection information, etc.
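For illustration only, the following is a minimal sketch of how such a simulator environment could be wrapped behind an MDP-style, Gym-like interface; the state layout, action meaning, reward shaping and termination tests are hypothetical placeholders, not the simulator actually used by the invention:

    import numpy as np

    class DroneSimEnv:
        """Hypothetical MDP wrapper around a UAV simulator: <S, A, P, R, gamma>."""

        def __init__(self, gamma=0.99):
            self.gamma = gamma          # discount factor of the accumulated reward
            self.state = None

        def reset(self):
            # Observation sketch: target bearing/distance, lane offset, radar returns.
            self.state = np.zeros(16, dtype=np.float32)
            return self.state

        def step(self, action):
            # action: e.g. normalized [roll, pitch, yaw, throttle] commands.
            next_state = self._dynamics(self.state, action)   # stands in for P(s'|s, a)
            reward = self._reward(next_state, action)          # single-step reward R
            done = self._crashed(next_state) or self._reached_goal(next_state)
            self.state = next_state
            return next_state, reward, done, {}

        # --- placeholders standing in for aerodynamics, reward and termination ---
        def _dynamics(self, s, a):
            a = np.asarray(a, dtype=np.float32)
            return s + 0.01 * np.pad(a, (0, len(s) - len(a)))

        def _reward(self, s, a):
            return -float(np.linalg.norm(s[:3]))   # e.g. negative distance to target

        def _crashed(self, s):
            return False

        def _reached_goal(self, s):
            return bool(np.linalg.norm(s[:3]) < 0.5)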
Using the drone operator and the simulator flight strategy π_θ to control the unmanned aerial vehicle, flight data of the unmanned aerial vehicle in the real environment is collected and all triples (s, a, s') are extracted, where s is the current state, a is the current action and s' is the next state, yielding the data set used to train the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
Taking the "current state, current action" pair as the feature and the next state as the label, regression learning is performed to train the state transition model f_α of the real environment by minimizing the transfer loss function

J_f(α) = E_{(s,a,s')~D_real}[ ‖f_α(s,a) − s'‖² ]

and updating the model parameters α accordingly.
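As a concrete illustration, a minimal PyTorch-style sketch of this regression step follows; the network width, optimizer handling and the squared-error form of the loss are assumptions made for the example, not parameters fixed by the invention:

    import torch
    import torch.nn as nn

    class TransitionModel(nn.Module):
        """f_alpha: (state, action) -> predicted next state."""
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            )

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    def update_transition_model(f_alpha, optimizer, batch):
        """One gradient step on J_f(alpha) = E[(f_alpha(s, a) - s')^2] over D_real."""
        s, a, s_next = batch                      # tensors of shape (B, state_dim) etc.
        pred = f_alpha(s, a)
        loss = ((pred - s_next) ** 2).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()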
Repeatedly, the state s is fed into the A3C flight strategy π_θ to obtain the action a; action correction then yields a' = f'_β(s, f_α(s, a)), and a' is executed in the simulator to obtain the new state s' and the reward value r. This continues until enough data has been collected for training the A3C algorithm model, D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}, and for the inverse transfer model f'_β, D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
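The action-correction step itself is compact; below is a hedged sketch of the Grounded-Action-Transformation-style correction a' = f'_β(s, f_α(s, a)), assuming f_alpha and f_beta_inv are models with interfaces like those sketched above:

    import torch

    @torch.no_grad()
    def grounded_action(s, a, f_alpha, f_beta_inv):
        """Correct a simulator action so its simulated effect matches the real environment.

        s           : current state (tensor)
        a           : action proposed by the flight strategy pi_theta
        f_alpha     : real-environment transition model, f_alpha(s, a) -> s'
        f_beta_inv  : simulator inverse model, f_beta_inv(s, s') -> action
        Returns a' = f'_beta(s, f_alpha(s, a)), the action executed in the simulator.
        """
        s_next_real = f_alpha(s, a)                # where the real drone would end up
        a_corrected = f_beta_inv(s, s_next_real)   # simulator action producing that state
        return a_corrected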
In the flight-strategy training of the A3C reinforcement learning algorithm, deep neural networks are used for the flight strategy π_θ (the actor) and for the evaluation network (the critic). In the actor-critic framework, the critic is responsible for evaluating the actor, and the actor improves its behaviour according to that evaluation. The A3C (Asynchronous Advantage Actor-Critic) algorithm builds on actor-critic with two improvements: (1) A3C uses an advantage function as the critic, which reduces the variance of the critic and hence of the policy gradient, stabilizing training; (2) data samples are collected asynchronously and the actor and critic are updated asynchronously using multiple simulation environments, so more samples can be collected per unit time for network updates, which speeds up training and further reduces the variance of the actor and critic gradients. The advantage function of A3C is Advantage(s, a) = Q(s, a) − V_φ(s), where Q is the state-action value function, V is the value network, φ is the neural network parameter of V, and γ, 0 < γ < 1, is the discount factor, a constant.
Based on the collected data set, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm. The loss function of the value network in A3C is defined as

J_V(φ) = E_{(s,a,s',r)~D}[ (r(s,a) + γ V_{φ_old}(s') − V_φ(s))² ]

and the loss function of the actor network π_θ in A3C is defined as

J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s,a) − V_φ(s)) ].

The strategy is optimized by iteratively sampling data and minimizing the loss functions J_π(θ) and J_V(φ), where r(s, a) is the reward function given by the environment, φ_old is the stored old value of φ, and π is shorthand for the flight strategy π_θ.
The inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm; its neural network parameters are updated by minimizing the loss function

J_f'(β) = E_{(s,s',a)~D_sim}[ ‖f'_β(s, s') − a‖² ].

The above data collection and updates are repeated until the models converge or a maximum number of iterations is reached. The finally obtained flight strategy model is applied to the real unmanned aerial vehicle and its effect is observed.
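A matching sketch of the inverse transfer model and its update, mirroring the forward-model regression above; the layer sizes and optimizer handling are again assumptions for illustration:

    import torch
    import torch.nn as nn

    class InverseTransferModel(nn.Module):
        """f'_beta: (state, next state) -> simulator action producing that transition."""
        def __init__(self, state_dim, action_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim),
            )

        def forward(self, s, s_next):
            return self.net(torch.cat([s, s_next], dim=-1))

    def update_inverse_model(f_beta_inv, optimizer, batch):
        """One gradient step on J_f'(beta) = E[(f'_beta(s, s') - a)^2] over D_sim."""
        s, s_next, a = batch
        loss = ((f_beta_inv(s, s_next) - a) ** 2).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()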
Compared with the prior art, the invention has the following advantages:
1. The invention uses a deep reinforcement learning algorithm to give the unmanned aerial vehicle the capability of autonomous flight in complex and changing flight environments, which is more efficient than the traditional control mode of manually writing rules.
2. By means of the transfer learning algorithm based on action correction (Grounded Action Transformation), the invention reduces the adverse effect on the flight strategy caused by the difference between the real environment and the simulator environment, so that the strategy obtained by training in the simulator serves better as the initial flight strategy of the real unmanned aerial vehicle.
3. The transfer learning algorithm used by the invention can exploit non-optimal flight data of the unmanned aerial vehicle in the real environment, which makes the algorithm more robust.
Drawings
FIG. 1 is an overall framework diagram of the present invention;
FIG. 2 is a training flow diagram of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning collects flight data in the real environment and learns a state transition model of the real environment; the flight strategy of the unmanned aerial vehicle and the inverse transfer model of the simulator environment are trained simultaneously in the simulator, and the transfer model of the real environment together with the inverse transfer model of the simulation environment is used to correct the flight actions to be executed in the simulator. The method comprises the following steps:
the method comprises the following steps:
A simulation simulator is constructed based on the aerodynamic model, the unmanned aerial vehicle model and the flight scenarios and flight missions the unmanned aerial vehicle may encounter, and is visualized using the Unreal Engine 4 game engine. In the simulator, the flight state of the unmanned aerial vehicle changes over time during flight, and the simulated environment continuously generates various complex obstacles. The process can be approximated as a Markov Decision Process (MDP) represented by a five-tuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the single-step reward obtained from the environment, and γ is the discount factor of the accumulated reward.
Step two:
A model for the reinforcement learning A3C algorithm, the inverse transfer model f'_β of the simulator environment and the transfer model f_α of the real environment are constructed and initialized, where f'_β is the mapping from the "current state, next state" pair to the current action and f_α is the mapping from the "current state, current action" pair to the next state.
Step three:
Using the drone operator and the simulator flight strategy π_θ to control the unmanned aerial vehicle, flight data of the unmanned aerial vehicle in the real environment is collected and all triples (s, a, s') are extracted, where s is the current state, a is the current action and s' is the next state, yielding the data set used to train the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
Step four:
Using the data obtained in step three, the "current state, current action" pair is taken as the feature and the next state as the label, regression learning is performed, and the state transition model f_α of the real environment is trained by minimizing the loss function

J_f(α) = E_{(s,a,s')~D_real}[ ‖f_α(s,a) − s'‖² ].
step five:
Repeatedly, the state s is fed into the A3C flight strategy π_θ to obtain the action a; action correction then yields a' = f'_β(s, f_α(s, a)), and a' is executed in the simulator to obtain the new state s' and the reward value r. This continues until enough data has been collected for training the A3C algorithm model:

D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}

and for the inverse transfer model f'_β:

D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
step six:
Based on the data collected in step five, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm by minimizing the loss functions J_π(θ) and J_V(φ):

J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s,a) − V_φ(s)) ]

J_V(φ) = E_{(s,a,s',r)~D}[ (r(s,a) + γ V_{φ_old}(s') − V_φ(s))² ]

where Q is the state-action value function and V is the state value function,

V(s) = E_{(s,a,s',r)~D}[ r(s,a) + γ V_φ(s') ],

Q(s,a) = r(s,a) + γ E_{s'~P(·|s,a)}[ V_φ(s') ].
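For illustration, a compact sketch of how these two losses might be computed on a batch; the injected policy head, the one-step bootstrapped target and the use of a frozen copy of φ for that target are assumptions of the example, not details prescribed by the invention:

    import torch
    import torch.nn.functional as F

    def a3c_losses(policy, value_net, value_net_old, batch, gamma=0.99):
        """Compute J_pi(theta) and J_V(phi) on a batch (s, a, r, s') drawn from D.

        policy(s)        -> torch.distributions.Distribution over actions (pi_theta)
        value_net(s)     -> V_phi(s), shape (B,)
        value_net_old(s) -> V_{phi_old}(s), a frozen copy used for the bootstrap target
        """
        s, a, r, s_next = batch

        with torch.no_grad():
            target = r + gamma * value_net_old(s_next)   # r(s,a) + gamma * V_old(s')
            advantage = target - value_net(s)            # estimate of Q(s,a) - V_phi(s)

        # Critic loss J_V(phi): squared error against the bootstrapped target.
        J_V = F.mse_loss(value_net(s), target)

        # Actor loss J_pi(theta): -log pi_theta(a|s) * advantage.
        log_prob = policy(s).log_prob(a)
        if log_prob.dim() > 1:                           # sum over action dimensions
            log_prob = log_prob.sum(dim=-1)
        J_pi = -(log_prob * advantage).mean()

        return J_pi, J_V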
The inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm by minimizing the loss function

J_f'(β) = E_{(s,s',a)~D_sim}[ ‖f'_β(s, s') − a‖² ].
Steps three, four, five and six are repeated until the models converge or the maximum number of iterations is reached. The finally obtained flight strategy model is applied to the real unmanned aerial vehicle and its effect is observed. The overall algorithm flow is shown in Algorithm 1 below.
(Algorithm 1: overall training flow; in the original publication this pseudocode is provided only as an image.)
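Since Algorithm 1 is only available as an image, the following is a hedged Python-style sketch of the overall loop described in steps three through six; all helper callables are injected parameters and their names are illustrative, not the patent's literal pseudocode:

    def train_autonomous_flight(real_env, sim_env, pi_theta, f_alpha, f_beta_inv,
                                collect_real_flight_data, fit_regression, update_a3c,
                                policy_converged, rollout_length=200, max_iterations=100):
        """Outer loop of the reinforcement-learning plus transfer-learning procedure."""
        for _ in range(max_iterations):
            # Step three: collect real-world trajectories flown by the operator / pi_theta.
            D_real = collect_real_flight_data(real_env, pi_theta)        # list of (s, a, s')

            # Step four: fit the real-environment transition model f_alpha on D_real.
            fit_regression(f_alpha,
                           features=[(s, a) for s, a, _ in D_real],
                           labels=[s_next for _, _, s_next in D_real])

            # Step five: roll out in the simulator, executing grounded (corrected) actions.
            D, D_sim = [], []
            s = sim_env.reset()
            for _ in range(rollout_length):
                a = pi_theta.sample(s)
                a_corr = f_beta_inv(s, f_alpha(s, a))    # a' = f'_beta(s, f_alpha(s, a))
                s_next, r, done, _ = sim_env.step(a_corr)
                D.append((s, a, r, s_next))
                D_sim.append((s, s_next, a_corr))        # executed action, for the inverse model
                s = sim_env.reset() if done else s_next

            # Step six: update actor/critic with the A3C losses; refit f'_beta on D_sim.
            update_a3c(pi_theta, D)
            fit_regression(f_beta_inv,
                           features=[(s, s2) for s, s2, _ in D_sim],
                           labels=[a for _, _, a in D_sim])

            if policy_converged(pi_theta):
                break
        return pi_theta  # used as the initial flight strategy of the real drone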

Claims (8)

1. An unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning is characterized by comprising the following steps:
(1) creating an unmanned aerial vehicle simulator environment;
(2) constructing an environment transfer model f_α based on deep learning, i.e. the mapping from the "current state, current action" pair to the next state, and randomly initializing the mapping;
(3) constructing a reinforcement learning A3C algorithm and randomly initializing its flight strategy π_θ;
(4) constructing an environment inverse transfer model f'_β based on deep learning, i.e. the mapping from the "current state, next state" pair to the current action, and randomly initializing the mapping;
(5) collecting the flight data obtained when an unmanned aerial vehicle operator and the flight strategy π_θ fly the unmanned aerial vehicle in the real environment, i.e. trajectory data formed by consecutive "state-action" pairs;
(6) updating the environment transfer model f_α based on the real flight data; (7) using f_α and f'_β to perform transfer learning based on action correction, correcting the flight strategy π_θ to obtain a flight strategy π', and executing π' in the simulator to obtain simulated flight data;
(8) updating the flight strategy π_θ with the A3C algorithm based on the simulated flight data, while updating the environment inverse transfer model f'_β;
repeating (5)-(8) until the strategy π_θ converges; the finally obtained strategy π_θ serves as the initial flight strategy of the real unmanned aerial vehicle.
2. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, characterized in that a simulation simulator is constructed based on an aerodynamic model, an unmanned aerial vehicle model, and the flight scenarios and flight missions the unmanned aerial vehicle may encounter, and is visualized using the Unreal Engine 4 game engine; the simulation simulator comprises the unmanned aerial vehicle, a flight scene and a flight mission, and in the simulation simulator the flight state of the unmanned aerial vehicle changes over time during flight while the simulated environment continuously generates various obstacles; the process is expressed as a Markov decision process represented by a five-tuple <S, A, P, R, γ>, where S is the state space, A is the action space, P is the state transition probability, R is the single-step reward obtained from the environment, and γ is the discount factor of the accumulated reward.
3. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, characterized in that the drone operator and the simulator flight strategy π_θ are used to control the unmanned aerial vehicle, flight data of the unmanned aerial vehicle in the real environment is collected, and all triples (s, a, s') are extracted, where s is the current state, a is the current action and s' is the next state, yielding the data set used to train the state transition model of the real environment: D_real = {(s_1, a_1, s_2), (s_2, a_2, s_3), ..., (s_{n-1}, a_{n-1}, s_n)}.
4. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, characterized in that, taking the "current state, current action" pair as the feature and the next state as the label, regression learning is performed to train the state transition model f_α of the real environment by minimizing the transfer loss function J_f(α) = E_{(s,a,s')~D_real}[ ‖f_α(s,a) − s'‖² ] and updating the neural network parameter α of the transfer model.
5. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, characterized in that, repeatedly, the state s is fed into the A3C flight strategy π_θ to obtain the action a, action correction yields a' = f'_β(s, f_α(s, a)), and a' is executed in the simulator to obtain the new state s' and the reward value r, until enough data has been collected for training the A3C algorithm model, D = {(s_1, a_1, r_1, s_2), (s_2, a_2, r_2, s_3), ..., (s_{n-1}, a_{n-1}, r_{n-1}, s_n)}, and for the inverse transfer model f'_β, D_sim = {(s_1, s_2, a_1), (s_2, s_3, a_2), ..., (s_{n-1}, s_n, a_{n-1})}.
6. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 1, characterized in that, in the flight-strategy training of the A3C reinforcement learning algorithm, deep neural networks are used as the network models of the flight strategy and of the evaluation network; the A3C algorithm builds on actor-critic with two improvements: (1) A3C uses the advantage function as the critic; (2) data samples are collected asynchronously and the actor and critic are updated asynchronously using a plurality of simulation environments; the advantage function of A3C is Advantage(s, a) = Q(s, a) − V_φ(s), where Q is the state-action value function, V is the value network, φ is the neural network parameter of V, and γ, 0 < γ < 1, is the discount factor, a constant.
7. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 5, characterized in that, based on the collected data set, the strategy neural network (actor) and the value neural network (critic) are updated with the A3C algorithm; the loss function of the value network in A3C is defined as J_V(φ) = E_{(s,a,s',r)~D}[ (r(s,a) + γ V_{φ_old}(s') − V_φ(s))² ] and the loss function of the actor network π_θ in A3C is defined as J_π(θ) = −E_{(s,a)~D}[ log π_θ(a|s) (Q(s,a) − V_φ(s)) ]; the strategy is optimized by iteratively sampling data and minimizing the loss functions J_π(θ) and J_V(φ), where r(s, a) is the reward function given by the environment, φ_old is the stored old value of φ, and π is shorthand for the flight strategy π_θ.
8. The unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning according to claim 5, characterized in that the inverse transfer model f'_β of the simulator environment is trained with a regression learning algorithm, updating the neural network parameter β by minimizing the inverse transfer loss function J_f'(β) = E_{(s,s',a)~D_sim}[ ‖f'_β(s, s') − a‖² ], until the model converges or the maximum number of iterations is reached; the finally obtained flight strategy π_θ is applied to the real unmanned aerial vehicle.
CN202110441572.3A 2021-04-23 2021-04-23 Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning Pending CN113281999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441572.3A CN113281999A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110441572.3A CN113281999A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning

Publications (1)

Publication Number Publication Date
CN113281999A true CN113281999A (en) 2021-08-20

Family

ID=77277239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441572.3A Pending CN113281999A (en) 2021-04-23 2021-04-23 Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning

Country Status (1)

Country Link
CN (1) CN113281999A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885549A (en) * 2021-11-23 2022-01-04 江苏科技大学 Four-rotor attitude trajectory control method based on dimension cutting PPO algorithm
CN114290339A (en) * 2022-03-09 2022-04-08 南京大学 Robot reality migration system and method based on reinforcement learning and residual modeling
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
WO2023142316A1 (en) * 2022-01-25 2023-08-03 南方科技大学 Flight decision generation method and apparatus, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948781A (en) * 2019-03-21 2019-06-28 中国人民解放军国防科技大学 Continuous action online learning control method and system for automatic driving vehicle
CN111695690A (en) * 2020-07-30 2020-09-22 航天欧华信息技术有限公司 Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning
US20200372410A1 (en) * 2019-05-23 2020-11-26 Uber Technologies, Inc. Model based reinforcement learning based on generalized hidden parameter markov decision processes
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN112215328A (en) * 2020-10-29 2021-01-12 腾讯科技(深圳)有限公司 Training of intelligent agent, and action control method and device based on intelligent agent

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948781A (en) * 2019-03-21 2019-06-28 中国人民解放军国防科技大学 Continuous action online learning control method and system for automatic driving vehicle
US20200372410A1 (en) * 2019-05-23 2020-11-26 Uber Technologies, Inc. Model based reinforcement learning based on generalized hidden parameter markov decision processes
CN111695690A (en) * 2020-07-30 2020-09-22 航天欧华信息技术有限公司 Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN112215328A (en) * 2020-10-29 2021-01-12 腾讯科技(深圳)有限公司 Training of intelligent agent, and action control method and device based on intelligent agent

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARMANDO VIEIRA et al.: "Deep Learning Business Application Development Guide: From Conversational Bots to Medical Image Processing" (Chinese edition), 31 August 2019, Beihang University Press *
郭宪: "Gait control of biomimetic robot locomotion: a survey of reinforcement learning methods" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113885549A (en) * 2021-11-23 2022-01-04 江苏科技大学 Four-rotor attitude trajectory control method based on dimension cutting PPO algorithm
CN113885549B (en) * 2021-11-23 2023-11-21 江苏科技大学 Four-rotor gesture track control method based on dimension clipping PPO algorithm
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN114609925B (en) * 2022-01-14 2022-12-06 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
WO2023142316A1 (en) * 2022-01-25 2023-08-03 南方科技大学 Flight decision generation method and apparatus, computer device, and storage medium
CN114290339A (en) * 2022-03-09 2022-04-08 南京大学 Robot reality migration system and method based on reinforcement learning and residual modeling
CN114290339B (en) * 2022-03-09 2022-06-21 南京大学 Robot realistic migration method based on reinforcement learning and residual modeling

Similar Documents

Publication Publication Date Title
Chebotar et al. Closing the sim-to-real loop: Adapting simulation randomization with real world experience
CN113281999A (en) Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning
Lopes et al. Intelligent control of a quadrotor with proximal policy optimization reinforcement learning
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
Li et al. Learning unmanned aerial vehicle control for autonomous target following
CN110442129B (en) Control method and system for multi-agent formation
Qi et al. Towards latent space based manipulation of elastic rods using autoencoder models and robust centerline extractions
CN111260026B (en) Navigation migration method based on meta reinforcement learning
CN109940614B (en) Mechanical arm multi-scene rapid motion planning method integrating memory mechanism
CN113467515B (en) Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning
CN112605973A (en) Robot motor skill learning method and system
Celik et al. Specializing versatile skill libraries using local mixture of experts
Hussein et al. Deep reward shaping from demonstrations
CN115990891B (en) Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration
Bai et al. Variational dynamic for self-supervised exploration in deep reinforcement learning
Xu et al. Learning strategy for continuous robot visual control: A multi-objective perspective
Chen et al. An overview of robust reinforcement learning
Luo et al. Balance between efficient and effective learning: Dense2sparse reward shaping for robot manipulation with environment uncertainty
Wang et al. RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Li et al. Prudent policy gradient with auxiliary actor in multi-degree-of-freedom robotic tasks
CN110990769B (en) Attitude migration algorithm system suitable for multi-degree-of-freedom robot
CN111582299B (en) Self-adaptive regularization optimization processing method for image deep learning model identification
Yu et al. Adaptively shaping reinforcement learning agents via human reward
Hong et al. Dynamics-aware metric embedding: Metric learning in a latent space for visual planning
Zheng et al. Uncalibrated visual servo system based on Kalman filter optimized by improved STOA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210820)