CN113885576A - Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning

Info

Publication number
CN113885576A
Authority
CN
China
Prior art keywords
plane
environment
unmanned aerial vehicle
formation
Prior art date
Legal status
Pending
Application number
CN202111267805.9A
Other languages
Chinese (zh)
Inventor
赵启
阴浩博
曹红波
甄子洋
龚华军
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202111267805.9A
Publication of CN113885576A
Legal status: Pending

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of multi-agent control, and particularly discloses an unmanned aerial vehicle formation control method based on deep reinforcement learning. It mainly discloses an unmanned aerial vehicle formation environment based on deep reinforcement learning and an unmanned aerial vehicle formation controller design based on double Q-learning, characterized by the following steps: 1) establishing a relative kinematics model of the unmanned aerial vehicle formation according to the kinematics equations of the lead plane and the wing plane and the small-disturbance principle; 2) establishing an unmanned aerial vehicle formation environment that accords with actual conditions, comprising a state space, a wing-plane action library (containing speed and heading actions), command conversion, a reward function and an ending condition, so that the environment can be transplanted to the verification of other algorithms; 3) designing a formation controller based on double Q-learning, which controls speed and heading simultaneously so that the wing plane follows the lead plane and maintains the desired formation. In practical applications, corresponding wing-plane commands can be formed according to the characteristics of the unmanned aerial vehicle itself, so as to meet the requirement of precise unmanned aerial vehicle formation control.

Description

Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of multi-agent reinforcement learning control, and in particular to an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning.
Background
An unmanned aerial vehicle (UAV) can complete a preset task with only remote wireless control or a pre-set control program. Because of advantages such as low cost, high flexibility and mobility, UAVs have been widely used in civilian and military applications. However, as environments and tasks become increasingly complex, the performance of a single UAV can no longer meet practical requirements, whereas a formation composed of multiple UAVs retains the advantages of a single UAV while covering a wider area and achieving higher reconnaissance and strike success rates. UAV formations are therefore becoming the primary vehicle for performing such tasks.
However, commonly used modern control methods usually require an accurate model to design a controller that realizes UAV formation, and accurately modeling the system is very difficult in practice; in addition, the applicability of such control methods is limited by sensor errors, environmental disturbances and other influences. A reinforcement learning method is therefore introduced to realize intelligent control of the UAV formation.
Among reinforcement learning algorithms, the double Q-learning algorithm is already widely applied in trajectory planning, cooperative decision-making, single-vehicle control and other fields, owing to its simplicity, ease of use and good convergence.
Disclosure of Invention
The invention provides an unmanned aerial vehicle formation environment construction and a formation controller design based on deep reinforcement learning, so that the wing plane can learn to follow the lead plane in speed and maintain the desired formation distance.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning, which comprises the following steps:
Step 1, assuming that the lead plane and the wing plane both adopt the same first-order speed-hold and second-order heading-hold autopilot models, the relative kinematics model of the unmanned aerial vehicle formation is obtained by linearization according to the small-disturbance principle;
Step 2, designing the state space S, the wing-plane action library a, the command conversion and the reward function of the formation environment;
Step 3, establishing a formation controller based on double Q-learning using the formation environment of step 2, and training it in the designed environment; the input of the controller is the environment state S and the output is the wing-plane action, which is converted into a specific command and then fed to the wing plane.
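By way of illustration only, the following minimal Python sketch shows the data flow described in step 3: the environment state is fed to the Q-network, a discrete action from the wing-plane action library is selected, converted into concrete speed and heading commands, and applied to the wing plane. All names used here (select_action, control_step, the env and q_network objects and their methods) are hypothetical placeholders and not part of the patented method.

```python
# Hypothetical sketch of the controller data flow of step 3 (names are illustrative, not from the patent).
import numpy as np

def select_action(q_network, state, epsilon=0.1):
    """Epsilon-greedy choice over the discrete wing-plane action library."""
    if np.random.rand() < epsilon:
        return np.random.randint(q_network.num_actions)   # occasional exploration
    q_values = q_network.predict(state)                   # main-network estimates Q(S, a)
    return int(np.argmax(q_values))                       # greedy action index

def control_step(env, q_network):
    """One interaction step: state -> discrete action -> clipped command -> wing plane."""
    state = env.observe()                                  # joint formation state S
    action = select_action(q_network, state)               # index into the action library
    v_cmd, psi_cmd = env.action_to_command(action)         # command conversion with amplitude limits
    next_state, reward, done = env.step(v_cmd, psi_cmd)    # apply commands to the wing plane
    return state, action, reward, next_state, done
```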
Further, in step 1, the relative kinematics model of the unmanned aerial vehicle formation has the following specific expression:
(Equation (1), rendered as an image in the original document: relative kinematics model of the unmanned aerial vehicle formation.)
In formula (1), the inertial north-east-down frame is taken as the basic coordinate system, with an arbitrary point on the ground as the origin, the Ox axis pointing towards the north pole, and the Oy axis perpendicular to Ox and pointing east; L and F denote the lead plane and the wing plane, which adopt the same first-order speed-hold and second-order heading-hold autopilot model; τ_v is the speed time constant; τ_ψa, τ_ψb are the heading time constants; v is the UAV speed; ψ is the UAV heading angle; v_Lc, v_Fc are the speed commands of the lead plane and the wing plane, respectively; ψ_Lc, ψ_Fc are the heading commands of the lead plane and the wing plane, respectively; x is the x-direction distance between the lead plane and the wing plane; y is the y-direction distance between the lead plane and the wing plane; a = arctan(y_0/x_0), where x_0, y_0 are the set x- and y-direction distances between the lead plane and the wing plane.
Further, in step 2, the state space S of the environment, the wing-plane action library a, the command conversion and the reward function are designed, specifically comprising the following steps:
Step 2.1, the y-direction distance between the lead plane and the wing plane, the error between the actual distance and the desired distance, the relative speed and its integral, and the relative heading angle and its integral are selected as the joint state space S, with the corresponding expression:
(Equation (2), rendered as an image in the original document: joint state space S composed of y, e_y, e_v and its integral, and e_ψ and its integral.)
In formula (2), e_v = v_L - v_F is the relative speed of the lead plane and the wing plane; e_ψ = ψ_L - ψ_F is the relative heading angle of the lead plane and the wing plane; e_y = y_d - y is the error between the desired and actual y-direction distance; y_d is the desired y-direction distance;
Step 2.2, the wing-plane action library a is established; the action library a comprises the wing-plane speed action a1 and the wing-plane heading action a2, where the speed action a1 comprises deceleration, constant speed and acceleration, and the heading action a2 comprises left yaw, constant heading and right yaw.
The wing-plane action library a is established with the following expression:
(Equation (3), rendered as an image in the original document: wing-plane action library a = (a1, a2), with a1 covering deceleration, constant speed and acceleration, and a2 covering left yaw, constant heading and right yaw.)
Step 2.3, design the command conversion: the wing-plane action in formula (3) is converted into speed and heading commands, with amplitude limiting added.
(Equations (4) and (5), rendered as images in the original document: conversion of the wing-plane actions into the speed command v_d and the heading-angle command ψ_d with amplitude limiting.)
Equations (4) and (5) give the command conversion under the different actions: v_F is the current wing-plane speed, and the speed command corresponding to each action a1 is v_d; ψ_F is the current wing-plane heading angle, and the heading-angle command corresponding to each action a2 is ψ_d; [v_min, v_max] is the range of wing-plane speeds; [-ψ_max, ψ_max] is the range of wing-plane heading angles;
Step 2.4, designing the reward function r:
(Equation (6), rendered as an image in the original document: reward function r.)
In formula (6), the symbols denote, respectively, the speed command at the previous moment and the heading-angle command at the previous moment; T_S is the sampling time; t is the elapsed time. The environment ending condition is also designed:
(Equation (7), rendered as an image in the original document: environment ending condition.)
In the formula, [y_min, y_max] are the minimum and maximum y-direction distances set for the formation.
Further, in step 3 the formation environment of step 2 is used to establish a formation controller based on double Q-learning, which is trained in the designed environment, specifically as follows:
The controller comprises a replay memory and a neural network model; the replay memory stores the interaction information, the input of the neural network model is the environment state S, and its output is the wing-plane action;
The neural network model comprises two networks with the same structure but different parameters, namely a main network with parameters θ and a target network with parameters θ⁻; the main network outputs all the estimated action values Q, and the target network outputs the target value y;
In each training episode, the formation environment state is initialized to obtain the state S, which is input to the main network; the output wing-plane action is applied to the environment: the action output by the controller is converted through expressions (4) and (5) into the specific commands v_d, ψ_d and fed to the wing plane, yielding the new state S_ and the instantaneous reward r of the wing plane; the tuple <S, a, r, S_> is stored in the replay memory;
When the replay memory is full, a certain number of samples is drawn to train the neural network model; the neural network target value and loss function are expressed as:
y = r + γ·Q(S_, argmax_a Q(S_, a|θ) | θ⁻)    (8)
L(θ) = E[(y - Q(S, a|θ))²]    (9)
equation (8) represents the target value, equation (9) represents the loss function, and γ represents the discount rate. The expression for training the neural network parameters by using the gradient descent method is as follows:
θ ← θ - α·∇_θ L(θ)    (10)
θ⁻ ← θ    (11)
Equations (10) and (11) represent the neural network parameter updates: equation (10) updates the main network parameters by gradient descent, where α is the learning rate; equation (11) indicates that the main network parameters are copied to the target network after a certain number of steps. The above process is repeated until training is finished.
The invention has the following beneficial effects: the invention designs an unmanned aerial vehicle formation flight environment and a formation controller based on deep reinforcement learning; the controller enables the wing plane to learn the optimal strategy autonomously and output the optimal action, so that the wing plane follows the lead plane in speed and keeps the desired distance. The method can effectively improve the intelligence of the unmanned aerial vehicle, eliminate the distance and speed errors of the formation, and give the formation good formation-keeping capability and good portability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is the control structure diagram of the method of the present invention.
FIG. 2 is a graph of the lead-plane and wing-plane speeds in an embodiment of the invention.
FIG. 3 is a graph of the y-direction distance between the lead plane and the wing plane in an embodiment of the invention.
Detailed Description
In order that those skilled in the art may better understand the technical solutions of the present invention, the present invention is described in further detail below with reference to specific embodiments.
The invention discloses an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning, wherein a control structure diagram is shown in figure 1, and the method comprises the following steps:
Step 1, assuming that the lead plane and the wing plane both adopt the same first-order speed-hold and second-order heading-hold autopilot models, with the expression:
(Equation (1), rendered as an image in the original document: first-order speed-hold and second-order heading-hold autopilot model.)
where v is the UAV speed; v_c is the speed command; τ_v is the speed time constant; ψ is the UAV heading angle; ψ_c is the heading-angle command; τ_ψa, τ_ψb are the heading time constants. Linearization according to the small-disturbance principle then yields the relative kinematics model of the unmanned aerial vehicle formation, with the corresponding expression:
(Equation (2), rendered as an image in the original document: relative kinematics model of the unmanned aerial vehicle formation.)
In formula (2), the inertial north-east-down frame is taken as the basic coordinate system, with an arbitrary point on the ground as the origin, the Ox axis pointing towards the north pole, and the Oy axis perpendicular to Ox and pointing east. L and F denote the lead plane and the wing plane, which adopt the same first-order speed-hold and second-order heading-hold autopilot model; τ_v is the speed time constant; τ_ψa, τ_ψb are the heading time constants; v is the UAV speed; ψ is the UAV heading angle; v_Lc, v_Fc are the speed commands of the lead plane and the wing plane, respectively; ψ_Lc, ψ_Fc are the heading commands of the lead plane and the wing plane, respectively; x is the x-direction distance between the lead plane and the wing plane; y is the y-direction distance between the lead plane and the wing plane; a = arctan(y_0/x_0), where x_0, y_0 are the set x- and y-direction distances between the lead plane and the wing plane.
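For illustration, since equations (1) and (2) appear only as images in the original document, the following Python sketch integrates one plausible reading of the autopilot model: a canonical first-order speed-hold lag and a canonical two-time-constant second-order heading-hold lag, together with the nonlinear relative-position kinematics in the inertial frame before linearization. The concrete equation forms are assumptions for illustration, not the patent's exact expressions.

```python
# Illustrative forward-Euler integration of an assumed autopilot model and relative kinematics.
# The canonical lag forms below are assumptions; the patent's exact equations (1)-(2) are images.
import numpy as np

def autopilot_step(v, psi, psi_dot, v_cmd, psi_cmd, tau_v, tau_pa, tau_pb, dt):
    # Assumed first-order speed holder:    tau_v * dv/dt + v = v_cmd
    v_new = v + dt * (v_cmd - v) / tau_v
    # Assumed second-order heading holder: tau_pa*tau_pb*psi'' + (tau_pa + tau_pb)*psi' + psi = psi_cmd
    psi_ddot = (psi_cmd - psi - (tau_pa + tau_pb) * psi_dot) / (tau_pa * tau_pb)
    psi_dot_new = psi_dot + dt * psi_ddot
    psi_new = psi + dt * psi_dot_new
    return v_new, psi_new, psi_dot_new

def relative_position_step(x, y, v_L, psi_L, v_F, psi_F, dt):
    # Relative x/y separation between lead plane (L) and wing plane (F) in the north-east frame,
    # propagated with the nonlinear kinematics before small-disturbance linearization.
    dx = v_L * np.cos(psi_L) - v_F * np.cos(psi_F)
    dy = v_L * np.sin(psi_L) - v_F * np.sin(psi_F)
    return x + dt * dx, y + dt * dy
```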
Step 2, design the state space of the environment, the wing-plane action library, the command conversion, the reward function and the ending condition.
Step 2.1, the y-direction distance between the lead plane and the wing plane, the error between the actual distance and the desired distance, the relative speed and its integral, and the relative heading angle and its integral are selected as the joint state space S, with the corresponding expression:
(Equation (3), rendered as an image in the original document: joint state space S composed of y, e_y, e_v and its integral, and e_ψ and its integral.)
In formula (3), e_v = v_L - v_F is the relative speed of the lead plane and the wing plane; e_ψ = ψ_L - ψ_F is the relative heading angle of the lead plane and the wing plane; e_y = y_d - y is the error between the desired and actual y-direction distance; y_d is the desired y-direction distance.
Step 2.2, establish the wing-plane action library a = (a1, a2), with the expression:
(Equation (4), rendered as an image in the original document: wing-plane action library a = (a1, a2), with a1 covering deceleration, constant speed and acceleration, and a2 covering left yaw, constant heading and right yaw.)
Step 2.3, design the command conversion: the action in formula (4) is converted into speed and heading commands, with amplitude limiting added, as expressed by:
(Equations (5) and (6), rendered as images in the original document: conversion of the wing-plane actions into the speed command v_d and the heading-angle command ψ_d with amplitude limiting.)
Equations (5) and (6) give the command conversion under the different actions: v_F is the current wing-plane speed, and the speed command corresponding to each action a1 is v_d; ψ_F is the current wing-plane heading angle, and the heading-angle command corresponding to each action a2 is ψ_d; [v_min, v_max] is the range of wing-plane speeds; [-ψ_max, ψ_max] is the range of wing-plane heading angles.
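As a concrete illustration of step 2.3, the sketch below converts the discrete wing-plane actions into clipped speed and heading commands in the spirit of equations (5) and (6). The speed and heading increments per action are assumed values chosen only for illustration; the speed range [30, 70] m/s and the 20-degree heading limit are taken from the embodiment described later.

```python
# Illustrative command conversion for step 2.3; increments per action are assumptions,
# while the limits follow the embodiment ([30, 70] m/s speed range, +/-20 deg heading range).
import numpy as np

SPEED_DELTAS   = (-1.0, 0.0, +1.0)    # a1: decelerate / hold speed / accelerate, assumed m/s per step
HEADING_DELTAS = (-2.0, 0.0, +2.0)    # a2: yaw left / hold heading / yaw right, assumed deg per step

V_MIN, V_MAX = 30.0, 70.0             # wing-plane speed range (from the embodiment)
PSI_MAX = 20.0                        # wing-plane heading-angle limit (from the embodiment), deg

def action_to_command(a1, a2, v_F, psi_F):
    """Convert the discrete actions (a1, a2) into amplitude-limited speed/heading commands."""
    v_d   = np.clip(v_F + SPEED_DELTAS[a1], V_MIN, V_MAX)             # speed command, cf. equation (5)
    psi_d = np.clip(psi_F + HEADING_DELTAS[a2], -PSI_MAX, PSI_MAX)    # heading command, cf. equation (6)
    return v_d, psi_d
```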
Step 2.4, designing a reward function r
(Equation (7), rendered as an image in the original document: reward function r.)
In formula (7), the symbols denote, respectively, the speed command at the previous moment and the heading-angle command at the previous moment; T_S is the sampling time; t is the running time. The environment ending condition is also designed:
(Equation (8), rendered as an image in the original document: environment ending condition.)
In the formula, [y_min, y_max] are the minimum and maximum y-direction distances set for the formation.
Step 3, a formation controller based on the double Q-learning algorithm is established using the formation environment of step 2, and learning is carried out in the designed environment. The controller comprises a replay memory and a neural network model; the replay memory stores the interaction information, the input of the neural network model is the environment state S, and its output is the wing-plane action.
The neural network model comprises two networks with the same structure but different parameters, namely a main network with parameters θ and a target network with parameters θ⁻. The main network outputs all the estimated action values Q, and the target network outputs the target value y.
The specific process is as follows: in each training episode, the formation environment state is initialized to obtain the state S, which is input to the main network; the output wing-plane action is applied to the environment: the action output by the controller is converted through expressions (5) and (6) into the specific commands v_d, ψ_d and fed to the wing plane, yielding the new state S_ and the reward r of the wing plane; the tuple <S, a, r, S_> is stored in the replay memory.
When the replay memory is full, a certain number of samples is drawn to train the neural network model; the neural network target value and loss function are expressed as:
y = r + γ·Q(S_, argmax_a Q(S_, a|θ) | θ⁻)    (9)
L(θ) = E[(y - Q(S, a|θ))²]    (10)
equation (9) represents the target value, equation (10) represents the loss function, and the expression for training the neural network parameters by using the gradient descent method is as follows:
θ ← θ - α·∇_θ L(θ)    (11)
θ⁻ ← θ    (12)
equation (11) and (12) represent neural network parameter updates, equation (11) represents updating the master network parameters according to the gradient descent method, and a is the learning rate. Equation (12) represents that the master network parameters are copied to the target network after a certain number of steps. The above process is repeated until the training is finished.
In the numerical simulation verification of this embodiment, the wing-plane speed range is set to [30, 70] m/s and the heading-angle range to [-20, 20]°. At the initial moment, the speeds of the lead plane and the wing plane are both 50 m/s, the heading angle is 0°, and they fly forward keeping x- and y-direction distances of 500 m. The lead-plane commands and the desired formation distance are then changed, and the results are shown in FIG. 2 and FIG. 3.
The simulation results show that when the lead-plane speed changes, the wing plane follows it well in speed, with the speed error kept essentially within 0.1 m/s, in accordance with the reward-function design. At the same time, the wing plane changes the y-direction spacing by adjusting its heading angle: it tracks from the initial spacing of 500 m to the desired distance of 250 m and, after the command is changed, tracks to 300 m. The wing plane learns the optimal strategy autonomously without prior knowledge, and the results demonstrate the effectiveness of the designed double Q-learning controller.
The invention establishes the unmanned aerial vehicle formation motion environment according to actual flight conditions, so that the environment is consistent with reality and can be directly transplanted to other algorithms for training and learning; the invention designs an unmanned aerial vehicle formation flight environment and a formation controller based on double Q-learning, which controls speed and heading simultaneously, controlling the wing plane to track the lead plane and maintain the desired distance.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. An unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning is characterized by comprising the following steps:
Step 1, assuming that the lead plane and the wing plane both adopt the same first-order speed-hold and second-order heading-hold autopilot models, the relative kinematics model of the unmanned aerial vehicle formation is obtained by linearization according to the small-disturbance principle;
Step 2, designing the state space S, the wing-plane action library a, the command conversion and the reward function of the formation environment;
Step 3, establishing a formation controller based on double Q-learning using the formation environment of step 2, and training it in the designed environment; the input of the controller is the environment state S and the output is the wing-plane action, which is converted into a specific command and then fed to the wing plane.
2. The method according to claim 1, wherein the relative kinematics model of the unmanned aerial vehicle formation has the following specific expression:
(Equation (1), rendered as an image in the original document: relative kinematics model of the unmanned aerial vehicle formation.)
In formula (1), the inertial north-east-down frame is taken as the basic coordinate system, with an arbitrary point on the ground as the origin, the Ox axis pointing towards the north pole, and the Oy axis perpendicular to Ox and pointing east; L and F denote the lead plane and the wing plane, respectively, both adopting the same first-order speed-hold and second-order heading-hold autopilot model; τ_v is the speed time constant; τ_ψa, τ_ψb are the heading time constants; v is the UAV speed; ψ is the UAV heading angle; v_Lc, v_Fc are the speed commands of the lead plane and the wing plane, respectively; ψ_Lc, ψ_Fc are the heading commands of the lead plane and the wing plane, respectively; x is the x-direction distance between the lead plane and the wing plane; y is the y-direction distance between the lead plane and the wing plane; a = arctan(y_0/x_0), where x_0, y_0 are the set x- and y-direction distances between the lead plane and the wing plane.
3. The method according to claim 2, wherein the state space S, the wing-plane action library a, the command conversion, the reward function r and the ending condition of the formation environment are designed; the corresponding expression of the state space is:
(Equation (2), rendered as an image in the original document: joint state space S composed of y, e_y, e_v and its integral, and e_ψ and its integral.)
In formula (2), e_v = v_L - v_F is the relative speed of the lead plane and the wing plane; e_ψ = ψ_L - ψ_F is the relative heading angle of the lead plane and the wing plane; e_y = y_d - y is the error between the desired and actual y-direction distance; y_d is the desired y-direction distance;
The wing-plane action library a = (a1, a2) is established, with the following expression:
(Equation (3), rendered as an image in the original document: wing-plane action library a = (a1, a2).)
The command conversion is designed: the wing-plane action in formula (3) is converted into speed and heading commands, and amplitude limiting is added;
(Equations (4) and (5), rendered as images in the original document: conversion of the wing-plane actions into the speed command v_d and the heading-angle command ψ_d with amplitude limiting.)
Equations (4) and (5) give the command conversion under the different actions: v_F is the current wing-plane speed, and the speed command corresponding to each action a1 is v_d; ψ_F is the current wing-plane heading angle, and the heading-angle command corresponding to each action a2 is ψ_d; [v_min, v_max] is the range of wing-plane speeds; [-ψ_max, ψ_max] is the range of wing-plane heading angles;
designing a reward function r:
(Equation (6), rendered as an image in the original document: reward function r.)
In formula (6), the symbols denote, respectively, the speed command at the previous moment and the heading-angle command at the previous moment; T_S is the sampling time; t is the running time.
The environment ending condition is designed:
(Equation (7), rendered as an image in the original document: environment ending condition.)
In the formula, [y_min, y_max] are the minimum and maximum y-direction distances set for the formation.
4. The unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning according to claim 3, wherein a formation controller is established based on double Q-learning and trained in the designed environment, specifically comprising the following steps:
The controller comprises a replay memory and a neural network model; the replay memory stores the interaction information, the input of the neural network model is the environment state S, and its output is the wing-plane action;
The neural network model comprises two networks with the same structure but different parameters, namely a main network with parameters θ and a target network with parameters θ⁻; the main network outputs all the estimated action values Q, and the target network outputs the target value y;
In each training episode, the formation environment state is initialized to obtain the state S, which is input to the main network; the output wing-plane action is applied to the environment: the action output by the controller is converted through expressions (4) and (5) into the specific commands v_d, ψ_d and fed to the wing plane, yielding the new state S_ and the reward r of the wing plane; the tuple <S, a, r, S_> is stored in the replay memory;
When the replay memory is full, a certain number of samples is drawn to train the neural network model; the neural network target value and loss function are expressed as:
y = r + γ·Q(S_, argmax_a Q(S_, a|θ) | θ⁻)    (8)
L(θ) = E[(y - Q(S, a|θ))²]    (9)
equation (8) represents the target value, equation (9) represents the loss function, where γ represents the discount rate, and the neural network parameters are trained by the gradient descent method, where the expression is:
θ ← θ - α·∇_θ L(θ)    (10)
θ⁻ ← θ    (11)
Equations (10) and (11) represent the neural network parameter updates: equation (10) updates the main network parameters by gradient descent, where α is the learning rate; equation (11) indicates that the main network parameters are copied to the target network after a certain number of steps. The above process is repeated until training is finished.
CN202111267805.9A 2021-10-29 2021-10-29 Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning Pending CN113885576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111267805.9A CN113885576A (en) 2021-10-29 2021-10-29 Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111267805.9A CN113885576A (en) 2021-10-29 2021-10-29 Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113885576A true CN113885576A (en) 2022-01-04

Family

ID=79014237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111267805.9A Pending CN113885576A (en) 2021-10-29 2021-10-29 Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113885576A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331548A (en) * 2014-10-24 2015-02-04 中国人民解放军国防科学技术大学 Method for planning flight action of unmanned aerial vehicle based on workflow
US20200156243A1 (en) * 2018-11-21 2020-05-21 Amazon Technologies, Inc. Robotics application simulation management
CN110007688A (en) * 2019-04-25 2019-07-12 西安电子科技大学 A kind of cluster distributed formation method of unmanned plane based on intensified learning
CN110502034A (en) * 2019-09-04 2019-11-26 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle cluster control method based on deep reinforcement learning
CN110502033A (en) * 2019-09-04 2019-11-26 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning
CN111077909A (en) * 2019-12-31 2020-04-28 北京理工大学 Novel unmanned aerial vehicle self-group self-consistent optimization control method based on visual information
CN111240353A (en) * 2020-01-07 2020-06-05 南京航空航天大学 Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree
CN111880563A (en) * 2020-07-17 2020-11-03 西北工业大学 Multi-unmanned aerial vehicle task decision method based on MADDPG
CN111880565A (en) * 2020-07-22 2020-11-03 电子科技大学 Q-Learning-based cluster cooperative countermeasure method
CN111857184A (en) * 2020-07-31 2020-10-30 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle cluster control collision avoidance method and device based on deep reinforcement learning
CN111880567A (en) * 2020-07-31 2020-11-03 中国人民解放军国防科技大学 Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
HADO VAN HASSELT等: "Deep Reinforcement Learning with Double Q-Learning", 《PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-16)》 *
HADO VAN HASSELT等: "Deep Reinforcement Learning with Double Q-Learning", 《PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-16)》, 17 February 2016 (2016-02-17), pages 2094 *
HUAN HU等: "Proximal policy optimization with an integral compensator for quadrotor control", 《FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING》 *
HUAN HU等: "Proximal policy optimization with an integral compensator for quadrotor control", 《FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING》, vol. 21, no. 5, 31 May 2020 (2020-05-31), pages 777, XP037144867, DOI: 10.1631/FITEE.1900641 *
ZEZHI SUI等: "Formation Control with Collision Avoidance through Deep Reinforcement Learning", 《IJCNN 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS》 *
ZEZHI SUI等: "Formation Control with Collision Avoidance through Deep Reinforcement Learning", 《IJCNN 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS》, 19 July 2019 (2019-07-19), pages 1, XP033621746, DOI: 10.1109/IJCNN.2019.8851906 *
王晓燕等: "无人机三维编队飞行的鲁棒H∞ 控制器设计", 《控制与决策》 *
王晓燕等: "无人机三维编队飞行的鲁棒H∞ 控制器设计", 《控制与决策》, vol. 27, no. 12, 31 December 2012 (2012-12-31), pages 1907 *
相晓嘉等: "基于深度强化学习的固定翼无人机编队协调控制方法", 《航空学报》 *
相晓嘉等: "基于深度强化学习的固定翼无人机编队协调控制方法", 《航空学报》, vol. 42, no. 4, 25 April 2021 (2021-04-25), pages 524009 - 1 *

Similar Documents

Publication Publication Date Title
Rubí et al. A survey of path following control strategies for UAVs focused on quadrotors
CN110502033B (en) Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning
Chen et al. Path planning for multi-UAV formation
CN112162564B (en) Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN106774400B (en) Unmanned aerial vehicle three-dimensional track guidance method based on inverse dynamics
CN110531786B (en) Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN
WO2006113173A1 (en) Decentralized maneuver control in heterogeneous autonomous vehicle networks
CN112198886B (en) Unmanned aerial vehicle control method for tracking maneuvering target
Nie et al. Three-dimensional path-following control of a robotic airship with reinforcement learning
Stevšić et al. Sample efficient learning of path following and obstacle avoidance behavior for quadrotors
CN112684781A (en) Multi-agent distributed model prediction control method and system
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN116225055A (en) Unmanned aerial vehicle autonomous flight path planning algorithm based on state decomposition in complex environment
CN114935943A (en) Unmanned aerial vehicle and unmanned vehicle cluster formation tracking control method and system
CN113900440B (en) Unmanned aerial vehicle control law design method and device and readable storage medium
Yang et al. A decentralised control strategy for formation flight of unmanned aerial vehicles
Montella et al. Reinforcement learning for autonomous dynamic soaring in shear winds
CN116954258A (en) Hierarchical control method and device for multi-four-rotor unmanned aerial vehicle formation under unknown disturbance
Duoxiu et al. Proximal policy optimization for multi-rotor UAV autonomous guidance, tracking and obstacle avoidance
CN115617039B (en) Event triggering-based distributed affine unmanned aerial vehicle formation controller construction method and unmanned aerial vehicle formation control method
Cordeiro et al. Non linear controller and path planner algorithm for an autonomous variable shape formation flight
CN113885576A (en) Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning
Stingu et al. An approximate dynamic programming based controller for an underactuated 6DOF quadrotor
CN115237150A (en) Fixed-wing formation obstacle avoidance method
Verma et al. A novel trajectory tracking methodology using structured adaptive model inversion for uninhabited aerial vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination