CN113885576A - Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning - Google Patents
- Publication number
- CN113885576A (application CN202111267805.9A)
- Authority
- CN
- China
- Prior art keywords
- plane
- environment
- unmanned aerial
- aerial vehicle
- formation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Abstract
The invention relates to the technical field of multi-agent control, and particularly discloses an unmanned aerial vehicle formation control method based on deep reinforcement learning. It mainly discloses an unmanned aerial vehicle formation environment based on deep reinforcement learning and an unmanned aerial vehicle formation controller design based on double Q-learning, characterized by the following steps: 1) establishing a relative kinematics model of the unmanned aerial vehicle formation from the kinematics equations of the lead plane and the wing plane and the small-disturbance principle; 2) establishing an unmanned aerial vehicle formation environment that accords with actual conditions, comprising a state space, a wing plane action library (covering both speed and heading actions), command conversion, a reward function, and a termination condition, so that the environment can be ported to the verification of other algorithms; 3) designing a formation controller based on double Q-learning, which controls speed and heading simultaneously so that the wing plane follows the lead plane and maintains the desired formation. In practical application, corresponding wing plane commands can be formed according to the characteristics of the unmanned aerial vehicle itself, so as to meet the requirement of precise formation control.
Description
Technical Field
The invention relates to the technical field of reinforcement learning control of multiple agents, in particular to an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning.
Background
An unmanned aerial vehicle can complete a preset task with only remote wireless control or a pre-set control program. Because of its advantages such as low cost, high flexibility, and mobility, unmanned aerial vehicles have been widely used in civilian and military applications. However, with the increasing complexity of environments and tasks, a single unmanned aerial vehicle can no longer meet actual use requirements, while a formation of multiple unmanned aerial vehicles retains the advantages of a single vehicle and adds characteristics such as wide area coverage and high reconnaissance and attack success rates. Drone formations are therefore becoming the primary vehicle for performing such tasks.
However, common modern control methods usually require an accurate model to design a formation controller, and accurately modeling the system is very difficult in practice; in addition, the application range of such control methods is limited by sensor errors, environmental disturbances, and similar influences. A reinforcement learning method is therefore introduced to realize intelligent control of unmanned aerial vehicle formation.
Among reinforcement learning algorithms, the double Q-learning algorithm, by virtue of its simplicity, ease of use, and good convergence, is currently widely applied in fields such as trajectory planning, cooperative decision-making, and single-vehicle control.
Disclosure of Invention
The invention provides an unmanned aerial vehicle formation environment construction and a formation controller design based on deep reinforcement learning, so that the wing plane can learn to follow the lead plane's speed and maintain a desired formation distance.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning, which comprises the following steps:
step 1, assuming that the lead plane and the wing plane both adopt the same autopilot model with a first-order speed holder and a second-order heading holder, the relative kinematics model of the unmanned aerial vehicle formation is obtained by linearization according to the small-disturbance principle;
step 2, designing a state space S, a wing plane action library a, an instruction conversion function and a reward function of a formation environment;
step 3, using the formation environment of step 2, a formation controller based on double Q-learning is established and trained in the designed environment; the input is the state S of the established environment and the output is the wing plane action, after which the action output by the controller is converted into a specific command and input to the wing plane.
Further, in step 1, the relative kinematics model of the formation of the unmanned aerial vehicle has the following specific expression:
in the formula (1), an inertial north-east-ground is taken as a basic coordinate system, any point on the ground is taken as an origin, the direction pointing to the north pole is an Ox axis, the direction pointing to the east is perpendicular to the Ox axis, and the direction pointing to the east is an Oy axis; l and F respectively represent a pilot model with a first-order speed retainer and a second-order course retainer which are the same; tau isvRepresents a velocity time constant; tau isψa,τψbRepresenting a course time constant; v represents drone speed; psi represents the drone heading angle; v. ofLc,vFcRespectively representing the speed commands of a long plane and a bureaucratic plane; psiLc,ψFcRespectively representing the course commands of the Youji and the Liao plane; x represents the distance between a long plane and a bureaucratic plane in the x direction; y represents the distance from a longicorn to a bureaucratic machine in the y direction; a is arctan (y)0/x0),x0,y0The distance between the fixed plane and the wing plane is the x and y direction.
Further, in step 2, the state space S of the environment, the wing plane action library a, the command conversion, and the reward function are designed. The method specifically comprises the following steps:
step 2.1, the y-direction distance between the long plane and the wing plane, the error between the actual distance and the expected distance, the relative speed and the integral thereof, and the relative course angle and the integral thereof are selected as a joint state space S, and the corresponding expression is as follows:
in the formula (2), the reaction mixture is,ev=vL-vFthe relative speed of a long plane and a bureaucratic plane; e.g. of the typeψ=ψL-ψFIs the relative course angle of a long plane and a bureaucratic plane; e.g. of the typey=yd-y is the error of the desired y-direction distance from the actual y-direction distance; y isdA desired y-direction distance;
step 2.2, a bureaucratic action library a is established; wherein the wing-plane action library a comprises wing-plane speed actions a1And wing plane course action a2Wing plane velocity action a1Comprising deceleration, uniform speed and acceleration, wing plane course action a2Including left yaw, constant heading, and right yaw.
Establishing a bureaucratic action library a with the expression as follows:
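To make the discrete action design of step 2.2 concrete, the following sketch enumerates the nine joint actions (the expression itself appears only as an image in the original); the action names come from the description, while the tuple representation and indexing are illustrative assumptions.

```python
from itertools import product

# Wing plane speed sub-actions a1 and heading sub-actions a2, as named in step 2.2.
SPEED_ACTIONS = ("decelerate", "constant_speed", "accelerate")
HEADING_ACTIONS = ("left_yaw", "constant_heading", "right_yaw")

# The joint action library a = (a1, a2): 3 x 3 = 9 discrete actions,
# indexed so that a single Q-network output can select one pair.
ACTION_LIBRARY = list(product(SPEED_ACTIONS, HEADING_ACTIONS))
```

With this layout, the controller's output is simply an integer in [0, 8] that indexes `ACTION_LIBRARY`.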
and 2.3, converting a design command, converting the action of a bureaucratic machine in the formula (3) into a speed and course command and adding amplitude limitation.
Formulas (4) and (5) give the command conversion under the different actions: v_F represents the current speed of the wing plane, and the speed command corresponding to each action a_1 is v_d; ψ_F is the current heading angle of the wing plane, and the heading-angle command corresponding to each action a_2 is ψ_d; [v_min, v_max] represents the wing plane speed range; [−ψ_max, ψ_max] represents the wing plane heading-angle range;
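A minimal sketch of the command conversion with amplitude limiting described above. Since formulas (4) and (5) are not reproduced in the text, the per-step increments `dv` and `dpsi` and the default limit values are illustrative assumptions (the limits match the embodiment's [30, 70] m/s and [−20, 20]° ranges).

```python
def convert_command(a1, a2, v_F, psi_F,
                    dv=2.0, dpsi=2.0,
                    v_min=30.0, v_max=70.0, psi_max=20.0):
    """Convert a discrete wing plane action (a1, a2) into clipped commands.

    dv, dpsi are assumed per-step increments; the patent only states that the
    commands are amplitude-limited to [v_min, v_max] and [-psi_max, psi_max].
    """
    delta_v = {"decelerate": -dv, "constant_speed": 0.0, "accelerate": dv}[a1]
    delta_psi = {"left_yaw": -dpsi, "constant_heading": 0.0, "right_yaw": dpsi}[a2]
    v_d = min(max(v_F + delta_v, v_min), v_max)            # clipped speed command
    psi_d = min(max(psi_F + delta_psi, -psi_max), psi_max)  # clipped heading command
    return v_d, psi_d
```

For example, an "accelerate" action near the upper speed limit saturates at v_max rather than exceeding it.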
step 2.4, designing a reward function r
In the formula (6)The speed instruction at the last moment;the course angle instruction at the last moment; t isSIs the sampling time; t is the proceeding time; and designing an environment ending condition:
in the formula [ ymin,ymax]Is the formation sets the y-direction minimum and maximum distances.
Further, in step 3, the formation environment of step 2 is used to establish a formation controller based on double Q-learning, which is trained in the designed environment; the method specifically comprises the following steps:
the controller comprises a memory base and a neural network model, wherein the memory base is used for storing interactive information, the input of the neural network model is a state space S for establishing an environment, and the output of the neural network model is a wing plane action;
the neural network model comprises two networks with the same structure and different parameters, namely a main network and a target network, wherein the parameters are theta and theta respectively-(ii) a The main network outputs all action estimation values Q, and the target network outputs a target value y;
in each training, initializing the environment state of formation to obtain state S, inputting it into main network, outputting bureaucratic actions and inputting them into environment, and converting the actions output by controller into specific commands v by means of expressions (4) and (5)d,ψdThen input into the wing plane to obtain new state S _ and instant reward r of wing plane, and will<S,a,r,S_>Storing in a memory bank;
when the memory bank is full, extracting a certain amount of samples to train the neural network model, wherein the expression of the neural network target value and the loss function is as follows:
y = r + γQ(S_, argmax_a Q(S_, a|θ) | θ⁻) (8)
L(θ) = E[(y − Q(S, a|θ))²] (9)
equation (8) represents the target value, equation (9) represents the loss function, and γ represents the discount rate. The expression for training the neural network parameters by using the gradient descent method is as follows:
θ-←θ (11)
equation (10) (11) represents neural network parameter update, equation (10) represents updating the main network parameter according to the gradient descent method, and a is the learning rate; equation (11) represents that the master network parameters are copied to the target network after a certain number of steps; the above process is repeated until the training is finished.
The beneficial effects of the invention are as follows: the invention designs an unmanned aerial vehicle formation flight environment and a formation controller based on deep reinforcement learning; the controller enables the wing plane to independently learn the optimal strategy and output the optimal action, so that the wing plane follows the lead plane's speed and keeps the desired distance. The method can effectively improve the intelligence of the unmanned aerial vehicle and eliminate the formation's distance and speed errors, giving the formation good formation-keeping capability and good portability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a control scheme of the process of the present invention.
Figure 2 is a graph of the speeds of the lead plane and the wing plane in an embodiment of the invention.
Fig. 3 is a graph of the y-direction distance between the lead plane and the wing plane in an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention, the present invention will be further described in detail with reference to the following detailed description.
The invention discloses an unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning, wherein a control structure diagram is shown in figure 1, and the method comprises the following steps:
Step 1, assuming that the lead plane and the wing plane both adopt the same autopilot model with a first-order speed holder and a second-order heading holder, the expression is:
where v represents the drone speed; v_c represents the speed command; τ_v represents the speed time constant; ψ represents the drone heading angle; ψ_c represents the heading-angle command; τ_ψa, τ_ψb represent the heading time constants. Linearization according to the small-disturbance principle then yields the relative kinematics model of the unmanned aerial vehicle formation, with the corresponding expression:
in the formula (2), an inertial north-east-ground is used as a basic coordinate system, any point on the ground is used as an origin, the direction pointing to the north pole is an Ox axis, the direction pointing to the east on the ground is perpendicular to the Ox axis, and the direction pointing to the east is an Oy axis. L and F respectively represent a pilot model with a first-order speed retainer and a second-order course retainer which are the same; tau isvRepresents a velocity time constant; tau isψa,τψbRepresenting a course time constant; v represents drone speed; psi represents the drone heading angle; v. ofLc,vFcRespectively representing the speed commands of a long plane and a bureaucratic plane; psiLc,ψFcRespectively representing the course commands of the Youji and the Liao plane; x represents the distance between a long plane and a bureaucratic plane in the x direction; y represents the distance from a longicorn to a bureaucratic machine in the y direction; a is arctan (y)0/x0),x0,y0The distance between the fixed plane and the wing plane is the x and y direction.
Step 2, the state space of the environment, the wing plane action library, the command conversion, the reward function, and the termination condition are designed.
Step 2.1, the y-direction distance between the lead plane and the wing plane, the error between the actual and desired distance, the relative speed and its integral, and the relative heading angle and its integral are selected as the joint state space S, with the corresponding expression:
In formula (3), e_v = v_L − v_F is the relative speed of the lead plane and the wing plane; e_ψ = ψ_L − ψ_F is their relative heading angle; e_y = y_d − y is the error between the desired and actual y-direction distance; y_d is the desired y-direction distance.
Step 2.2, the wing plane action library a = (a_1, a_2) is established, with the expression:
Step 2.3, the command conversion is designed: the actions in formula (4) are converted into speed and heading commands with amplitude limiting, with the expressions:
equations (5) and (6) show the command conversion in different actions, vFRepresenting the current speed of a wing plane, different actions a1The lower corresponding speed command is vd;ψFA current course angle of a wing plane, different actions a2The lower corresponding course angle instruction is psid。[vmin,vmax]Represents a range of bureaucratic velocities; [ -psi [ -phi ]max,ψmax]Representing a range of wing aircraft course angles.
Step 2.4, the reward function r is designed:
In formula (7), the first two quantities are the speed command at the previous moment and the heading-angle command at the previous moment; T_S is the sampling time; t is the elapsed time. The environment termination condition is also designed:
in the formula [ ymin,ymax]Is the formation sets the y-direction minimum and maximum distances.
Step 3, using the formation environment of step 2, a formation controller based on the double Q-learning algorithm is established and learns in the designed environment. The controller comprises a memory bank and a neural network model; the memory bank stores the interaction information, the input of the neural network model is the state space S of the established environment, and its output is the wing plane action.
The neural network model comprises two networks with the same structure but different parameters, namely a main network and a target network, with parameters θ and θ⁻ respectively. The main network outputs the estimated values Q of all actions, and the target network outputs the target value y.
The specific process is as follows: in each training episode, the formation environment state is initialized to obtain the state S, which is input to the main network; the output wing plane action is input to the environment, where expressions (5) and (6) convert the action output by the controller into the specific commands v_d, ψ_d, which are then input to the wing plane to obtain the wing plane's new state S_ and reward r; the tuple <S, a, r, S_> is stored in the memory bank.
When the memory bank is full, extracting a certain amount of samples to train the neural network model, wherein the expression of the neural network target value and the loss function is as follows:
y = r + γQ(S_, argmax_a Q(S_, a|θ) | θ⁻) (9)
L(θ) = E[(y − Q(S, a|θ))²] (10)
equation (9) represents the target value, equation (10) represents the loss function, and the expression for training the neural network parameters by using the gradient descent method is as follows:
θ-←θ (12)
equation (11) and (12) represent neural network parameter updates, equation (11) represents updating the master network parameters according to the gradient descent method, and a is the learning rate. Equation (12) represents that the master network parameters are copied to the target network after a certain number of steps. The above process is repeated until the training is finished.
In the numerical simulation verification of the embodiment, the wing plane speed range is set to [30, 70] m/s and the heading-angle range to [−20, 20]°. At the initial moment, the speeds of the lead plane and the wing plane are both 50 m/s, the heading angle is 0°, and they fly forward keeping x- and y-direction distances of 500 m. The lead plane command and the desired formation distance are then changed, and the results are shown in fig. 2 and fig. 3.
From the above simulation results it can be seen that when the lead plane's speed changes, the wing plane follows it well, and the speed error is kept substantially within 0.1, which accords with the reward function design. At the same time, the wing plane changes the y-direction spacing by adjusting its heading angle: from the initial spacing of 500 m it tracks to the desired distance of 250 m, and after the command is changed it tracks to 300 m. The wing plane can independently learn the optimal strategy without prior knowledge, and the results show the effectiveness of the designed double Q-learning controller.
The invention establishes the unmanned aerial vehicle formation motion environment according to actual flight conditions; the environment is consistent with actual conditions and can be directly ported to other algorithms for training and learning. The invention designs an unmanned aerial vehicle formation flight environment and a formation controller based on double Q-learning; the controller controls speed and heading simultaneously, controlling the wing plane to track the lead plane and maintain the desired distance.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)
1. An unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, assuming that the lead plane and the wing plane both adopt the same autopilot model with a first-order speed holder and a second-order heading holder, the relative kinematics model of the unmanned aerial vehicle formation is obtained by linearization according to the small-disturbance principle;
step 2, designing a state space S, a wing plane action library a, an instruction conversion function and a reward function of a formation environment;
step 3, using the formation environment of step 2, a formation controller based on double Q-learning is established and trained in the designed environment; the input is the state S of the established environment and the output is the wing plane action, after which the action output by the controller is converted into a specific command and input to the wing plane.
2. The method for establishing the unmanned aerial vehicle formation relative kinematics model according to claim 1, wherein the specific expression is as follows:
in the formula (1), an inertial north-east-ground is taken as a basic coordinate system, any point on the ground is taken as an origin, the direction pointing to the north pole is an Ox axis, the direction pointing to the east is perpendicular to the Ox axis, and the direction pointing to the east is an Oy axis; l and F respectively represent a long plane and a wing plane, bothA driver model with the same first-order speed retainer and second-order course retainer is adopted; tau isvRepresents a velocity time constant; tau isψa,τψbRepresenting a course time constant; v represents drone speed; psi represents the drone heading angle; v. ofLc,vFcRespectively representing the speed commands of a long plane and a bureaucratic plane; psiLc,ψFcRespectively representing the course commands of the Youji and the Liao plane; x represents the distance between a long plane and a bureaucratic plane in the x direction; y represents the distance from a longicorn to a bureaucratic machine in the y direction; a is arctan (y)0/x0),x0,y0The distance between the fixed plane and the wing plane is the x and y direction.
3. The method according to claim 2, characterized in that the state space S, the wing plane action library a, the command conversion, the reward function r, and the termination condition of the formation environment are designed on the basis of the established relative kinematics model, with the corresponding expression:
In formula (2), e_v = v_L − v_F is the relative speed of the lead plane and the wing plane; e_ψ = ψ_L − ψ_F is their relative heading angle; e_y = y_d − y is the error between the desired and actual y-direction distance; y_d is the desired y-direction distance;
the wing plane action library a = (a_1, a_2) is established, with the expression:
the command conversion is designed: the wing plane action in formula (3) is converted into speed and heading commands, and amplitude limiting is added;
formulas (4) and (5) give the command conversion under the different actions: v_F represents the current speed of the wing plane, and the speed command corresponding to each action a_1 is v_d; ψ_F is the current heading angle of the wing plane, and the heading-angle command corresponding to each action a_2 is ψ_d; [v_min, v_max] represents the wing plane speed range; [−ψ_max, ψ_max] represents the wing plane heading-angle range;
the reward function r is designed:
In formula (6), the first two quantities are the speed command at the previous moment and the heading-angle command at the previous moment; T_S is the sampling time; t is the elapsed time.
Design environment end conditions:
in the formula [ ymin,ymax]Is the formation sets the y-direction minimum and maximum distances.
4. The unmanned aerial vehicle formation environment establishment method based on deep reinforcement learning according to claim 3, characterized in that a formation controller is established based on double Q-learning and trained in the designed environment, specifically comprising the following steps:
the controller comprises a memory base and a neural network model, wherein the memory base is used for storing interactive information, the input of the neural network model is a state space S for establishing an environment, and the output of the neural network model is a wing plane action;
the neural network model comprises two networks with the same structure and different parameters, namely a main network and a target network, wherein the parameters are theta and theta respectively-(ii) a The main network outputs all action estimation values Q, and the target network outputs a target value y;
in each training episode, the formation environment state is initialized to obtain the state S, which is input to the main network; the output wing plane action is input to the environment, where expressions (4) and (5) convert the action output by the controller into the specific commands v_d, ψ_d, which are then input to the wing plane to obtain the wing plane's new state S_ and reward r; the tuple <S, a, r, S_> is stored in the memory bank;
when the memory bank is full, extracting a certain amount of samples to train the neural network model, wherein the expression of the neural network target value and the loss function is as follows:
y = r + γQ(S_, argmax_a Q(S_, a|θ) | θ⁻) (8)
L(θ) = E[(y − Q(S, a|θ))²] (9)
formula (8) represents the target value and formula (9) represents the loss function, where γ represents the discount rate; the neural network parameters are trained by the gradient descent method, with the expressions:
θ⁻ ← θ (11)
formulas (10) and (11) represent the neural network parameter updates: formula (10) updates the main network parameters by gradient descent, where α is the learning rate; formula (11) copies the main network parameters to the target network after a certain number of steps; the above process is repeated until training finishes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111267805.9A CN113885576A (en) | 2021-10-29 | 2021-10-29 | Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111267805.9A CN113885576A (en) | 2021-10-29 | 2021-10-29 | Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113885576A true CN113885576A (en) | 2022-01-04 |
Family
ID=79014237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111267805.9A Pending CN113885576A (en) | 2021-10-29 | 2021-10-29 | Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113885576A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104331548A (en) * | 2014-10-24 | 2015-02-04 | National University of Defense Technology | Method for planning flight actions of an unmanned aerial vehicle based on workflow |
CN110007688A (en) * | 2019-04-25 | 2019-07-12 | Xidian University | Distributed formation method for unmanned aerial vehicle clusters based on reinforcement learning |
CN110502034A (en) * | 2019-09-04 | 2019-11-26 | National University of Defense Technology | Fixed-wing unmanned aerial vehicle cluster control method based on deep reinforcement learning |
CN110502033A (en) * | 2019-09-04 | 2019-11-26 | National University of Defense Technology | Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning |
CN111077909A (en) * | 2019-12-31 | 2020-04-28 | Beijing Institute of Technology | Self-organizing, self-consistent optimization control method for unmanned aerial vehicles based on visual information |
US20200156243A1 (en) * | 2018-11-21 | 2020-05-21 | Amazon Technologies, Inc. | Robotics application simulation management |
CN111240353A (en) * | 2020-01-07 | 2020-06-05 | Nanjing University of Aeronautics and Astronautics | Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy trees |
CN111857184A (en) * | 2020-07-31 | 2020-10-30 | National University of Defense Technology | Fixed-wing unmanned aerial vehicle cluster control collision avoidance method and device based on deep reinforcement learning |
CN111880563A (en) * | 2020-07-17 | 2020-11-03 | Northwestern Polytechnical University | Multi-unmanned-aerial-vehicle task decision method based on MADDPG |
CN111880565A (en) * | 2020-07-22 | 2020-11-03 | University of Electronic Science and Technology of China | Q-learning-based cluster cooperative countermeasure method |
CN111880567A (en) * | 2020-07-31 | 2020-11-03 | National University of Defense Technology | Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning |
CN112162564A (en) * | 2020-09-25 | 2021-01-01 | Nanjing University | Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms |
Non-Patent Citations (10)
Title |
---|
HADO VAN HASSELT et al.: "Deep Reinforcement Learning with Double Q-Learning", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 17 February 2016, pages 2094 * |
HUAN HU et al.: "Proximal policy optimization with an integral compensator for quadrotor control", Frontiers of Information Technology & Electronic Engineering, vol. 21, no. 5, 31 May 2020, pages 777, XP037144867, DOI: 10.1631/FITEE.1900641 * |
ZEZHI SUI et al.: "Formation Control with Collision Avoidance through Deep Reinforcement Learning", IJCNN 2019 International Joint Conference on Neural Networks, 19 July 2019, pages 1, XP033621746, DOI: 10.1109/IJCNN.2019.8851906 * |
WANG Xiaoyan et al.: "Robust H∞ Controller Design for Three-Dimensional Formation Flight of Unmanned Aerial Vehicles", Control and Decision, vol. 27, no. 12, 31 December 2012, pages 1907 * |
XIANG Xiaojia et al.: "Coordinated Control Method for Fixed-Wing UAV Formations Based on Deep Reinforcement Learning", Acta Aeronautica et Astronautica Sinica, vol. 42, no. 4, 25 April 2021, pages 524009-1 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rubí et al. | A survey of path following control strategies for UAVs focused on quadrotors | |
CN110502033B (en) | Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning | |
Chen et al. | Path planning for multi-UAV formation | |
CN112162564B (en) | Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms | |
CN106774400B (en) | Unmanned aerial vehicle three-dimensional track guidance method based on inverse dynamics | |
CN110531786B (en) | Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN | |
WO2006113173A1 (en) | Decentralized maneuver control in heterogeneous autonomous vehicle networks | |
CN112198886B (en) | Unmanned aerial vehicle control method for tracking maneuvering target | |
Nie et al. | Three-dimensional path-following control of a robotic airship with reinforcement learning | |
Stevšić et al. | Sample efficient learning of path following and obstacle avoidance behavior for quadrotors | |
CN112684781A (en) | Multi-agent distributed model prediction control method and system | |
CN114089776B (en) | Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning | |
CN116225055A (en) | Unmanned aerial vehicle autonomous flight path planning algorithm based on state decomposition in complex environment | |
CN114935943A (en) | Unmanned aerial vehicle and unmanned vehicle cluster formation tracking control method and system | |
CN113900440B (en) | Unmanned aerial vehicle control law design method and device and readable storage medium | |
Yang et al. | A decentralised control strategy for formation flight of unmanned aerial vehicles | |
Montella et al. | Reinforcement learning for autonomous dynamic soaring in shear winds | |
CN116954258A (en) | Hierarchical control method and device for multi-four-rotor unmanned aerial vehicle formation under unknown disturbance | |
Duoxiu et al. | Proximal policy optimization for multi-rotor UAV autonomous guidance, tracking and obstacle avoidance | |
CN115617039B (en) | Event triggering-based distributed affine unmanned aerial vehicle formation controller construction method and unmanned aerial vehicle formation control method | |
Cordeiro et al. | Nonlinear controller and path planner algorithm for an autonomous variable shape formation flight | |
CN113885576A (en) | Unmanned aerial vehicle formation environment establishment and control method based on deep reinforcement learning | |
Stingu et al. | An approximate dynamic programming based controller for an underactuated 6DOF quadrotor | |
CN115237150A (en) | Fixed-wing formation obstacle avoidance method | |
Verma et al. | A novel trajectory tracking methodology using structured adaptive model inversion for uninhabited aerial vehicles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||