CN114020013B - Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning

Info

Publication number
CN114020013B
CN114020013B (granted publication of application CN202111246299.5A; application publication CN114020013A)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
formation
collision avoidance
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111246299.5A
Other languages
Chinese (zh)
Other versions
CN114020013A (en)
Inventor
张学军
王思峰
唐立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University Sichuan International Center For Innovation In Western China Co ltd
Original Assignee
Beihang University Sichuan International Center For Innovation In Western China Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University Sichuan International Center For Innovation In Western China Co ltd filed Critical Beihang University Sichuan International Center For Innovation In Western China Co ltd
Priority to CN202111246299.5A priority Critical patent/CN114020013B/en
Publication of CN114020013A publication Critical patent/CN114020013A/en
Application granted granted Critical
Publication of CN114020013B publication Critical patent/CN114020013B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104: Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention provides an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning, which comprises the following steps: outputting a strategy that enables each unmanned aerial vehicle to avoid collisions autonomously while flying, and keeping the formation by setting different constraint conditions; training the unmanned aerial vehicles in a simulation environment, generating a collision-avoidance strategy by assigning different reward values to different behaviors, and recording the state information and collision avoidance strategies of the unmanned aerial vehicles; processing the external environment information with an LSTM, a form of recurrent neural network, and training on the basis of the initial strategy in combination with the state information of the unmanned aerial vehicles; and adding different constraint conditions on top of collision avoidance, so that the unmanned aerial vehicles keep a given formation while avoiding collisions within the team, with the model being run and optimized continuously. The invention effectively unifies collision avoidance and formation keeping of the unmanned aerial vehicles, integrates resources effectively, and adjusts individual behaviors in real time to obtain the optimal collision avoidance behavior.

Description

Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning
Technical Field
The invention relates to the field of deep reinforcement learning and the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning.
Background
In recent years, multi-agent systems have been studied increasingly intensively because of their great potential in many fields, including cooperative exploration for surveillance and rescue, cooperative control of satellite clusters, and unmanned aerial vehicle formation control. The basic idea of a multi-agent system is to solve complex tasks through the cooperation of individuals, tasks that a single agent cannot accomplish even with expensive equipment. Formation control is a fundamental problem of multi-agent systems; its goal is to achieve and maintain a certain formation shape that allows the system to accomplish a specific task jointly, and formation maintenance is an important issue within formation control. In addition, collision avoidance must be considered to guarantee the safety of the multi-agent system. Because of the interactions between agents and the trade-off between collision avoidance and formation maintenance, finding collision-free, time-efficient paths in an uncertain dynamic environment remains a challenge.
For the formation maintenance problem, several formation control techniques have been proposed in the literature, including behavior-based formation control, the virtual structure method, and formation control schemes based on a leader-follower architecture. Among these techniques, the leader-follower architecture is widely used because of its simple structure and practicality. Although a series of results have been obtained in leader-follower formation control, the problem of formation control with collision avoidance has not been fully studied in previous work. In a dynamic environment in particular, collisions between agents, and between the multi-agent system and obstacles, make collision avoidance more and more difficult.
For the collision avoidance problem, conventional algorithms generally fall into three types: offline planning methods, methods based on artificial potential fields, and sense-and-avoid methods. Offline planning methods typically calculate a collision-free trajectory in advance and then use the result as the desired trajectory for a subsequent tracking control system. However, these methods are computationally intensive and require the information of the whole environment to be known in advance, which makes them inconvenient in a dynamic environment. Methods based on artificial potential fields avoid collisions by assuming virtual attractive and repulsive fields between individuals in the environment; however, they may suffer from local minima and sometimes from an unreachable destination. Sense-and-avoid methods solve the collision avoidance problem by sensing the environment and adjusting the current action accordingly, and are human-like in character. Work on these methods can be divided into two categories, reaction-based methods and prediction-based methods. The former avoid collisions by applying motion rules based on the current state, such as fuzzy-logic-based collision avoidance and the reciprocal velocity obstacle method; because they do not take future states into account, these reaction-based approaches have limitations and may be unreliable in some cases. The latter predict the motion of the obstacles and the future state and then output a long-term action to avoid the collision; here two problems are apparent: one is inaccurate estimation caused by various uncertainties, and the other is the enormous computational complexity of the prediction. The traditional collision avoidance methods are therefore strongly limited and lack formation control capability, so the focus of research on formation collision avoidance has gradually shifted to the field of reinforcement learning.
Disclosure of Invention
To address the collision avoidance problem of a formation of multiple unmanned aerial vehicles, the invention provides an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning, which coordinates and controls the whole formation so as to avoid collisions and complete the task smoothly.
In order to achieve the above object, the present invention adopts the following technical scheme:
an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning comprises the following specific steps:
step one: a deep reinforcement learning model is selected as the main framework, the initial parameters are set according to mature practice in the field, and a strategy that enables the unmanned aerial vehicle to avoid collisions autonomously in flight is explicitly trained; on this basis, the unmanned aerial vehicles are made to keep formation by setting different constraint conditions;
step two: the unmanned aerial vehicle is trained in a simulation environment through imitation learning, so that it operates by imitating the choices a human would make; a strategy based on collision avoidance behavior is gradually generated by assigning different reward values to different behaviors, and the state information and collision avoidance strategies of the unmanned aerial vehicle are recorded and stored as input to the subsequent learning model;
step three: the external environment information, mainly the state information of the obstacles, is processed with an LSTM, a form of recurrent neural network; the unmanned aerial vehicle is then trained on the basis of the initial strategy, combined with the state information of the unmanned aerial vehicle from step two, and during training the speed of the unmanned aerial vehicle is adjusted with a second-order dynamics model to obtain a smooth speed change; the expectation of training is that the unmanned aerial vehicle reaches the target position along a shorter path;
Long short-term memory (LSTM) is a special kind of RNN, used mainly to overcome the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. In short, an LSTM performs better than an ordinary RNN on longer sequences.
Step four: different constraint conditions are added on top of collision avoidance, so that the unmanned aerial vehicles keep a given formation while avoiding collisions within the team, the model is run and optimized continuously, and the expected output is a flexible flight strategy that keeps the formation and returns to the correct path after a collision has been avoided.
In step one, the environment comprises a leader, a follower and an obstacle, represented by the superscripts L, F and O respectively;
the state space of the unmanned aerial vehicle at time t is denoted s_t and its action space a_t. The other parameters of the training environment are: t, the time; Δt, the time step; p_t = [p_x, p_y], the position of the unmanned aerial vehicle at time t; v_t = [v_x, v_y], the velocity of the unmanned aerial vehicle at time t; r, the occupied radius; p_g = [p_gx, p_gy], the target position; v_pref, the desired speed; θ_t, the heading angle; s^F, the state space of the follower; s^L, the state space of the leader; s^O, the state space of the obstacle;
the state information of the unmanned aerial vehicle at time t is written s_t = [s_t^o, s_t^h], where s_t^o denotes the state information that can be observed and s_t^h denotes the hidden state information that cannot be observed;
for the action a_t of the unmanned aerial vehicle, it is assumed that the unmanned aerial vehicle responds immediately after receiving a control command, so the action is set to the commanded velocity, a_t = v_t. The goal of training is to design the follower's strategy π: s_t^jn → a_t, which selects appropriate actions to maintain the formation and avoid obstacles;
in the learning architecture, the problem is turned into an optimization problem consisting of an objective function and a set of constraints. The objective function is a multi-objective function composed of the time t_g required by the follower to reach the target and the accumulated error of maintaining the formation; the constraint conditions also include the collision avoidance requirement;
the objective function of formation collision avoidance is given by equations (1.1) to (1.4), where s̃_t denotes the other unmanned aerial vehicles in the environment excluding the follower, and H_t denotes the desired relative offset vector of the follower with respect to the leader; (1.2) expresses the collision avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the unmanned aerial vehicle.
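Since the formula images for (1.1) to (1.4) are not reproduced in this text, the following is only a hedged plain-notation sketch consistent with the variable definitions above; the equal weighting of the two objective terms and the exact form of the kinematic constraint are assumptions made for illustration, not the patented formulas themselves:

    minimize over π :   E[ t_g ] + Σ_t || (p_t^F - p_t^L) - H_t ||              (1.1)
    subject to          || p_t - p̃_t || >= 2r   for every other agent s̃_t       (1.2)
                        p_{t_g} = p_g                                            (1.3)
                        p_{t+Δt} = p_t + Δt · v_t ,  || v_t || <= v_pref          (1.4)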
The second step specifically comprises the following steps:
first, the joint state space of the unmanned aerial vehicle is defined as s_t^jn = [s_t, s̃_t^F, s̃_t^O], where s̃_t^F denotes the observable space of all the followers and s̃_t^O denotes the observable space of the obstacles.
Secondly, a value network is designed to estimate the value of the state space, the purpose of the value network is to find the optimal value function, and the definition of the value function is as follows:
in the formula (1.5), the amino acid sequence,representing the rewards acquired at time t, gamma representing the discount factor;
for the optimal strategy pi * :Iterative acquisition from the value function:
in the formula (1.6)Representing the transition probability between time t and t + deltat.
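The formula images for (1.5) and (1.6) are likewise not reproduced; a sketch in the standard value-iteration form, in which the discounting convention is an assumption, would read:

    V*(s_t^jn) = Σ_{t'=t..T} γ^(t'-t) · R(s_{t'}^jn, π*(s_{t'}^jn))                                                (1.5)
    π*(s_t^jn) = argmax over a_t of [ R(s_t^jn, a_t) + γ ∫ P(s_{t+Δt}^jn | s_t^jn, a_t) · V*(s_{t+Δt}^jn) ds ]      (1.6)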
Finally, solving the problem of formation control by adopting a formation evaluation function based on the idea of reinforcement learning, wherein the formation evaluation function is used for evaluating the quality of the formation and calculating the rewards of the formation and reflecting the errors of the formation track in real time; taking Euclidean distance between the target position and the actual position as an input; the constructed reward function for formation is defined as:
in the formula (1.7), the amino acid sequence,a formation error value formed at time t;
the reward function for collision avoidance is expressed as follows:
in the formula (1.8)Representing a minimum distance between the follower and the other drone;
combining equations (1.7) and (1.8) to obtain a complete bonus function R t The method comprises the following steps:
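The formula images for (1.7) to (1.9) are also absent; in the sketch below the symbols ε_t^F (formation error), d_min (minimum separation) and the penalty constant α > 0, as well as the plain summation in (1.9), are assumptions, with only the role of each term taken from the text:

    R_form(t) = - ε_t^F ,   ε_t^F = || (p_t^F - p_t^L) - H_t ||                               (1.7)
    R_ca(t)   = - α   if d_min(t) < 2r (collision),
                  1   if the target position is reached,
                  0   otherwise                                                               (1.8)
    R_t = R_form(t) + R_ca(t)                                                                 (1.9)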
in step three, the second-order dynamics model is given by equation (1.10), in which P^F, V^F and U^F denote the position, velocity and control input vectors of the follower, respectively, while P^L and V^L denote the position and velocity vectors of the leader; according to the formation to be maintained the follower should keep a certain distance from the leader, and H_p = [H_x, H_y]^T denotes the relative offset vector that the follower needs to maintain with respect to the leader;
let ζ^F = [(P^F)^T, (V^F)^T]^T denote the position and velocity of the follower and ζ^L = [(P^L)^T, (V^L)^T]^T those of the leader, with H denoting the relative offset vector between the two; then, for any given initial states of follower and leader, the condition for maintaining the formation is given by equation (1.11);
according to this condition, a control protocol of the form (1.12) is assumed, with gains k_1, k_2 > 0.
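Equations (1.10) to (1.12) are not reproduced as images either; a sketch in the standard second-order leader-follower consensus form, consistent with the variables above but whose exact structure is an assumption, would read:

    Ṗ^F = V^F ,   V̇^F = U^F                                                                    (1.10)
    lim_{t→∞} ( P^F - P^L - H_p ) = 0   and   lim_{t→∞} ( V^F - V^L ) = 0                        (1.11)
    U^F = V̇^L - k_1 ( P^F - P^L - H_p ) - k_2 ( V^F - V^L ) ,   k_1 , k_2 > 0                    (1.12)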
Compared with the prior art, the invention has the advantages that:
the collision avoidance behavior of an unmanned aerial vehicle is difficult to unify effectively with its formation behavior: traditional collision avoidance methods lack flexibility and cannot be integrated flexibly with the formation system of the unmanned aerial vehicles, while most traditional formation control is based on control theory, is oriented towards fixed and unchanging motion tasks, and lacks dynamic adjustment to a dynamic environment. Therefore, starting from the field of deep reinforcement learning, the proposed unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning effectively unifies collision avoidance and formation keeping, can integrate resources effectively, and can adjust the behavior of individuals in real time to obtain the optimal collision avoidance behavior, greatly improving the maneuverability of the cluster system and its adaptability to complex environments.
Drawings
FIG. 1 is a general idea of the formation collision avoidance of the present invention;
FIG. 2 is a flow chart of processing data by the LSTM module of the present invention;
fig. 3 is a general algorithm block diagram of the formation collision avoidance of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the accompanying drawings and by way of examples in order to make the objects, technical solutions and advantages of the invention more apparent.
The unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning comprises the following specific steps:
step one: firstly, a deep reinforcement learning model is selected as a main body frame, then initial parameters are set according to industry maturation experiments, a strategy enabling the unmanned aerial vehicle to independently avoid collision and fly is explicitly output, and on the basis, the unmanned aerial vehicle can keep formation to a certain extent by setting different constraint conditions. The environment of the invention includes a leader, a follower and an obstacle, and is divided by superscripts F, L, O for convenience of distinction.
The state space of the unmanned aerial vehicle at time t can be expressed as s_t and its action space as a_t; the other parameters of the training environment are listed in Table 1.
TABLE 1 Parameter details list
t: time
Δt: time step
p_t = [p_x, p_y]: position of the unmanned aerial vehicle at time t
v_t = [v_x, v_y]: velocity of the unmanned aerial vehicle at time t
r: occupied radius
p_g = [p_gx, p_gy]: target position
v_pref: desired speed
θ_t: heading angle
s^F: state space of the follower
s^L: state space of the leader
s^O: state space of the obstacle
Table 1 clarifies the various information of the unmanned aerial vehicle during operation. Its state information at time t can be expressed as s_t = [s_t^o, s_t^h], where s_t^o denotes the state information that can be observed and s_t^h denotes the hidden state information that cannot be observed.
For the action a_t of the unmanned aerial vehicle, it is assumed that the unmanned aerial vehicle responds quickly after receiving a control command, so the action is set to the commanded velocity, a_t = v_t. The goal of training is to design the strategy π: s_t^jn → a_t, which selects appropriate actions to maintain the formation and avoid obstacles.
In the learning architecture, this can be turned into an optimization problem consisting of an objective function and a set of constraints. The objective function is a multi-objective function composed of the time t_g required by the follower to reach the target and the accumulated error of maintaining the formation; the constraint conditions also include the collision avoidance requirement.
The objective function of formation collision avoidance is therefore given by equations (1.1) to (1.4), where s̃_t denotes the other unmanned aerial vehicles in the environment excluding the follower and H_t denotes the desired relative offset vector of the follower with respect to the leader; (1.2) expresses the collision avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the unmanned aerial vehicle.
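For concreteness, the state bookkeeping described in step one can be sketched in Python as below; the split of fields between the observable part s_t^o and the hidden part s_t^h follows a common convention in the collision avoidance literature and, like the field names, dimensionalities and helper functions, is an illustrative assumption rather than a definition taken from this patent.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class UAVState:
        # Observable part s_t^o: what other agents can see about this vehicle.
        position: np.ndarray      # p_t = [p_x, p_y]
        velocity: np.ndarray      # v_t = [v_x, v_y]
        radius: float             # occupied radius r
        # Hidden part s_t^h: known only to the vehicle itself.
        goal: np.ndarray          # p_g = [p_gx, p_gy]
        v_pref: float             # desired speed
        heading: float            # heading angle theta_t

        def observable(self) -> np.ndarray:
            return np.concatenate([self.position, self.velocity, [self.radius]])

        def full(self) -> np.ndarray:
            return np.concatenate([self.observable(), self.goal,
                                   [self.v_pref, self.heading]])

    def joint_state(follower: UAVState, others: list, obstacles: list) -> np.ndarray:
        # Joint state s_t^jn: the follower's full state plus the observable
        # states of the other UAVs and of the obstacles.
        parts = [follower.full()] + [o.observable() for o in others] + list(obstacles)
        return np.concatenate(parts)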
Step two: the unmanned aerial vehicle is trained in a simulation environment through imitation learning, so that it operates by imitating the choices a human would make; a strategy based on collision avoidance behavior is gradually generated by assigning different reward values to different behaviors, and the state information and collision avoidance strategies of the unmanned aerial vehicle are then recorded and stored as input to the subsequent learning model.
First, the joint state space of the unmanned aerial vehicle is defined as s_t^jn = [s_t, s̃_t^F, s̃_t^O], where s̃_t^F denotes the observable space of all the followers and s̃_t^O denotes the observable space of the obstacles.
Secondly, a value network is designed to estimate the value of the state space; its purpose is to find the optimal value function. The value function is defined by equation (1.5), in which R(s_t^jn, a_t) denotes the reward obtained at time t and γ denotes the discount factor.
The optimal strategy π* can be obtained iteratively from the value function through equation (1.6), in which P(s_{t+Δt}^jn | s_t^jn, a_t) denotes the transition probability between time t and t+Δt.
Finally, following the idea of reinforcement learning, the formation control problem is solved with a formation evaluation function, whose essence is to evaluate the quality of the formation and compute the formation reward, and in particular to reflect the error of the formation trajectory in real time. The Euclidean distance between the target position and the actual position is taken as its input. The reward function constructed for formation keeping is defined by equation (1.7), in which the error term denotes the formation error at time t. The reward function for collision avoidance is expressed by equation (1.8), in which the distance term denotes the minimum distance between the follower and the other unmanned aerial vehicles. Combining equation (1.7) and equation (1.8) yields the complete reward function (1.9).
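A minimal Python sketch of such a combined reward is given below as a point of reference; the constants (formation weight, collision penalty, safety margin, goal bonus) and the helper names are illustrative assumptions and do not reproduce the exact values of equations (1.7) to (1.9).

    import numpy as np

    def formation_reward(follower_pos, leader_pos, offset, weight=0.1):
        # Formation error: Euclidean distance between the desired position
        # (leader position + desired offset) and the follower's actual position.
        error = np.linalg.norm(follower_pos - (leader_pos + offset))
        return -weight * error  # larger error -> smaller reward

    def collision_reward(d_min, radius, reached_goal,
                         safety_margin=0.2, collision_penalty=-0.25):
        # d_min: minimum distance between the follower and any other UAV/obstacle.
        if d_min < 2.0 * radius:          # overlap of occupied radii -> collision
            return collision_penalty
        if reached_goal:                  # goal reached -> positive terminal reward
            return 1.0
        gap = d_min - 2.0 * radius
        if gap < safety_margin:           # too close -> graded penalty
            return -0.1 * (safety_margin - gap)
        return 0.0

    def total_reward(follower_pos, leader_pos, offset, d_min, radius, reached_goal):
        # Complete reward R_t: sum of the formation term (1.7) and the
        # collision-avoidance term (1.8), in the spirit of equation (1.9).
        return (formation_reward(follower_pos, leader_pos, offset)
                + collision_reward(d_min, radius, reached_goal))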
the idea of combining collision avoidance and clustering can be considered that unmanned aerial vehicle clusters are collision avoidance is shown in fig. 1.
Step three: the external environment information, mainly the state information of the obstacles, is processed with an LSTM, a form of recurrent neural network; training is then carried out on the basis of the initial strategy, combined with the state information of the unmanned aerial vehicle from step two, and during training the speed of the unmanned aerial vehicle is adjusted with a second-order dynamics model to obtain a smooth speed change; the expectation of training is that the unmanned aerial vehicle reaches the target position along a shorter path.
At time t, the states of the obstacles are treated as an input sequence to the LSTM network; as shown in FIG. 2, the LSTM processes the obstacle state information item by item and finally generates an encoding of all the obstacles. The LSTM network can handle an uncertain number of obstacles, which avoids having to modify the network whenever the number of obstacles changes.
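A hedged PyTorch sketch of this step is shown below; the tensor layout, hidden size and module names are assumptions made for illustration, the only property taken from the text being that a variable number of obstacle states is fed sequentially into an LSTM and the final hidden state is used as a fixed-length encoding.

    import torch
    import torch.nn as nn

    class ObstacleEncoder(nn.Module):
        """Encode a variable number of obstacle states into a fixed-length vector."""
        def __init__(self, obstacle_dim=4, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=obstacle_dim,
                                hidden_size=hidden_dim,
                                batch_first=True)

        def forward(self, obstacle_states):
            # obstacle_states: (batch, n_obstacles, obstacle_dim); n_obstacles may vary
            # between calls, which is exactly why an LSTM is used here.
            _, (h_n, _) = self.lstm(obstacle_states)
            return h_n.squeeze(0)          # (batch, hidden_dim) encoding of all obstacles

    # Example: 3 obstacles at one step, 5 at another, same network both times.
    encoder = ObstacleEncoder()
    code_a = encoder(torch.randn(1, 3, 4))
    code_b = encoder(torch.randn(1, 5, 4))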
The selected second-order dynamics model is given by equation (1.10), in which P^F, V^F and U^F denote the position, velocity and control input vectors of the follower, respectively, while P^L and V^L denote the position and velocity vectors of the leader. According to the formation to be maintained the follower should keep a certain distance from the leader, so H_p = [H_x, H_y]^T denotes the relative offset vector that the follower needs to maintain with respect to the leader.
Let ζ^F = [(P^F)^T, (V^F)^T]^T denote the position and velocity of the follower and ζ^L = [(P^L)^T, (V^L)^T]^T those of the leader, with H denoting the relative offset vector between the two; then, for any given initial states of follower and leader, the condition for maintaining the formation is given by equation (1.11).
According to this condition, the control protocol (1.12) is adopted, with gains k_1, k_2 > 0.
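The Python sketch below integrates the second-order follower dynamics under a leader-follower control law of the consensus type described above; the specific law u = a_L - k1*(p_F - p_L - H_p) - k2*(v_F - v_L), the gains and the Euler discretisation are illustrative assumptions consistent with, but not quoted from, equations (1.10) to (1.12).

    import numpy as np

    def follower_control(p_f, v_f, p_l, v_l, a_l, h_p, k1=1.0, k2=2.0):
        # Leader-follower control input: drive the position error (p_f - p_l - h_p)
        # and the velocity error (v_f - v_l) to zero, with k1, k2 > 0.
        return a_l - k1 * (p_f - p_l - h_p) - k2 * (v_f - v_l)

    def step_second_order(p_f, v_f, u, dt=0.1):
        # Second-order dynamics: position integrates velocity, velocity integrates
        # the control input (simple forward-Euler discretisation).
        v_next = v_f + dt * u
        p_next = p_f + dt * v_f
        return p_next, v_next

    # Example: follower tracking a leader moving at constant velocity with offset H_p.
    p_l, v_l, a_l = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.zeros(2)
    h_p = np.array([-2.0, 1.0])
    p_f, v_f = np.array([-5.0, 3.0]), np.zeros(2)
    for _ in range(200):
        u = follower_control(p_f, v_f, p_l, v_l, a_l, h_p)
        p_f, v_f = step_second_order(p_f, v_f, u)
        p_l = p_l + 0.1 * v_l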
Step four: finally, different constraint conditions are added on top of collision avoidance, so that the unmanned aerial vehicles keep a given formation while avoiding collisions within the team, the model is run and optimized continuously, and the expected output is a flexible flight strategy that keeps the formation and returns to the correct path after a collision has been avoided.
The overall framework of the unmanned aerial vehicle formation collision avoidance algorithm is shown in fig. 3.
The training process in the steps can be subdivided into:
(1) Execute the formation-based optimal reciprocal collision avoidance strategy algorithm and collect a demonstration data set D; then go to (2).
(2) Initialize the value network V with the demonstration data set D; then go to (3).
(3) Initialize the target value network V̂; then go to (4).
(4) Initialize the experience replay memory M, which is used to break the correlation between samples; then go to (5).
(5) Loop: if the maximum number of runs has not been reached, execute the following steps; otherwise exit and return the value function V.
a) Initialize a random training scenario.
b) Repeat the following steps until success or timeout:
i. select a behavior a_t;
ii. select the optimal behavior by maximizing the sum of the value function and the reward;
iii. store the experience tuple in the experience replay memory M;
iv. randomly draw a training batch from M;
v. set the target value of the output;
vi. perform gradient descent on the value network.
c) Update the target value network V̂ once every C training steps.
d) End and return the value function V.
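Reading steps (1) to (5) as a value-based deep reinforcement learning loop with imitation-learning initialization, a condensed Python/PyTorch sketch might look like the following; the environment interface (reset, step, best_action), the network shape, the (state, target value) form of the demonstration entries and every hyper-parameter are assumptions made only for illustration.

    import random
    import torch
    import torch.nn as nn

    value_net = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
    target_net = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
    target_net.load_state_dict(value_net.state_dict())      # step (3)
    optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    memory = []                                              # step (4): replay memory M
    GAMMA, BATCH, C, MAX_EPISODES = 0.95, 64, 50, 2000

    def pretrain(demo_set):                                  # steps (1)-(2)
        # demo_set: list of (joint_state_tensor, value) pairs from the demonstration policy.
        for state, value in demo_set:
            loss = (value_net(state) - value).pow(2).mean()
            optimizer.zero_grad(); loss.backward(); optimizer.step()

    def train(env, demo_set):
        pretrain(demo_set)
        step_count = 0
        for episode in range(MAX_EPISODES):                  # step (5)
            state, done = env.reset(), False                 # a) random scenario
            while not done:                                  # b) until success or timeout
                action = env.best_action(value_net, state)   # i-ii: maximise reward plus value
                next_state, reward, done = env.step(action)
                memory.append((state, reward, next_state, done))          # iii
                batch = random.sample(memory, min(BATCH, len(memory)))    # iv
                for s, r, s2, d in batch:
                    target = r if d else r + GAMMA * target_net(s2).detach()  # v
                    loss = (value_net(s) - target).pow(2).mean()
                    optimizer.zero_grad(); loss.backward(); optimizer.step()  # vi
                state, step_count = next_state, step_count + 1
                if step_count % C == 0:                      # c) periodic target update
                    target_net.load_state_dict(value_net.state_dict())
        return value_net                                     # d) return value function V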
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to aid the reader in understanding the practice of the invention and that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (1)

1. The unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning is characterized by comprising the following steps of:
step one: a deep reinforcement learning model is selected as the main framework, the initial parameters are set according to mature practice in the field, and a strategy that enables the unmanned aerial vehicle to avoid collisions autonomously in flight is explicitly trained; on this basis, the unmanned aerial vehicles are made to keep formation by setting different constraint conditions;
in the first step, the environment comprises a leader, a follower and an obstacle, which are respectively represented by superscripts L, F and O;
the state space of the unmanned aerial vehicle at time t is denoted s_t and its action space a_t; the other parameters of the training environment are: t, the time; Δt, the time step; p_t = [p_x, p_y], the position of the unmanned aerial vehicle at time t; v_t = [v_x, v_y], the velocity of the unmanned aerial vehicle at time t; r, the occupied radius; p_g = [p_gx, p_gy], the target position; v_pref, the desired speed; θ_t, the heading angle; s^F, the state space of the follower; s^L, the state space of the leader; s^O, the state space of the obstacle;
its state information at time t is expressed as s_t = [s_t^o, s_t^h], where s_t^o denotes the state information that can be observed and s_t^h denotes the hidden state information that cannot be observed;
for the action a_t of the unmanned aerial vehicle, it is assumed that the unmanned aerial vehicle responds quickly after receiving a control command, so the action is set to the commanded velocity, a_t = v_t; the goal of training is to design the strategy π: s_t^jn → a_t, which selects appropriate actions to maintain the formation and avoid obstacles;
in the learning architecture, the problem is turned into an optimization problem consisting of an objective function and a set of constraints; the objective function is a multi-objective function composed of the time t_g required by the follower to reach the target and the accumulated error of maintaining the formation; the constraint conditions also include the collision avoidance requirement;
the objective function of formation collision avoidance is given by equations (1.1) to (1.4), where s̃_t denotes the other unmanned aerial vehicles in the environment excluding the follower and H_t denotes the desired relative offset vector of the follower with respect to the leader; (1.2) expresses the collision avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the unmanned aerial vehicle;
step two: the unmanned aerial vehicle is trained in a simulation environment through imitation learning, so that it operates by imitating the choices a human would make; a strategy based on collision avoidance behavior is gradually generated by assigning different reward values to different behaviors, and the state information and collision avoidance strategies of the unmanned aerial vehicle are recorded and stored as input to the subsequent learning model;
the second step specifically comprises the following steps:
first, the joint state space of the unmanned aerial vehicle is defined as s_t^jn = [s_t, s̃_t^F, s̃_t^O], where s̃_t^F denotes the observable space of all the followers and s̃_t^O denotes the observable space of the obstacles;
secondly, a value network is designed to estimate the value of the state space; its purpose is to find the optimal value function, which is defined by equation (1.5), in which R(s_t^jn, a_t) denotes the reward obtained at time t and γ denotes the discount factor;
the optimal strategy π* is obtained iteratively from the value function through equation (1.6), in which P(s_{t+Δt}^jn | s_t^jn, a_t) denotes the transition probability between time t and t+Δt;
finally, following the idea of reinforcement learning, the formation control problem is solved with a formation evaluation function, which evaluates the quality of the formation, computes the formation reward and reflects the error of the formation trajectory in real time; the Euclidean distance between the target position and the actual position is taken as its input; the reward function constructed for formation keeping is defined by equation (1.7), in which the error term denotes the formation error at time t;
the reward function for collision avoidance is expressed by equation (1.8), in which the distance term denotes the minimum distance between the follower and the other unmanned aerial vehicles;
equations (1.7) and (1.8) are combined to obtain the complete reward function R_t, given by equation (1.9);
step three: the external environment information is processed with an LSTM, a form of recurrent neural network; the unmanned aerial vehicle is then trained on the basis of the initial strategy, combined with the state information of the unmanned aerial vehicle from step two, and during training the speed of the unmanned aerial vehicle is adjusted with a second-order dynamics model to obtain a smooth speed change; the expectation of training is that the unmanned aerial vehicle reaches the target position along a shorter path;
in the third step, the second order dynamics model is as follows:
in equation (1.10), P^F, V^F and U^F denote the position, velocity and control input vectors of the follower, respectively, while P^L and V^L denote the position and velocity vectors of the leader; according to the formation to be maintained the follower should keep a certain distance from the leader, and H_p = [H_x, H_y]^T denotes the relative offset vector that the follower needs to maintain with respect to the leader;
let ζ^F = [(P^F)^T, (V^F)^T]^T denote the position and velocity of the follower and ζ^L = [(P^L)^T, (V^L)^T]^T those of the leader, with H denoting the relative offset vector between the two; for any given initial states of follower and leader, the condition for maintaining the formation is given by equation (1.11);
according to this condition, the control protocol (1.12) is assumed, with gains k_1, k_2 > 0;
Step four: different constraint conditions are added on the basis of collision avoidance, so that the unmanned aerial vehicle keeps a certain formation for flying on the basis of avoiding collision among teams, continuous running optimization is realized through a model, and a flexible flying strategy which keeps the formation and can return to a correct path after collision avoidance is expected to be output.
CN202111246299.5A 2021-10-26 2021-10-26 Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning Active CN114020013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111246299.5A CN114020013B (en) 2021-10-26 2021-10-26 Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111246299.5A CN114020013B (en) 2021-10-26 2021-10-26 Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114020013A CN114020013A (en) 2022-02-08
CN114020013B true CN114020013B (en) 2024-03-15

Family

ID=80057596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111246299.5A Active CN114020013B (en) 2021-10-26 2021-10-26 Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114020013B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114815882A (en) * 2022-04-08 2022-07-29 北京航空航天大学 Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning
CN116069023B (en) * 2022-12-20 2024-02-23 南京航空航天大学 Multi-unmanned vehicle formation control method and system based on deep reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN113110592A (en) * 2021-04-23 2021-07-13 南京大学 Unmanned aerial vehicle obstacle avoidance and path planning method
CN113253733A (en) * 2021-06-03 2021-08-13 杭州未名信科科技有限公司 Navigation obstacle avoidance method, device and system based on learning and fusion
CN113485323A (en) * 2021-06-11 2021-10-08 同济大学 Flexible formation method for cascaded multiple mobile robots

Also Published As

Publication number Publication date
CN114020013A (en) 2022-02-08

Similar Documents

Publication Publication Date Title
Jesus et al. Deep deterministic policy gradient for navigation of mobile robots in simulated environments
de Jesus et al. Soft actor-critic for navigation of mobile robots
Konidaris et al. Constructing skill trees for reinforcement learning agents from demonstration trajectories
CN114020013B (en) Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning
US11561544B2 (en) Indoor monocular navigation method based on cross-sensor transfer learning and system thereof
Çatal et al. Learning perception and planning with deep active inference
Eiffert et al. Path planning in dynamic environments using generative rnns and monte carlo tree search
Botteghi et al. On reward shaping for mobile robot navigation: A reinforcement learning and SLAM based approach
Strasdat et al. Which landmark is useful? Learning selection policies for navigation in unknown environments
Fan et al. Learning resilient behaviors for navigation under uncertainty
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Manh et al. Autonomous navigation for omnidirectional robot based on deep reinforcement learning
Ahmad et al. End-to-end probabilistic depth perception and 3d obstacle avoidance using pomdp
Salvatore et al. A neuro-inspired approach to intelligent collision avoidance and navigation
Lee et al. Bayesian Residual Policy Optimization:: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Sharma et al. Model based path planning using Q-Learning
Doellinger et al. Environment-aware multi-target tracking of pedestrians
CN114396949B (en) DDPG-based mobile robot apriori-free map navigation decision-making method
Ustun et al. Controlling synthetic characters in simulations: a case for cognitive architectures and sigma
CN115542733A (en) Self-adaptive dynamic window method based on deep reinforcement learning
Nguyen et al. Cumulative training and transfer learning for multi-robots collision-free navigation problems
Botteghi et al. Entropy-based exploration for mobile robot navigation: a learning-based approach
Tang et al. Reinforcement learning for robots path planning with rule-based shallow-trial
Boborzi et al. Learning normalizing flow policies based on highway demonstrations
Truong et al. An efficient navigation framework for autonomous mobile robots in dynamic environments using learning algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant