CN111897316A - Multi-aircraft autonomous decision-making method under scene fast-changing condition - Google Patents

Multi-aircraft autonomous decision-making method under scene fast-changing condition

Info

Publication number
CN111897316A
Authority
CN
China
Prior art keywords
aircraft
distance
action
ith
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010575719.3A
Other languages
Chinese (zh)
Other versions
CN111897316B (en)
Inventor
杜文博
曹先彬
李宇萌
郭通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010575719.3A priority Critical patent/CN111897316B/en
Publication of CN111897316A publication Critical patent/CN111897316A/en
Application granted granted Critical
Publication of CN111897316B publication Critical patent/CN111897316B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104 Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Abstract

The invention discloses a multi-aircraft autonomous decision-making method under scene fast-changing conditions, belonging to the technical field of aircraft. The method comprises the following steps: first, each aircraft carries a laser radar for target detection and identifies static obstacles or other aircraft within its detection range from the returned three-dimensional point cloud data; next, an autonomous conflict resolution model is constructed from the three-dimensional point cloud data of the aircraft; the model is then solved within a multi-agent reinforcement learning framework to obtain a reward function for selecting actions according to the input state; finally, a neural network learning module performs centralized training and decentralized execution based on the reward function, computes through the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent behavior by joint optimization. When scene information changes, the invention can perform inheritance training using transfer learning and therefore exhibits good transferability.

Description

Multi-aircraft autonomous decision-making method under scene fast-changing condition
Technical Field
The invention belongs to the technical field of aircraft, relates to a conflict resolution method, and particularly relates to a multi-aircraft autonomous decision-making method under scene fast-changing conditions.
Background
With the rapid development of aeronautical science and technology, low-altitude small aircraft are widely used in complex, severe, high-risk operating environments for aerial surveillance, forest rescue, reconnaissance and exploration, military applications, and the like. The problems of path planning and conflict resolution in multi-aircraft autonomous decision-making have therefore attracted wide attention from scholars at home and abroad.
The most important characteristic of an actual low-altitude operating environment is that the scene is complex and highly dynamic and may contain dynamic threats with unknown motion characteristics. In many real tasks the targets of the agents are not static but dynamic, whereas the regulation and control of existing aircraft mainly rely on pre-planned or pre-established action sets, which are difficult to adapt to future complex, dynamic scenes.
Autonomous decision-making for multiple aircraft is a typical multi-agent cooperation problem. The agents are expected to be able to learn from the environment, that is, to automatically acquire knowledge, accumulate experience, continuously update and expand what they know, and improve performance. Learning capability is the ability of an agent to update knowledge through experimentation, observation, and inference. Only through continuous learning, acquiring knowledge by continuously interacting with the environment, can an agent improve its adaptability.
Disclosure of Invention
To address these problems, the invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions, which fully considers the dynamics of the scene and improves the learning capability of the multiple aircraft.
The multi-aircraft autonomous decision method under the scene fast-changing condition comprises the following steps:
Step one, for a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor and periodically emits radar echo signals within its detection range.
Each aircraft corresponds to one target, and the initial value of each target is set randomly. N_U and N_T are equal.
The detection range is as follows: each aircraft is treated as a mass point; the maximum detection distance r_i is the radius of the detection range, the horizontal detection range angle is θ_i, and the vertical detection range angle is φ_i.
Step two, each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
When another aircraft is detected, the returned three-dimensional point cloud data are the three-dimensional coordinates and velocity direction of that aircraft; when a static obstacle is detected, the returned data are the boundary coordinates of the obstacle; if there is no obstacle, the returned data are 0.
Step three, for time slot t, an autonomous conflict resolution model is constructed from the three-dimensional point cloud data of the ith aircraft and the other aircraft.
the autonomous conflict resolution model aims at the shortest distance from each aircraft to the respective target point, and the objective function is as follows:
min ∑_{i=1}^{N_U} d_i
s.t. R_1, R_2, R_3
where d_i represents the distance between the ith aircraft and the target point corresponding to that aircraft.
The three constraints are as follows:
(1) R_1 is the return function requiring that every aircraft reaches its respective target position. It is calculated as
R_1 = ∑_{i′=1}^{N_T} S_{i′}, i′ ∈ {1, 2, …, N_T},
where S_{i′} measures the completion of target i′: if target i′ is not completed, S_{i′} = −1; otherwise the target is completed and S_{i′} = 0.
(2) R_2 is the return function requiring that no aircraft collides with any static obstacle. It is evaluated from P_i, the path of the ith aircraft, and D_m, the mth static obstacle, m ∈ [1, N_M], where N_M represents the total number of static obstacles in the scene; the path P_i must not intersect any obstacle D_m.
(3) R_3 is the return function requiring that no collision occurs between any pair of aircraft. It is evaluated from p_t^i, the position coordinates of the ith aircraft at the current moment, and p_t^j, the position coordinates of the jth aircraft at the current moment.
Step four, the autonomous conflict resolution model of the multiple aircraft is solved within a multi-agent reinforcement learning framework to obtain a reward function for selecting actions according to the input state.
the reward functions include the following:
(1) Reward function r_a, set for the shortest path between each aircraft and the initial position of its target.
First, set the initial value r_a = 0.
Then, for the ith aircraft X_i at time t, with state s_t^i and action a_t^i: after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and its target position g_i,
d_t^i = ||p_t^i − g_i||.
Finally, cumulatively compute, over the N_U aircraft, the sum of the distances between the current position and the target position after the actions selected at time t, and update the reward function r_a:
r_a = −∑_{i=1}^{N_U} d_t^i.
Therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
(2) Reward function r_b, set for collision detection between aircraft and obstacles.
First, set the initial value r_b = 0.
Then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the position p_m of the mth static obstacle within the detection range,
d_t^{i,m} = ||p_t^i − p_m||.
Further, judge whether the distance d_t^{i,m} is smaller than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, a penalty value is set, otherwise a different penalty value is set.
For the ith aircraft X_i at time t, the distances to all static obstacles within the detection range are compared with the minimum safe distance n_o, and the corresponding penalty values are summed.
Cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_b.
Therefore, the closer the aircraft are to obstacles, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision.
(3) Reward function r_c, set for collision detection between aircraft.
First, set the initial value r_c = 0.
Then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the current position p_t^j of the jth aircraft within the detection range,
d_t^{i,j} = ||p_t^i − p_t^j||.
Here the observation of other aircraft is noisy, and a delay of one time step is applied to it.
Further, compare the distance d_t^{i,j} with the collision distance n_c and the proximity risk distance n_m of the aircraft, where n_c < n_m: if d_t^{i,j} < n_c, a penalty value is set; otherwise, if n_c ≤ d_t^{i,j} < n_m, another penalty value is set; and if d_t^{i,j} ≥ n_m, a third penalty value is set.
For the ith aircraft X_i at time t, the distances to all other aircraft are compared with the collision distance n_c and the proximity risk distance n_m, and the corresponding penalty values are summed.
Cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_c.
Thus, the closer an aircraft is to other aircraft, the smaller the joint revenue that the overall multi-aircraft autonomous decision can obtain.
And step five, the neural network learning module performs centralized training and decentralized execution based on the reward functions, computes through the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent behavior by joint optimization.
The invention has the advantages that:
(1) The multi-aircraft autonomous decision-making method under scene fast-changing conditions takes as its research background a low-altitude airspace that is complex and highly dynamic, with multiple elements whose operating characteristics are unknown, a complex coupling relationship between the airspace environment and traffic objects, and complex, fast-changing tasks; it therefore has important practical significance.
(2) The multi-aircraft autonomous decision-making method under scene fast-changing conditions not only fully considers the dynamics of the scene, but also accounts for incomplete information and non-ideal communication, and provides a method for guiding the autonomous decision-making of aircraft.
Drawings
Fig. 1 is a schematic diagram of the detection range of a laser radar when an aircraft performs collision detection according to the present invention.
FIG. 2 is a diagram of a multi-agent reinforcement learning model according to the present invention.
Fig. 3 is a schematic view of the aircraft safety distance of the present invention.
Fig. 4 is a flowchart of a multi-aircraft autonomous decision method under a scene fast change condition according to the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings, so that those skilled in the art can understand and practice the invention.
The invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions for complex, highly dynamic scenes with the following characteristics: (1) static and dynamic obstacles coexist in the scene, and the targets may change dynamically during flight; (2) the perception range of a single unmanned aerial vehicle is limited, so global information cannot be obtained; (3) the unmanned aerial vehicles can communicate with each other to share local airspace information; (4) communication between the unmanned aerial vehicles suffers interference and random loss. The multi-aircraft autonomous decision is decomposed into two sub-problems: (1) path planning; (2) conflict resolution. The resulting optimization problem has been proven to be NP-hard, so a heuristic algorithm is required to solve it. Therefore, the multi-aircraft autonomous decision can be solved by division: the two sub-problems of path planning and conflict resolution are solved first, and their solutions are then combined as the final solution.
As shown in fig. 4, the aircraft autonomous decision method includes the following steps:
Step one, for a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor, performs collision detection periodically within its detection range, and emits radar echo signals.
the flight conflict detection adopts non-cooperative threat conflict detection based on a radar system, and the laser radar plays an important role in the autonomous navigation technology. The main performance parameters of the laser radar include the wavelength of laser light, the detection distance, and the field of view (FOV), which is divided into a horizontal field of view and a vertical field of view. The two most commonly used lidar wavelengths are 905nm and 1550 nm. The 1550nm wavelength radar sensor can operate at higher power, detecting distances further than the 905nm wavelength, but with a greater weight.
The invention considers N_U aircraft and N_T targets; each aircraft corresponds to one target, and the initial value of each target is set randomly. N_U and N_T are equal. For the ith aircraft X_i at time t, the state is s_t^i and the action is a_t^i. The state s_t^i is obtained from the three-dimensional point cloud data returned by the airborne laser radar sensor carried by the aircraft, which gives the position information of the static obstacles.
As shown in fig. 1, the detection range of the radar is defined as follows: each aircraft is treated as a mass point; the maximum detection distance r_i is the radius of the detection range, the horizontal FOV detection range angle is θ_i, and the vertical FOV detection range angle is φ_i.
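For illustration only, the following Python sketch shows one way such a detection-range test could be realized for a point target under the mass-point model above; the function name, the use of the aircraft heading as the cone axis, and the exact FOV geometry are assumptions made here, not details given in the patent.

```python
import numpy as np

def in_detection_range(own_pos, own_heading, target_pos, r_i, theta_i, phi_i):
    """Rough check of whether target_pos lies inside the lidar detection cone
    of an aircraft treated as a mass point (illustrative sketch only).

    own_pos, target_pos : 3D position vectors (numpy arrays)
    own_heading         : unit vector of the aircraft's flight direction
    r_i                 : maximum detection distance (range radius)
    theta_i, phi_i      : horizontal and vertical FOV angles in radians
    """
    rel = target_pos - own_pos
    dist = np.linalg.norm(rel)
    if dist == 0.0 or dist > r_i:
        return dist <= r_i
    # Horizontal (azimuth) offset between the heading and the line of sight.
    az_heading = np.arctan2(own_heading[1], own_heading[0])
    az_target = np.arctan2(rel[1], rel[0])
    d_az = np.abs((az_target - az_heading + np.pi) % (2 * np.pi) - np.pi)
    # Vertical (elevation) offset of the line of sight.
    el_target = np.arcsin(rel[2] / dist)
    el_heading = np.arcsin(own_heading[2] / max(np.linalg.norm(own_heading), 1e-9))
    d_el = np.abs(el_target - el_heading)
    return d_az <= theta_i / 2 and d_el <= phi_i / 2

# Example: a target 30 m ahead lies inside a 100 m / 120-degree / 30-degree cone.
print(in_detection_range(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                         np.array([30.0, 5.0, 2.0]), 100.0,
                         np.deg2rad(120), np.deg2rad(30)))
```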
Step two, each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
the aircraft is regarded as a mass point, the aircraft sends radar echo signals within a detection range regularly, when other aircraft are detected, returned three-dimensional point cloud data are three-dimensional coordinates and speed directions of other aircraft, when static obstacles are detected, the returned three-dimensional point cloud data are boundary coordinates of the static obstacles, and if no obstacles exist, the returned three-dimensional point cloud data are 0.
Step three, for time slot t, an autonomous decision model is constructed from the three-dimensional point cloud data of the ith aircraft and the other aircraft.
the invention describes the design process of the autonomous decision problem from three aspects of observation value, action and return function.
1) Observation s_t: at each time t, t = 1, 2, …, T, where T denotes the maximum time for the aircraft to reach their targets. Since the agents in reinforcement learning make control decisions based on the collected current state and the aircraft reward value, an observation s_t is constructed first. The observation of the state of the ith aircraft X_i at time t is denoted s_t^i, and the joint state of the multi-agent system composed of all aircraft is denoted x_t = (s_t^1, s_t^2, …, s_t^{N_U}).
Among its components, d_t^i denotes, at time slot t, the distance between the current position p_t^i of the ith aircraft X_i after executing its action a_t^i and its target position g_i; it is used to judge whether the current task is finished.
d_t^{i,j} denotes, at time t, the distance between the current position p_t^i of aircraft X_i after executing the action a_t^i and the current position p_t^j of the jth aircraft within the detection range; it is used to determine whether a conflict between aircraft has occurred. The observation of other aircraft is noisy and delayed by one time step.
d_t^{i,m} denotes, at time t, the distance between the current position p_t^i of aircraft X_i after executing the action a_t^i and the position p_m of the mth static obstacle within the detection range; it is used to determine whether a collision between the aircraft and an obstacle occurs.
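For illustration, a minimal Python sketch of how the per-aircraft observation s_t^i could be assembled from the three kinds of distance terms described above; the fixed-length padding and the function interface are assumptions made here, not part of the patent text.

```python
import numpy as np

def build_observation(own_pos, target_pos, other_positions, obstacle_positions,
                      max_aircraft, max_obstacles):
    """Assemble one aircraft's observation from the distance terms
    d_t^i (to its target), d_t^{i,j} (to other aircraft in range) and
    d_t^{i,m} (to static obstacles in range)."""
    d_target = np.linalg.norm(np.asarray(target_pos) - np.asarray(own_pos))

    d_aircraft = [np.linalg.norm(np.asarray(p) - np.asarray(own_pos))
                  for p in other_positions]
    d_aircraft += [0.0] * (max_aircraft - 1 - len(d_aircraft))   # pad to fixed length

    d_obstacles = [np.linalg.norm(np.asarray(p) - np.asarray(own_pos))
                   for p in obstacle_positions]
    d_obstacles += [0.0] * (max_obstacles - len(d_obstacles))    # pad to fixed length

    return np.concatenate(([d_target], d_aircraft, d_obstacles)).astype(np.float32)
```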
2) Action a_t: from the perspective of the DRL mechanism, the movement of an aircraft is characterized as an action; the action causes a change in the environment, and the distance moved determines the energy consumption of the aircraft. The reinforcement learning action is therefore represented by the flight-direction accelerations of the aircraft movement model. ρ_j(t) ∈ [0, ρ_max] denotes the pitch-direction velocity of the jth aircraft with time t as the starting time, and ρ_max denotes the maximum pitch-direction velocity; the pitch-direction acceleration of the jth aircraft at starting time t is bounded by a maximum and a minimum pitch-direction acceleration; the jth aircraft likewise has a yaw-direction velocity and a yaw-direction acceleration with time t as the starting time.
The action a_t contains 2 × N_U elements. After an agent receives the action a_t, it can be determined whether the jth aircraft hovers at its current position or moves to a new position, which realizes control of the continuous movement of the aircraft.
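For illustration, a sketch of how a two-component (pitch acceleration, yaw acceleration) action per aircraft could be applied to a simple point-mass movement model over one time step; the kinematic update, the clipping bounds, and the state layout are assumptions made here and not the aircraft model of the patent.

```python
import numpy as np

def apply_action(state, action, dt, rho_max, acc_pitch_bounds, acc_yaw_bounds):
    """Advance one aircraft's point-mass model by one time step.

    state  : dict with 'pos' (3,), 'v_pitch' (float), 'v_yaw' (float)
    action : (2,) array = [pitch acceleration, yaw acceleration]
             (one such pair per aircraft gives the 2 x N_U joint action)
    """
    a_pitch = np.clip(action[0], *acc_pitch_bounds)
    a_yaw = np.clip(action[1], *acc_yaw_bounds)

    # Integrate velocities; pitch-direction speed stays within [0, rho_max].
    state['v_pitch'] = float(np.clip(state['v_pitch'] + a_pitch * dt, 0.0, rho_max))
    state['v_yaw'] = state['v_yaw'] + a_yaw * dt

    # Illustrative displacement: yaw velocity turns the aircraft in the
    # horizontal plane, pitch velocity moves it vertically (simplified).
    heading = state.get('heading', 0.0) + state['v_yaw'] * dt
    state['heading'] = heading
    speed = state.get('speed', 1.0)
    state['pos'] = np.asarray(state['pos'], dtype=float) + dt * np.array(
        [speed * np.cos(heading), speed * np.sin(heading), state['v_pitch']])
    return state
```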
3) Return function r_t: the objective of the autonomous decision problem is that the distance from each aircraft to its corresponding target point is shortest, subject to three different constraints (each aircraft must complete its target, and aircraft must not collide with obstacles or with one another). To design the return function, the invention discusses the objective and the constraints of the autonomous risk-avoidance problem separately.
Firstly, the optimization goal of the multi-aircraft autonomous decision is that the path of each aircraft is shortest after the multi-aircraft autonomous decision reaches the goal, and then the objective function is expressed as:
Figure BDA0002550901490000063
diand the distance between the ith aircraft and the target point corresponding to the aircraft is represented.
In addition, three constraint conditions are designed; the following constraints must be met:
(1) All targets are completed:
R_1 is the return function requiring that every aircraft reaches its respective target position. It is calculated as
R_1 = ∑_{i′=1}^{N_T} S_{i′}, i′ ∈ {1, 2, …, N_T},
where S_{i′} measures the completion of target i′: if target i′ is not completed, S_{i′} = −1; otherwise the target is completed and S_{i′} = 0.
(2) No collision between an aircraft and an obstacle may occur:
R_2 is the return function requiring that no aircraft collides with any static obstacle. It is evaluated from P_i = {p_1^i, p_2^i, …, p_T^i}, the path of the ith aircraft, where p_t^i represents the flight position coordinates of the aircraft at time t, and from D_m, the mth static obstacle, m ∈ [1, N_M], where N_M represents the total number of static obstacles in the scene; the path P_i must not intersect any obstacle D_m.
(3) No collision between aircraft may occur:
R_3 is the return function requiring that no collision occurs between any pair of aircraft. It is evaluated from p_t^i, the position coordinates of the ith aircraft at the current moment, and p_t^j, the position coordinates of the jth aircraft at the current moment.
Therefore, the multi-aircraft autonomous decision problem becomes a combinatorial optimization problem; that is, the autonomous conflict resolution model aims at the shortest distance from each aircraft to its own target point, with objective function
min ∑_{i=1}^{N_U} d_i
s.t. R_1, R_2, R_3.
Step four, the multi-aircraft autonomous decision model is solved based on a multi-agent reinforcement learning (MADDPG, multi-agent deep deterministic policy gradient) framework to obtain a reward function for selecting actions according to the input state.
the specific process is as follows:
1) establishing a multi-agent neural network
The state space and action space of each agent are abstracted to be completely consistent with those of the aircraft. The policy of each agent is determined by a parameter θ, with θ = {θ_1, θ_2, …, θ_{N_U}}, where θ_{N_U} denotes the neural network parameters of the N_U-th aircraft. The set of agent strategies is μ = {μ_1, μ_2, …, μ_{N_U}}, where μ_i denotes the policy of the aircraft under the neural network parameters θ_i. Let the policy of each agent be deterministic, so that the action of an agent is completely determined by its policy and the corresponding parameters:
a_i = μ_i(o_i),
where a_i is the action of the ith aircraft; o_i denotes the observation of the ith aircraft, including information on the distances between the agent and the obstacles, the target, and the other agents; and θ_i denotes the neural network parameters of the ith aircraft.
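For illustration, a deterministic actor μ_i and a centralized critic Q_i could be realized as small fully connected networks, as sketched below in PyTorch; the layer sizes, activations, and class names are assumptions made here.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy a_i = mu_i(o_i) for one aircraft (illustrative)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized action-value Q_i(x, a_1, ..., a_N) for one agent."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))
```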
According to MADDPG theory, the gradient of the deterministic strategy μ_i is
∇_{θ_i} J(μ_i) = E_{x,a∼D}[ ∇_{θ_i} μ_i(a_i | o_i) ∇_{a_i} Q_i^μ(x, a_1, …, a_{N_U}) |_{a_i = μ_i(o_i)} ],
where J denotes the action network objective function; E_{x,a∼D} denotes the expectation over random strategy sequences; x = (o_1, o_2, …, o_{N_U}) denotes the joint observation of the agents; Q_i^μ(x, a_1, …, a_{N_U}) denotes the action-value (Q-value) function; and D denotes the experience pool (Experience Replay Buffer) in MADDPG, which contains the tuples (x, x′, a_1, …, a_{N_U}, r_1, …, r_{N_U}), where x′ denotes the joint observation of the agents at the next moment and r_{N_U} denotes the reward function of the N_U-th aircraft.
Q_i^μ denotes the action-value function of the critic network strategy and is realized entirely by a neural network, named the critic network; it is updated according to the following objective function:
L(θ_i) = E_{x,a,r,x′}[ (Q_i^μ(x, a_1, …, a_{N_U}) − y)^2 ], with y = r_i + γ Q_i^{μ′}(x′, a′_1, …, a′_{N_U}) |_{a′_j = μ′_j(o_j)},
where L(θ_i) denotes the critic network loss function; r denotes the reward, r = (r_1, …, r_{N_U}), with r_i the reward function of the ith aircraft; γ ∈ (0, 1) denotes the attenuation (discount) factor; x′ denotes the joint observation at the next moment; a′_j denotes the action of the jth aircraft at the next moment; μ′_j denotes the policy of the jth aircraft at the next moment; and o_j denotes the observation of the jth aircraft. Q_i^{μ′} has exactly the same structure as Q_i^μ, but its parameter updates lag behind those of Q_i^μ (a target network). The critic network's action-value function, having a clearer physical meaning, assists the training of the action network, and the action network is updated according to
∇_{θ_i} J ≈ (1/|S|) ∑_{k∈S} ∇_{θ_i} μ_i(o_i^k) ∇_{a_i} Q_i^μ(x^k, a_1^k, …, a_i, …, a_{N_U}^k) |_{a_i = μ_i(o_i^k)},
where J denotes the action network objective function and S denotes a small batch of samples drawn at random.
The model of the entire design is shown in fig. 2.
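For illustration, the following PyTorch sketch shows one possible form of these updates, the centralized critic regression toward the target y and the actor gradient step described above, together with lagging (soft-updated) target networks; the hyperparameters, the batch layout, and all function names are assumptions made here.

```python
import torch
import torch.nn.functional as F

def maddpg_update(i, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.95, tau=0.01):
    """One illustrative MADDPG update for agent i from a sampled mini-batch.

    batch: dict of lists of tensors with keys
      'obs'      : per-agent observations o_j, each (B, obs_dim)
      'act'      : per-agent actions a_j, each (B, act_dim)
      'rew'      : per-agent rewards r_j, each (B, 1)
      'next_obs' : per-agent next observations o'_j
    """
    obs, act, rew, next_obs = batch['obs'], batch['act'], batch['rew'], batch['next_obs']
    x = torch.cat(obs, dim=-1)            # joint observation x
    x_next = torch.cat(next_obs, dim=-1)  # joint observation x'
    a = torch.cat(act, dim=-1)

    # Critic update: regress Q_i(x, a_1..a_N) toward y = r_i + gamma * Q'_i(x', a'_1..a'_N).
    with torch.no_grad():
        a_next = torch.cat([target_actors[j](next_obs[j]) for j in range(len(actors))], dim=-1)
        y = rew[i] + gamma * target_critics[i](x_next, a_next)
    critic_loss = F.mse_loss(critics[i](x, a), y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Actor update: ascend Q_i with agent i's own action replaced by mu_i(o_i).
    a_i = actors[i](obs[i])
    joint = torch.cat([a_i if j == i else act[j] for j in range(len(actors))], dim=-1)
    actor_loss = -critics[i](x, joint).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()

    # Soft update of the lagging target networks.
    for src, dst in ((actors[i], target_actors[i]), (critics[i], target_critics[i])):
        for p, tp in zip(src.parameters(), dst.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```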
2) Reward function design
To satisfy the constraint conditions, reward functions must be designed for MADDPG; as shown in fig. 3, the reward functions include the following:
(1) Accumulated reward function r_a, set for the shortest path between each aircraft and the initial position of its target.
First, set the initial value r_a = 0.
Then, for the ith aircraft X_i at time t, with state s_t^i and action a_t^i: after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and its target position g_i,
d_t^i = ||p_t^i − g_i||.
Finally, cumulatively compute, over the N_U aircraft, the sum of the distances between the current position and the target position after the actions selected at time t, and update the reward function r_a:
r_a = −∑_{i=1}^{N_U} d_t^i.
Therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
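For illustration, a minimal sketch of this distance term under the reconstruction above; the negative-sum form of the update is inferred from the surrounding text, since the original formula appears only as an image in the patent.

```python
import numpy as np

def reward_distance(positions, targets):
    """r_a: negative sum of distances from each aircraft to its target
    (inferred form; a larger accumulated distance means a worse joint strategy)."""
    r_a = 0.0
    for p, g in zip(positions, targets):
        r_a -= float(np.linalg.norm(np.asarray(g) - np.asarray(p)))
    return r_a
```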
(2) Reward function r_b, set for collision detection between aircraft and obstacles.
To ensure that aircraft and obstacles do not collide, collision detection is required. First, set the initial value r_b = 0.
Then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the position p_m of the mth static obstacle within the detection range,
d_t^{i,m} = ||p_t^i − p_m||.
Further, judge whether the distance d_t^{i,m} is smaller than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, a penalty value is set, otherwise a different penalty value is set.
For the ith aircraft X_i at time t, the distances to all static obstacles within the detection range are compared with the minimum safe distance n_o, and the corresponding penalty values are summed.
Cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_b.
Therefore, the closer the aircraft are to obstacles, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision.
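For illustration, a sketch of this obstacle-proximity penalty; the concrete penalty of −1 inside the minimum safe distance n_o and 0 otherwise is an assumption made here, since the patent's penalty values appear only as image formulas.

```python
import numpy as np

def reward_obstacles(positions, obstacle_positions, n_o, penalty=-1.0):
    """r_b: summed penalty over aircraft that come closer than the minimum
    safe distance n_o to any static obstacle (penalty value assumed)."""
    r_b = 0.0
    for p in positions:
        for q in obstacle_positions:
            if np.linalg.norm(np.asarray(q) - np.asarray(p)) < n_o:
                r_b += penalty
    return r_b
```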
(3) Reward function r_c, set for collision detection between aircraft.
To ensure that no collision occurs between aircraft, collision detection is required. First, set the initial value r_c = 0.
Then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the current position p_t^j of the jth aircraft within the detection range,
d_t^{i,j} = ||p_t^i − p_t^j||.
Here the observation of other aircraft is noisy, and a delay of one time step is applied to it.
Further, compare the distance d_t^{i,j} with the collision distance n_c and the proximity risk distance n_m of the aircraft, where n_c < n_m: if d_t^{i,j} < n_c, a penalty value is set; otherwise, if n_c ≤ d_t^{i,j} < n_m, another penalty value is set; and if d_t^{i,j} ≥ n_m, a third penalty value is set.
For the ith aircraft X_i at time t, the distances to all other aircraft are compared with the collision distance n_c and the proximity risk distance n_m, and the corresponding penalty values are summed.
Cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_c.
Thus, the closer an aircraft is to other aircraft, the smaller the joint revenue that the overall multi-aircraft autonomous decision can obtain.
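For illustration, a sketch of this inter-aircraft penalty under the two thresholds n_c < n_m; the penalty magnitudes (−1 inside the collision distance, −0.5 inside the proximity risk distance, 0 otherwise) are assumptions made here.

```python
import numpy as np

def reward_separation(positions, n_c, n_m, p_collide=-1.0, p_risk=-0.5):
    """r_c: summed penalty over ordered aircraft pairs according to whether
    their separation is below the collision distance n_c, below the proximity
    risk distance n_m, or safe (penalty magnitudes assumed)."""
    assert n_c < n_m
    r_c = 0.0
    pts = [np.asarray(p) for p in positions]
    for i, p in enumerate(pts):
        for j, q in enumerate(pts):
            if i == j:
                continue
            d = np.linalg.norm(q - p)
            if d < n_c:
                r_c += p_collide
            elif d < n_m:
                r_c += p_risk
    return r_c
```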
And step five, the neural network learning module performs centralized training and decentralized execution based on the reward functions, computes through the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent behavior by joint optimization.
Each agent contains an action network (Actor Network) and a critic network (Critic Network). The critic part of each agent can obtain the action information of all the other agents, so training is centralized and execution is decentralized: during training, a global critic with access to the full observation is introduced to guide actor training, while during testing only the actor, which uses local observations, is used to take actions.
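For illustration, the loop below sketches how the pieces above might fit together during training, with decentralized acting on local observations and centralized learning from a shared replay buffer; the environment interface (env.reset / env.step), the list-based buffer, the exploration noise, and the update_fn callback (for example, a partially applied version of the update sketched earlier) are all assumptions made here.

```python
import random
import torch

def collate(transitions):
    """Stack a list of (obs, acts, rewards, next_obs) tuples into per-agent tensors."""
    n_agents = len(transitions[0][0])
    def stack(idx, agent):
        return torch.as_tensor([t[idx][agent] for t in transitions], dtype=torch.float32)
    return {
        'obs':      [stack(0, j) for j in range(n_agents)],
        'act':      [stack(1, j) for j in range(n_agents)],
        'rew':      [stack(2, j).unsqueeze(-1) for j in range(n_agents)],
        'next_obs': [stack(3, j) for j in range(n_agents)],
    }

def train_episode(env, actors, buffer, update_fn, noise_std=0.1, batch_size=256):
    """One illustrative episode: decentralized acting (each actor only sees its
    own observation), centralized learning via update_fn."""
    obs = env.reset()                      # list of per-agent observations (assumed)
    done = False
    while not done:
        with torch.no_grad():
            acts = [a(torch.as_tensor(o, dtype=torch.float32)).numpy()
                    for a, o in zip(actors, obs)]
        acts = [x + noise_std * torch.randn(len(x)).numpy() for x in acts]  # exploration
        next_obs, rewards, done, _ = env.step(acts)
        buffer.append((obs, acts, rewards, next_obs))
        obs = next_obs
        if len(buffer) >= batch_size:
            batch = collate(random.sample(buffer, batch_size))
            for i in range(len(actors)):
                update_fn(i, batch)        # centralized critic sees all obs/actions
```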

Claims (4)

1. A multi-aircraft autonomous decision method under a scene fast-changing condition is characterized by comprising the following steps:
step one, for a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor and periodically emits radar echo signals within its detection range;
step two, each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals;
step three, for time slot t, an autonomous conflict resolution model is constructed from the three-dimensional point cloud data of the ith aircraft and the other aircraft;
the autonomous conflict resolution model aims at the shortest distance from each aircraft to the respective target point, and the objective function is as follows:
min ∑_{i=1}^{N_U} d_i
s.t. R_1, R_2, R_3
d_i represents the distance between the ith aircraft and the target point corresponding to that aircraft;
the three constraints are as follows:
(1) R_1 is the return function requiring that every aircraft reaches its respective target position, calculated as R_1 = ∑_{i′=1}^{N_T} S_{i′}, i′ ∈ {1, 2, …, N_T}, where S_{i′} measures the completion of target i′: if target i′ is not completed, S_{i′} = −1; otherwise the target is completed and S_{i′} = 0;
(2) R_2 is the return function requiring that no aircraft collides with any static obstacle, evaluated from P_i, the path of the ith aircraft, and D_m, the mth static obstacle, m ∈ [1, N_M], where N_M represents the total number of static obstacles in the scene, the path P_i not intersecting any obstacle D_m;
(3) R_3 is the return function requiring that no collision occurs between any pair of aircraft, evaluated from p_t^i, the position coordinates of the ith aircraft at the current moment, and p_t^j, the position coordinates of the jth aircraft at the current moment;
step four, solving the autonomous conflict resolution model of the multiple aircraft based on the multi-agent reinforcement learning framework to obtain a reward function for selecting actions according to the input state;
the reward functions include the following:
(1) reward function r_a, set for the shortest path between each aircraft and the initial position of its target:
first, set the initial value r_a = 0;
then, for the ith aircraft X_i at time t, with state s_t^i and action a_t^i, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and its target position g_i, d_t^i = ||p_t^i − g_i||;
finally, cumulatively compute, over the N_U aircraft, the sum of the distances between the current position and the target position after the actions selected at time t, and update the reward function r_a as r_a = −∑_{i=1}^{N_U} d_t^i;
therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy;
(2) reward function r_b, set for collision detection between aircraft and obstacles:
first, set the initial value r_b = 0;
then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the position p_m of the mth static obstacle within the detection range, d_t^{i,m} = ||p_t^i − p_m||;
further, judge whether the distance d_t^{i,m} is smaller than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, a penalty value is set, otherwise a different penalty value is set;
for the ith aircraft X_i at time t, the distances to all static obstacles within the detection range are compared with the minimum safe distance n_o and the corresponding penalty values are summed; cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_b;
therefore, the closer the aircraft are to obstacles, the smaller the joint revenue obtained by the whole multi-aircraft autonomous decision;
(3) reward function r_c, set for collision detection between aircraft:
first, set the initial value r_c = 0;
then, for the ith aircraft X_i at time t, after executing the action a_t^i, compute the distance between the current position p_t^i of aircraft X_i and the current position p_t^j of the jth aircraft within the detection range, d_t^{i,j} = ||p_t^i − p_t^j||, where the observation of other aircraft is noisy and delayed by one time step;
further, compare the distance d_t^{i,j} with the collision distance n_c and the proximity risk distance n_m of the aircraft, n_c < n_m: if d_t^{i,j} < n_c, a penalty value is set; otherwise, if n_c ≤ d_t^{i,j} < n_m, another penalty value is set; and if d_t^{i,j} ≥ n_m, a third penalty value is set;
for the ith aircraft X_i at time t, the distances to all other aircraft are compared with the collision distance n_c and the proximity risk distance n_m and the corresponding penalty values are summed; cumulatively computing, over the N_U aircraft, the penalty sums at time t updates the reward function r_c;
therefore, the closer an aircraft is to other aircraft, the smaller the joint revenue obtained by the whole multi-aircraft autonomous decision;
and step five, the neural network learning module performs centralized training and decentralized execution based on the reward functions, computes through the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent behavior by joint optimization.
2. The method for multi-aircraft autonomous decision making under the scene fast changing condition as claimed in claim 1, wherein in the first step, each aircraft corresponds to a target, and the initial value of the target is randomly set.
3. The method for multi-aircraft autonomous decision making under the condition of fast changing scenes as claimed in claim 1, wherein in step two, when another aircraft is detected, the returned three-dimensional point cloud data are the three-dimensional coordinates and velocity direction of that aircraft; when a static obstacle is detected, the returned three-dimensional point cloud data are the boundary coordinates of the static obstacle; and if no obstacle is present, the returned three-dimensional point cloud data are 0.
4. The method for multi-aircraft autonomous decision making under scene fast changing conditions as claimed in claim 1, wherein in the fifth step, each agent comprises an action network (Actor Network) and a critic network (Critic Network); the critic part of each agent can acquire the action information of all the other agents, and training is centralized while execution is decentralized, namely, during training a global critic with access to the full observation is introduced to guide actor training, and during testing only the actor with local observation is used to take actions.
CN202010575719.3A 2020-06-22 2020-06-22 Multi-aircraft autonomous decision-making method under scene fast-changing condition Active CN111897316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575719.3A CN111897316B (en) 2020-06-22 2020-06-22 Multi-aircraft autonomous decision-making method under scene fast-changing condition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010575719.3A CN111897316B (en) 2020-06-22 2020-06-22 Multi-aircraft autonomous decision-making method under scene fast-changing condition

Publications (2)

Publication Number Publication Date
CN111897316A true CN111897316A (en) 2020-11-06
CN111897316B CN111897316B (en) 2021-05-14

Family

ID=73207769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575719.3A Active CN111897316B (en) 2020-06-22 2020-06-22 Multi-aircraft autonomous decision-making method under scene fast-changing condition

Country Status (1)

Country Link
CN (1) CN111897316B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112462804A (en) * 2020-12-24 2021-03-09 四川大学 Unmanned aerial vehicle perception and avoidance strategy based on ADS-B and ant colony algorithm
CN112633415A (en) * 2021-01-11 2021-04-09 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster intelligent task execution method and device based on rule constraint training
CN113705921A (en) * 2021-09-03 2021-11-26 厦门闽江智慧科技有限公司 Electric vehicle dynamic path planning optimization solving method based on hybrid charging strategy
CN113962031A (en) * 2021-12-20 2022-01-21 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning
CN114115350A (en) * 2021-12-02 2022-03-01 清华大学 Aircraft control method, device and equipment
CN114237293A (en) * 2021-12-16 2022-03-25 中国人民解放军海军航空大学 Deep reinforcement learning formation transformation method and system based on dynamic target allocation
CN114237235A (en) * 2021-12-02 2022-03-25 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114679757A (en) * 2020-12-26 2022-06-28 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) Ultra-high-speed low-vacuum pipeline aircraft handover switching method and device
CN117177275A (en) * 2023-11-03 2023-12-05 中国人民解放军国防科技大学 SCMA-MEC-based Internet of things equipment calculation rate optimization method
US11907335B2 (en) * 2020-10-16 2024-02-20 Cognitive Space System and method for facilitating autonomous target selection

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109116854A (en) * 2018-09-16 2019-01-01 南京大学 A kind of robot cooperated control method of multiple groups based on intensified learning and control system
CN109725532A (en) * 2018-12-24 2019-05-07 杭州电子科技大学 One kind being applied to relative distance control and adaptive corrective method between multiple agent
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
US20190342731A1 (en) * 2018-05-01 2019-11-07 New York University System method and computer-accessible medium for blockchain-based distributed ledger for analyzing and tracking environmental targets
WO2019234702A2 (en) * 2018-06-08 2019-12-12 Tata Consultancy Services Limited Actor model based architecture for multi robot systems and optimized task scheduling method thereof
CN110991545A (en) * 2019-12-10 2020-04-10 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation oriented reinforcement learning training optimization method and device
CN111045445A (en) * 2019-10-23 2020-04-21 浩亚信息科技有限公司 Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning
CN111103881A (en) * 2019-12-25 2020-05-05 北方工业大学 Multi-agent formation anti-collision control method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190342731A1 (en) * 2018-05-01 2019-11-07 New York University System method and computer-accessible medium for blockchain-based distributed ledger for analyzing and tracking environmental targets
WO2019234702A2 (en) * 2018-06-08 2019-12-12 Tata Consultancy Services Limited Actor model based architecture for multi robot systems and optimized task scheduling method thereof
CN109116854A (en) * 2018-09-16 2019-01-01 南京大学 A kind of robot cooperated control method of multiple groups based on intensified learning and control system
CN109725532A (en) * 2018-12-24 2019-05-07 杭州电子科技大学 One kind being applied to relative distance control and adaptive corrective method between multiple agent
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN111045445A (en) * 2019-10-23 2020-04-21 浩亚信息科技有限公司 Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning
CN110991545A (en) * 2019-12-10 2020-04-10 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation oriented reinforcement learning training optimization method and device
CN111103881A (en) * 2019-12-25 2020-05-05 北方工业大学 Multi-agent formation anti-collision control method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yumeng Li et al.: "A Satisficing Conflict Resolution Approach for Multiple UAVs", IEEE Internet of Things Journal *
Qi Naiming et al.: "Multi-UAV mission planning method based on trajectory prediction", Journal of Harbin Institute of Technology *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11907335B2 (en) * 2020-10-16 2024-02-20 Cognitive Space System and method for facilitating autonomous target selection
CN112462804B (en) * 2020-12-24 2022-05-10 四川大学 Unmanned aerial vehicle perception and avoidance strategy based on ADS-B and ant colony algorithm
CN112462804A (en) * 2020-12-24 2021-03-09 四川大学 Unmanned aerial vehicle perception and avoidance strategy based on ADS-B and ant colony algorithm
CN114679757B (en) * 2020-12-26 2023-11-03 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) Cross-zone switching method and device for ultra-high-speed low-vacuum pipeline aircraft
CN114679757A (en) * 2020-12-26 2022-06-28 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) Ultra-high-speed low-vacuum pipeline aircraft handover switching method and device
CN112633415B (en) * 2021-01-11 2023-05-19 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster intelligent task execution method and device based on rule constraint training
CN112633415A (en) * 2021-01-11 2021-04-09 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster intelligent task execution method and device based on rule constraint training
CN113705921B (en) * 2021-09-03 2024-02-27 厦门闽江智慧科技有限公司 Electric vehicle dynamic path planning optimization method based on hybrid charging strategy
CN113705921A (en) * 2021-09-03 2021-11-26 厦门闽江智慧科技有限公司 Electric vehicle dynamic path planning optimization solving method based on hybrid charging strategy
CN114237235B (en) * 2021-12-02 2024-01-19 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114237235A (en) * 2021-12-02 2022-03-25 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114115350A (en) * 2021-12-02 2022-03-01 清华大学 Aircraft control method, device and equipment
CN114237293A (en) * 2021-12-16 2022-03-25 中国人民解放军海军航空大学 Deep reinforcement learning formation transformation method and system based on dynamic target allocation
CN114237293B (en) * 2021-12-16 2023-08-25 中国人民解放军海军航空大学 Deep reinforcement learning formation transformation method and system based on dynamic target allocation
CN113962031B (en) * 2021-12-20 2022-03-29 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning
CN113962031A (en) * 2021-12-20 2022-01-21 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning
CN117177275B (en) * 2023-11-03 2024-01-30 中国人民解放军国防科技大学 SCMA-MEC-based Internet of things equipment calculation rate optimization method
CN117177275A (en) * 2023-11-03 2023-12-05 中国人民解放军国防科技大学 SCMA-MEC-based Internet of things equipment calculation rate optimization method

Also Published As

Publication number Publication date
CN111897316B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111897316B (en) Multi-aircraft autonomous decision-making method under scene fast-changing condition
CN110456823B (en) Double-layer path planning method aiming at unmanned aerial vehicle calculation and storage capacity limitation
Tisdale et al. Autonomous UAV path planning and estimation
CN110703804A (en) Layering anti-collision control method for fixed-wing unmanned aerial vehicle cluster
KR20190023633A (en) Wide area autonomus search method and system using multi UAVs
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
Wang et al. Virtual reality technology of multi UAV earthquake disaster path optimization
Chen et al. Path planning and cooperative control for multiple UAVs based on consistency theory and Voronoi diagram
CN112923925B (en) Dual-mode multi-unmanned aerial vehicle collaborative track planning method for hovering and tracking ground target
Li et al. Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm
Liu A progressive motion-planning algorithm and traffic flow analysis for high-density 2D traffic
CN111880574B (en) Unmanned aerial vehicle collision avoidance method and system
CN110825112B (en) Oil field dynamic invasion target tracking system and method based on multiple unmanned aerial vehicles
CN114679729A (en) Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
Bodi et al. Reinforcement learning based UAV formation control in GPS-denied environment
CN114138022A (en) Distributed formation control method for unmanned aerial vehicle cluster based on elite pigeon swarm intelligence
Huang et al. Cooperative collision avoidance method for multi-UAV based on Kalman filter and model predictive control
CN113110593A (en) Flight formation cooperative self-adaptive control method based on virtual structure and estimation information transmission
Yan et al. Collaborative path planning based on MAXQ hierarchical reinforcement learning for manned/unmanned aerial vehicles
CN113900449B (en) Multi-unmanned aerial vehicle track planning method and device, unmanned aerial vehicle and storage medium
CN116822362A (en) Unmanned aerial vehicle conflict-free four-dimensional flight path planning method based on particle swarm optimization
Zhang et al. Survey of safety management approaches to unmanned aerial vehicles and enabling technologies
Duoxiu et al. Proximal policy optimization for multi-rotor UAV autonomous guidance, tracking and obstacle avoidance
Liu et al. Multi-agent collaborative adaptive cruise control based on reinforcement learning
Lu et al. Dual Redundant UAV Path Planning and Mission Analysis Based on Dubins Curves

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant