CN111897316B - Multi-aircraft autonomous decision-making method under scene fast-changing condition - Google Patents
- Publication number
- CN111897316B (application CN202010575719.3A)
- Authority
- CN
- China
- Prior art keywords
- aircraft
- distance
- action
- ith
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Game Theory and Decision Science (AREA)
- Medical Informatics (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses a multi-aircraft autonomous decision-making method under scene fast-changing conditions, belonging to the technical field of aircraft. The method comprises the following steps: first, each aircraft carries a laser radar (lidar) for target detection and identifies static obstacles or other aircraft within its detection range from the returned three-dimensional point cloud data; next, an autonomous conflict resolution model is constructed from the aircraft's three-dimensional point cloud data; the model is then solved within a multi-agent reinforcement learning framework to obtain a reward function that scores the action selected for an input state; finally, a neural-network learning module performs centralized training and decentralized execution based on the reward function, computes with the converged neural network the values of all actions available in a given state, and solves for the multi-agent joint actions by combinatorial optimization. When scene information changes, the invention can perform inheritance training via transfer learning, and therefore transfers well to new scenes.
Description
Technical Field
The invention belongs to the technical field of aircraft, relates to a conflict resolution method, and particularly relates to a multi-aircraft autonomous decision-making method under scene fast-changing conditions.
Background
With the rapid development of aeronautical science and technology, low-altitude small aircraft are widely applied in complex, severe, high-risk operating environments for aerial surveillance, forest rescue, reconnaissance and exploration, military applications, and the like. The problems of path planning and conflict resolution in multi-aircraft autonomous decision-making have therefore attracted wide attention from scholars worldwide.
The most important characteristic of the actual low-altitude operating environment is that the scene is complex and highly dynamic, and dynamic threats with unknown motion characteristics may exist. In many practical tasks the agents' targets are generally not static but dynamic, whereas the regulation and control of existing aircraft mainly depend on pre-planned or predefined action sets, which can hardly adapt to future complex, dynamic scenes.
Multi-aircraft autonomous decision-making is a typical multi-agent cooperation problem. The agents are expected to be able to learn from the environment, that is, to automatically acquire knowledge, accumulate experience, continuously update and expand that knowledge, and improve their performance. Learning capability is the ability of an agent to update knowledge through experimentation, observation, and inference. Only through continuous learning, acquiring knowledge by continuously interacting with the environment, can an agent improve its adaptive ability.
Disclosure of Invention
In view of these problems, the invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions that fully considers the dynamic nature of the scene and improves the learning capability of multiple aircraft.
The multi-aircraft autonomous decision-making method under scene fast-changing conditions comprises the following steps:
Step one: for a scene composed of N_U aircraft and N_T targets, each aircraft carries a lidar sensor and periodically emits radar echo signals within its detection range.

Each aircraft corresponds to one target, whose initial position is set randomly; N_U and N_T are equal.

The detection range is defined as follows: each aircraft is treated as a point mass, the maximum detection distance is the radius, the horizontal detection-range angle is θ_i, and the vertical detection-range angle is φ_i.
Step two: each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
When another aircraft is detected, the returned three-dimensional point cloud data are the three-dimensional coordinates and velocity direction of that aircraft; when a static obstacle is detected, the returned data are the boundary coordinates of the obstacle; if there is no obstacle, the returned data are 0.
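As an illustrative sketch only (the patent specifies no data format), the return convention above — aircraft yield coordinates plus a velocity direction, static obstacles yield boundary coordinates, nothing yields zeros — can be expressed as:

```python
import numpy as np

def parse_lidar_return(detection):
    """Interpret a lidar return per the convention described above.

    Illustrative names and shapes; the patent does not specify a format.
    - another aircraft  -> (x, y, z) position plus a velocity direction
    - static obstacle   -> boundary coordinates of the obstacle
    - nothing detected  -> all zeros
    """
    if detection is None:
        return np.zeros(6)                      # no obstacle: zero point cloud
    kind, data = detection
    if kind == "aircraft":
        pos, vel_dir = data
        return np.concatenate([pos, vel_dir])   # 3-D coords + velocity direction
    if kind == "static":
        return np.asarray(data, dtype=float).ravel()  # boundary coordinates
    raise ValueError(f"unknown detection kind: {kind}")
```

A downstream state encoder could consume these fixed-length vectors directly as the per-neighbor portion of the observation.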
Step three: for time slot t, an autonomous conflict resolution model is constructed from the three-dimensional point cloud data of the ith aircraft and the other aircraft.
the autonomous conflict resolution model aims at the shortest distance from each aircraft to the respective target point, and the objective function is as follows:
s.t.R1,R2,R3
diand the distance between the ith aircraft and the target point corresponding to the aircraft is represented.
The three constraints are as follows:
(1) R_1 denotes the return function for each aircraft reaching its target position, computed as R_1 = Σ_{i'=1}^{N_T} S_{i'}, i' ∈ {1, 2, …, N_T}, where S_{i'} judges the completion of target i': S_{i'} = −1 if the target is not completed, and S_{i'} = 0 once it is completed.
(2) R_2 denotes the return function requiring that no aircraft collide with a static obstacle: the path P_i of the ith aircraft must not intersect the mth static obstacle D_m, m ∈ [1, N_M], where N_M is the total number of static obstacles in the scene.
(3) R_3 denotes the return function requiring that no two aircraft collide, computed from p_t^i, the position coordinates of the ith aircraft at the current moment, and p_t^j, the position coordinates of the jth aircraft at the current moment.
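A minimal sketch of the three constraint return terms R_1, R_2, R_3 as described above. The −1-per-violation penalty magnitudes, the spherical obstacle model, and the function names are assumptions; the patent states only the conditions each term enforces.

```python
import numpy as np

def r1(targets_done):
    """R_1: sum of S_i' over targets (0 if target complete, -1 otherwise)."""
    return sum(0 if done else -1 for done in targets_done)

def r2(path, obstacles):
    """R_2: penalize any path point entering a static obstacle D_m
    (obstacles modeled as (center, radius) spheres for illustration)."""
    penalty = 0
    for p in path:
        for center, radius in obstacles:
            if np.linalg.norm(np.asarray(p, float) - np.asarray(center, float)) < radius:
                penalty -= 1
    return penalty

def r3(positions, min_sep):
    """R_3: penalize any pair of aircraft positions p_t^i, p_t^j closer
    than the minimum separation."""
    pts = np.asarray(positions, dtype=float)
    penalty = 0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if np.linalg.norm(pts[i] - pts[j]) < min_sep:
                penalty -= 1
    return penalty
```

All three return 0 when their constraint is satisfied, so a feasible joint plan leaves the objective unchanged.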
solving the autonomous conflict resolution model of the multiple aircrafts based on the multi-agent reinforcement learning framework to obtain a reward function for selecting actions according to the input state;
the reward functions include the following:
(1) The reward function r_a, set so that each aircraft's path from its initial position to its target is shortest.

First, r_a is initialized to 0.

Then, for the ith aircraft X_i in state s_t^i at time t, taking action a_t^i, the position p_t^i of X_i after executing the action is computed, along with its distance d(p_t^i, g_i) to the target position g_i.

Finally, the distances between current position and target position of all N_U aircraft after moving at time t are summed, and the reward function r_a is updated.

Hence the larger the accumulated distance sum, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
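The accumulation of r_a described above can be sketched as follows. The sign convention (negating the distance sum so a larger sum yields a lower reward) is an assumption consistent with the statement that a larger accumulated distance means a worse joint strategy:

```python
import numpy as np

def reward_ra(positions, goals):
    """r_a: negative of the summed current-position-to-target distances
    over all N_U aircraft at time t (sign convention assumed)."""
    ra = 0.0                                    # initial r_a = 0
    for p, g in zip(np.asarray(positions, float), np.asarray(goals, float)):
        ra -= np.linalg.norm(p - g)             # accumulate d(p_t^i, g_i)
    return ra
```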
(2) The reward function r_b, set for collision detection between aircraft and obstacles.

First, r_b is initialized to 0.

Then, for the ith aircraft X_i taking action a_t^i at time t, the position p_t^i of X_i after executing the action is computed, along with the distance d(p_t^i, p_m) between it and the position p_m of the mth static obstacle in the detection range.

Next, it is judged whether this distance is smaller than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, a penalty value is assigned, otherwise the penalty value is 0.

For the ith aircraft X_i at time t, the distances between X_i and all static obstacles in the detection range are judged against the minimum safe distance n_o, and the penalty values are summed.

The penalty sums of all N_U aircraft at time t are accumulated, and the reward function r_b is updated.

Hence the closer an aircraft comes to an obstacle, the smaller the joint return obtained by the overall multi-aircraft autonomous decision.
(3) The reward function r_c, set for collision detection between aircraft.

First, r_c is initialized to 0.

Then, for the ith aircraft X_i taking action a_t^i at time t, the position p_t^i of X_i after executing the action is computed, along with the distance d(p_t^i, p_t^j) between it and the current position p_t^j of the jth aircraft in the detection range.

Because observations of other aircraft are noisy, a delay of one time step is applied to them.

Next, this distance is judged against the aircraft collision distance n_c and the proximity risk distance n_m, where n_c < n_m: if d(p_t^i, p_t^j) < n_c, a large penalty value is assigned; if n_c ≤ d(p_t^i, p_t^j) < n_m, a smaller penalty value is assigned; otherwise the penalty value is 0.

For the ith aircraft X_i at time t, the distances between X_i and all other aircraft are judged against the collision distance n_c and the proximity risk distance n_m, and the penalty values are summed.

The penalty sums of all N_U aircraft at time t are accumulated, and the reward function r_c is updated.

Hence the closer an aircraft comes to other aircraft, the smaller the joint return obtained by the overall multi-aircraft autonomous decision.
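A sketch of the three-tier r_c penalty just described. The two penalty magnitudes are assumptions; only the ordering (collision punished harder than proximity risk, with n_c < n_m) comes from the text:

```python
def penalty_pair(dist, n_c, n_m, p_collide=-2.0, p_near=-1.0):
    """Three-tier pairwise penalty with n_c < n_m: collision (< n_c) is
    punished hardest, proximity risk (n_c <= dist < n_m) less, and a clear
    separation (>= n_m) not at all."""
    assert n_c < n_m
    if dist < n_c:
        return p_collide
    if dist < n_m:
        return p_near
    return 0.0

def reward_rc(pair_dists, n_c, n_m):
    """r_c: accumulate the pairwise penalties over all aircraft pairs at time t."""
    return sum(penalty_pair(d, n_c, n_m) for d in pair_dists)
```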
Step five: the neural-network learning module performs centralized training and decentralized execution based on the reward functions, computes with the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent joint actions by combinatorial optimization.
The invention has the advantages that:
(1) The multi-aircraft autonomous decision-making method under scene fast-changing conditions has important practical significance: its research background is a low-altitude airspace that is complex and highly dynamic, with multiple elements whose operating characteristics are unknown, a complex coupling relation between the airspace environment and traffic objects, and complex, fast-changing tasks.

(2) The method not only fully considers the dynamic nature of the scene but also accounts for incomplete information and non-ideal communication, providing a way to guide the autonomous decision-making of aircraft.
Drawings
Fig. 1 is a schematic diagram of the detection range of a laser radar when an aircraft performs collision detection according to the present invention.
FIG. 2 is a diagram of a multi-agent reinforcement learning model according to the present invention.
Fig. 3 is a schematic view of the aircraft safety distance of the present invention.
Fig. 4 is a flowchart of a multi-aircraft autonomous decision method under a scene fast change condition according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, so that those skilled in the art can understand and practice it.
The invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions, aimed at complex, highly dynamic scenes with the following characteristics: (1) static and dynamic obstacles coexist in the scene, and a target may change dynamically during flight; (2) the perception range of a single unmanned aerial vehicle is limited, so global information cannot be obtained; (3) the unmanned aerial vehicles can communicate with each other to share local airspace information; (4) communication between the unmanned aerial vehicles suffers interference and random loss. The multi-aircraft autonomous decision is decomposed into two sub-problems: (1) path planning; (2) conflict resolution. Both path planning and conflict resolution have been proven to be NP-hard optimization problems, so heuristic algorithms are required. The autonomous decision of multiple aircraft can therefore be solved by division: solve path planning and conflict resolution separately, then combine the two sub-solutions into the final solution.
As shown in fig. 4, the aircraft autonomous decision method includes the following steps:
Step one: for a scene composed of N_U aircraft and N_T targets, each aircraft carries a lidar sensor, periodically performs collision detection within its detection range, and emits radar echo signals.

Flight conflict detection adopts non-cooperative threat detection based on a radar system, in which lidar plays an important role in autonomous navigation. The main performance parameters of a lidar are the laser wavelength, the detection distance, and the field of view (FOV), which divides into a horizontal FOV and a vertical FOV. The two most commonly used lidar wavelengths are 905 nm and 1550 nm; a 1550 nm sensor can operate at higher power and detect farther than a 905 nm one, but weighs more.

The invention sets up N_U aircraft and N_T targets, each aircraft corresponding to one target whose initial position is set randomly; N_U and N_T are equal. For the ith aircraft X_i at time t, the state is s_t^i and the action is a_t^i; the state is obtained from the three-dimensional point cloud data returned by the lidar sensor carried on the aircraft, which yields the position information of the static obstacles.

As shown in fig. 1, the detection range of the radar is: each aircraft is treated as a point mass, the maximum detection distance is the radius, the horizontal FOV angle is θ_i, and the vertical FOV angle is φ_i.
Step two: each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
the aircraft is regarded as a mass point, the aircraft sends radar echo signals within a detection range regularly, when other aircraft are detected, returned three-dimensional point cloud data are three-dimensional coordinates and speed directions of other aircraft, when static obstacles are detected, the returned three-dimensional point cloud data are boundary coordinates of the static obstacles, and if no obstacles exist, the returned three-dimensional point cloud data are 0.
Step three: for time slot t, an autonomous decision-making model is constructed from the three-dimensional point cloud data of the ith aircraft and the other aircraft.

The invention describes the design of the autonomous decision problem from three aspects: observation, action, and return function.
1) Observation s_t: at each time t, t = 1, 2, …, T, where T is the maximum time for the aircraft to reach its target. Because the agent in reinforcement learning makes control decisions from the collected current state and the aircraft reward value, an observation s_t is constructed first. The observation of the ith aircraft X_i at time t is denoted o_t^i, and the joint state of the multi-agent system composed of all aircraft is denoted x_t = (o_t^1, …, o_t^{N_U}).

At time slot t, the ith aircraft X_i takes action a_t^i; after the action is executed, the position p_t^i of X_i and its distance to the target position g_i are computed to judge whether the current task is finished.

At time t, after the ith aircraft X_i performs its action, the distance between its position p_t^i and the current position p_t^j of the jth aircraft in the detection range is used to judge whether a conflict between aircraft occurs; the observation of other aircraft is noisy and delayed by one time step.

At time t, after the ith aircraft X_i performs its action, the distance between its position p_t^i and the position p_m of the mth static obstacle in the detection range is used to judge whether a collision between aircraft and obstacle occurs.
2) Action a_t: from the perspective of the DRL mechanism, the movement of the aircraft is characterized as an action: an action causes a change in the environment, and the distance the aircraft moves determines its energy consumption. Reinforcement-learning actions are therefore represented by the flight-direction accelerations of the aircraft movement model.

ρ_j(t) ∈ [0, ρ_max] denotes the pitch-direction velocity received by the jth aircraft with time t as the starting time, where ρ_max is the maximum pitch-direction velocity.

The pitch-direction acceleration received by the jth aircraft with time t as the starting time is bounded between the minimum and the maximum pitch-direction acceleration.

The yaw-direction acceleration received by the jth aircraft with time t as the starting time is bounded analogously.

The set a_t contains 2 × N_U elements. After the agents receive the action a_t, each jth aircraft can be determined to hover at its current position or move to a new position, realizing control of the continuous movement of the aircraft.
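A hedged sketch of the joint action vector: the text states that a_t has 2 × N_U elements, a pitch-direction and a yaw-direction acceleration per aircraft, bounded by minimum and maximum accelerations. The concrete bound values and the function name are assumptions:

```python
import numpy as np

def clip_joint_action(a, acc_min, acc_max):
    """Joint action a_t of 2 * N_U entries: per aircraft one pitch-direction
    and one yaw-direction acceleration, each clipped to [acc_min, acc_max]
    (bounds stand in for the rho_min / rho_max style limits in the text)."""
    a = np.asarray(a, dtype=float).reshape(-1, 2)   # (N_U, 2): pitch, yaw
    return np.clip(a, acc_min, acc_max)
```

Reshaping to (N_U, 2) makes the per-aircraft structure explicit before the environment applies the accelerations.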
3) Return function r_t: the objective of the autonomous decision problem is that the distance from each aircraft to its corresponding target point is shortest, subject to three constraints (every target must be completed, and no aircraft may collide with an obstacle or with another aircraft). To design the return function, the invention treats the objective and the constraints of the autonomous risk-avoidance problem separately.

First, the optimization objective of the multi-aircraft autonomous decision is that each aircraft's path to its target is shortest; the objective function is min Σ_i d_i, where d_i denotes the distance between the ith aircraft and its corresponding target point.

In addition, three constraint conditions are designed; the following must be satisfied:

(1) All targets are completed:

R_1 denotes the return function for each aircraft reaching its target position, computed as R_1 = Σ_{i'=1}^{N_T} S_{i'}, i' ∈ {1, 2, …, N_T}, where S_{i'} judges the completion of target i': S_{i'} = −1 if the target is not completed, and S_{i'} = 0 once it is completed.

(2) No collision between aircraft and obstacles:

R_2 denotes the return function requiring that no aircraft collide with a static obstacle: the path P_i of the ith aircraft, i.e. the sequence of its flight position coordinates up to time T, must not intersect the mth static obstacle D_m, m ∈ [1, N_M], where N_M is the total number of static obstacles in the scene.

(3) No collision between aircraft:

R_3 denotes the return function requiring that no two aircraft collide, computed from p_t^i, the position coordinates of the ith aircraft at the current moment, and p_t^j, the position coordinates of the jth aircraft at the current moment.

The multi-aircraft autonomous decision problem thus becomes a combinatorial optimization problem: the autonomous conflict resolution model takes the shortest distance from each aircraft to its target point as the objective, with objective function

min Σ_{i=1}^{N_U} d_i,   s.t. R_1, R_2, R_3
solving the autonomous decision model of the multi-aircraft based on a multi-agent reinforcement learning (MADDPG) frame to obtain a reward function for selecting actions according to the input state;
the specific process is as follows:
1) establishing a multi-agent neural network
The state space and action space of each agent are abstracted to coincide exactly with those of the aircraft. The policy of each agent is determined by a parameter θ_i, the neural-network parameters of the ith aircraft; θ = {θ_1, …, θ_{N_U}} collects the parameters of all N_U aircraft, and the policy of agent i under parameters θ_i is μ_i. Assuming deterministic policies, the action of an agent is completely determined by its policy and the corresponding parameters: a_i = μ_i(o_i | θ_i).

Here a_i is the action of the ith aircraft; o_i is the observation of the ith aircraft, including the distances between the agent and obstacles, targets, and other agents; θ_i denotes the neural-network parameters of the ith aircraft.

J(θ_i) denotes the action-network objective function; E_{x,a~D} denotes the expectation over the random strategy sequence; x = (o_1, …, o_{N_U}) denotes the joint observation of the agents; Q_i^μ(x, a_1, …, a_{N_U}) denotes the Q-value function. D denotes the experience pool (experience replay buffer) in MADDPG and contains the tuples (x, a_1, …, a_{N_U}, r_1, …, r_{N_U}, x').

x' denotes the joint observation of the agents at the next moment, and r_{N_U} denotes the reward function of the N_U-th aircraft. The action-value function of the critic's network policy is realized entirely by a neural network, named the critic network, and is updated according to the objective

L(θ_i) = E[(Q_i^μ(x, a_1, …, a_{N_U}) − y_i)^2],   y_i = r_i + γ Q_i^{μ'}(x', a'_1, …, a'_{N_U})

where r_i is the reward function of the ith aircraft; γ ∈ (0, 1) is the attenuation factor; x' is the joint observation at the next moment; a'_j = μ'_j(o_j) is the action of the jth aircraft at the next moment, with μ'_j the next-moment policy of the jth aircraft and o_j its observation. The target networks Q_i^{μ'} and μ'_j have structures identical to Q_i^μ and μ_j, but their parameter updates lag behind, which gives the critic's action-value function better physical meaning to assist training of the action network. The action network is updated by the sampled policy gradient

∇_{θ_i} J ≈ (1/S) Σ ∇_{θ_i} μ_i(o_i) ∇_{a_i} Q_i^μ(x, a_1, …, a_{N_U}) |_{a_i = μ_i(o_i)}

where J denotes the action-network objective function and S is a small batch of samples drawn at random.
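The critic's TD target y_i = r_i + γ Q'_i(x', a'_1, …, a'_{N_U}) and the lagged target-network parameter update described above can be sketched as follows; the γ and τ defaults are assumed values, not from the patent:

```python
import numpy as np

def critic_td_target(rewards, q_next, gamma=0.95, done=None):
    """TD target y_i = r_i + gamma * Q'_i(x', a'_1..a'_N) for updating each
    centralized critic; gamma in (0, 1) is the attenuation factor."""
    rewards = np.asarray(rewards, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    mask = 0.0 if done else 1.0                 # no bootstrap past the end
    return rewards + gamma * mask * q_next

def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target critic/actor parameters lag the online ones,
    matching the delayed parameter update noted in the text."""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```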
The model of the entire design is shown in fig. 2.
2) Reward function design
To satisfy the constraint conditions, the reward function must be designed for MADDPG; as shown in fig. 3, the reward functions include the following:
(1) The cumulative reward function r_a, set so that each aircraft's path from its initial position to its target is shortest.

First, r_a is initialized to 0.

Then, for the ith aircraft X_i in state s_t^i at time t, taking action a_t^i, the position p_t^i of X_i after executing the action is computed, along with its distance d(p_t^i, g_i) to the target position g_i.

Finally, the distances between current position and target position of all N_U aircraft after moving at time t are summed, and the reward function r_a is updated.

Hence the larger the accumulated distance sum, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
(2) The reward function r_b, set for collision detection between aircraft and obstacles.

To guarantee that aircraft and obstacles do not collide, collision detection is required. First, r_b is initialized to 0.

Then, for the ith aircraft X_i taking action a_t^i at time t, the position p_t^i of X_i after executing the action is computed, along with the distance d(p_t^i, p_m) between it and the position p_m of the mth static obstacle in the detection range.

Next, it is judged whether this distance is smaller than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, a penalty value is assigned, otherwise the penalty value is 0.

For the ith aircraft X_i at time t, the distances between X_i and all static obstacles in the detection range are judged against the minimum safe distance n_o, and the penalty values are summed.

The penalty sums of all N_U aircraft at time t are accumulated, and the reward function r_b is updated.

Hence the closer an aircraft comes to an obstacle, the smaller the joint return obtained by the overall multi-aircraft autonomous decision.
(3) The reward function r_c, set for collision detection between aircraft.

To guarantee that no collision occurs between aircraft, collision detection is required. First, r_c is initialized to 0.

Then, for the ith aircraft X_i taking action a_t^i at time t, the position p_t^i of X_i after executing the action is computed, along with the distance d(p_t^i, p_t^j) between it and the current position p_t^j of the jth aircraft in the detection range.

Because observations of other aircraft are noisy, a delay of one time step is applied to them.

Next, this distance is judged against the aircraft collision distance n_c and the proximity risk distance n_m, where n_c < n_m: if d(p_t^i, p_t^j) < n_c, a large penalty value is assigned; if n_c ≤ d(p_t^i, p_t^j) < n_m, a smaller penalty value is assigned; otherwise the penalty value is 0.

For the ith aircraft X_i at time t, the distances between X_i and all other aircraft are judged against the collision distance n_c and the proximity risk distance n_m, and the penalty values are summed.

The penalty sums of all N_U aircraft at time t are accumulated, and the reward function r_c is updated.

Hence the closer an aircraft comes to other aircraft, the smaller the joint return obtained by the overall multi-aircraft autonomous decision.
Step five: the neural-network learning module performs centralized training and decentralized execution based on the reward functions, computes with the converged neural network the values of all actions that can be taken in a given state, and solves for the multi-agent joint actions by combinatorial optimization.
Each agent contains an action network (actor network) and a critic network. The critic of each agent can acquire the action information of all other agents, enabling centralized training and decentralized execution: during training, a global critic with full observation is introduced to guide actor training; during testing, only the actor, with its local observation, is used to take actions.
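The centralized-training / decentralized-execution split can be sketched as follows; the actor callables and the concatenation layout of the critic input are illustrative assumptions:

```python
import numpy as np

def act_decentralized(actors, local_obs):
    """Execution phase: each agent picks its action from its own actor and its
    LOCAL observation only -- no global critic is involved."""
    return [actor(o) for actor, o in zip(actors, local_obs)]

def critic_input(joint_obs, joint_actions):
    """Training phase: the centralized critic sees the joint observation x and
    the actions of ALL agents, concatenated into one flat vector."""
    return np.concatenate([np.ravel(joint_obs), np.ravel(joint_actions)])
```

At test time only `act_decentralized` runs on board each aircraft; `critic_input` exists only in the training loop.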
Claims (4)
1. A multi-aircraft autonomous decision method under a scene fast-changing condition is characterized by comprising the following steps:
step one, for a scene composed of N_U aircraft and N_T targets, each aircraft carries a lidar sensor and periodically emits radar echo signals within its detection range;
step two, each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals;
step three, aiming at the time slot t, constructing an autonomous conflict resolution model by the three-dimensional point cloud data of the ith aircraft and other aircraft;
the autonomous conflict resolution model takes as its goal the shortest distance from each aircraft to its respective target point, with the objective function:
min Σ_{i=1}^{NU} d_i
s.t. R1, R2, R3
where d_i represents the distance between the ith aircraft and its corresponding target point;
the three constraints are as follows:
(1) R1 represents the reward function for each aircraft reaching its respective target position; its calculation uses the target-completion flag S_i′, i′ ∈ {1, 2, …, NT}, where S_i′ = −1 if target i′ has not been completed, and S_i′ = 0 once the target is completed;
(2) R2 represents the return function requiring that no aircraft collides with any static obstacle; in its calculation, P_i is the path of the ith aircraft and D_m denotes the mth static obstacle, m ∈ [1, NM], where NM is the total number of static obstacles in the scene;
(3) R3 represents the return function requiring that no collision occurs between any two aircraft; in its calculation, p_i^t denotes the position coordinates of the ith aircraft at the current moment, and p_j^t denotes the position coordinates of the jth aircraft at the current moment;
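As an illustrative sketch (not the patent's formulas, whose equation images are elided here), the objective and the three constraints R1, R2 and R3 can be checked as follows; all function and variable names, and the Euclidean-distance form, are assumptions:

```python
import numpy as np

def objective(positions, targets):
    """Sum of distances d_i from each aircraft to its target (to be minimized)."""
    return sum(np.linalg.norm(p - t) for p, t in zip(positions, targets))

def r1_target_completion(done_flags):
    """R1 sketch: S_i' = 0 if target i' is completed, -1 otherwise, summed."""
    return sum(0 if done else -1 for done in done_flags)

def r2_no_obstacle_collision(positions, obstacles, safe_dist):
    """R2 sketch: every aircraft keeps clear of every static obstacle D_m."""
    return all(np.linalg.norm(p - o) >= safe_dist
               for p in positions for o in obstacles)

def r3_no_mutual_collision(positions, collision_dist):
    """R3 sketch: every pair of aircraft keeps a minimum separation."""
    n = len(positions)
    return all(np.linalg.norm(positions[i] - positions[j]) >= collision_dist
               for i in range(n) for j in range(i + 1, n))
```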
step four, solving the multi-aircraft autonomous conflict resolution model based on a multi-agent reinforcement learning framework to obtain the reward functions for selecting actions according to the input state;
the reward functions include the following:
(1) the reward function ra set for the shortest path between each aircraft and the initial position of its respective target;
First, set the initial value ra = 0;
Then, for the ith aircraft Xi at time t, the state is s_i^t and the action is a_i^t; after executing the action a_i^t, the distance between the current position p_i^t of aircraft Xi and the target position is calculated as the Euclidean distance between the two positions, denoted d_i^t;
Finally, the sum of the distances between the current positions and the target positions of all NU aircraft after their selected actions at time t is accumulated, and the reward function ra is updated;
therefore, the larger the accumulated sum of distances over the aircraft, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy;
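The update of ra described above can be sketched as follows; the Euclidean-distance form and the negative-sum convention (larger accumulated distance means lower reward, matching the claim's reasoning) are assumptions, since the exact elided formula is unavailable:

```python
import numpy as np

def reward_shortest_path(next_positions, target_positions):
    """ra sketch: accumulate every aircraft's post-action distance to its
    target and return the negative sum, so a larger accumulated distance
    corresponds to a worse joint strategy."""
    total = 0.0
    for p, p_target in zip(next_positions, target_positions):
        total += np.linalg.norm(p - p_target)   # d_i^t after executing a_i^t
    return -total                               # smaller distance sum => larger reward
```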
(2) the reward function rb set for collision detection between aircraft and obstacles;
First, set the initial value rb = 0;
Then, for the ith aircraft Xi, according to the action a_i^t at time t, the distance between the current position p_i^t after executing the action and the position p_m of the mth static obstacle within the detection range is calculated as the Euclidean distance, denoted d_im^t;
Further, it is determined whether the distance d_im^t is less than the minimum safe distance no between aircraft Xi and a static obstacle; if so, a penalty value is set; otherwise, the penalty is set to zero;
For the ith aircraft Xi at time t, the distances between Xi and all static obstacles within the detection range are each compared against the minimum safe distance no, and the resulting penalty values are summed;
The penalty sums of all NU aircraft at time t are accumulated, and the reward function rb is updated:
Therefore, the closer an aircraft comes to an obstacle, the smaller the joint reward obtained by the overall multi-aircraft autonomous decision;
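A minimal sketch of the obstacle-collision reward rb follows; the penalty magnitude is an assumed placeholder, since the claim leaves the exact value to an elided formula:

```python
import numpy as np

def reward_obstacle_avoidance(next_positions, obstacle_positions, n_o, penalty=-1.0):
    """rb sketch: for each aircraft, penalize every static obstacle within
    detection range that is closer than the minimum safe distance n_o;
    contribute zero otherwise."""
    r_b = 0.0
    for p in next_positions:                    # each aircraft's post-action position
        for p_m in obstacle_positions:          # each obstacle in detection range
            d = np.linalg.norm(p - p_m)         # d_im^t
            if d < n_o:
                r_b += penalty                  # unsafe: apply penalty
    return r_b
```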
(3) the reward function rc set for collision detection between aircraft;
First, set the initial value rc = 0;
Then, for the ith aircraft Xi, according to the action a_i^t at time t, the distance between the current position p_i^t after executing the action and the current position p_j^t of the jth aircraft within the detection range is calculated as the Euclidean distance, denoted d_ij^t;
where the observations of other aircraft are noisy and delayed by a time step;
Further, it is determined whether the distance d_ij^t is less than the collision distance nc of the aircraft or the proximity-risk distance nm, where nc < nm; if d_ij^t < nc, a large penalty value is set; otherwise, if nc ≤ d_ij^t < nm, a smaller penalty value is set; and if d_ij^t ≥ nm, the penalty is set to zero;
For the ith aircraft Xi at time t, the distances between Xi and all other aircraft are each compared against the collision distance nc and the proximity-risk distance nm, and the resulting penalty values are summed;
The penalty sums of all NU aircraft at time t are accumulated, and the reward function rc is updated:
Therefore, the closer an aircraft comes to other aircraft, the smaller the joint reward obtained by the overall multi-aircraft autonomous decision;
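A minimal sketch of the tiered inter-aircraft reward rc follows, with assumed placeholder penalty magnitudes for the two zones (the claim's exact values are in elided formulas):

```python
import numpy as np

def reward_separation(positions, n_c, n_m, p_collide=-10.0, p_near=-1.0):
    """rc sketch: tiered penalty on pairwise separation, with n_c < n_m.
    d <  n_c        -> large penalty (collision distance violated)
    n_c <= d < n_m  -> smaller penalty (proximity-risk zone)
    d >= n_m        -> no penalty
    """
    assert n_c < n_m
    r_c = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = np.linalg.norm(positions[i] - positions[j])  # d_ij^t
            if d < n_c:
                r_c += p_collide
            elif d < n_m:
                r_c += p_near
    return r_c
```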
and step five, the neural network learning module performs centralized training and decentralized execution based on the reward functions, calculates all action values that can be taken in a given state through the converged neural network, and solves for the multi-agent behavior actions by joint optimization.
2. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step one, each aircraft corresponds to one target, and the initial value of the target is set randomly.
3. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step two, when another aircraft is detected, the returned three-dimensional point cloud data are the three-dimensional coordinates and velocity direction of that aircraft; when a static obstacle is detected, the returned three-dimensional point cloud data are the boundary coordinates of the static obstacle; and if no obstacle is present, the returned three-dimensional point cloud data are 0.
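The three detection-return cases of claim 3 can be sketched as a small data structure; the field names and the container type are assumptions, and only the three cases themselves come from the claim:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    kind: str                                    # "aircraft", "obstacle", or "none"
    coords: Optional[List[float]] = None         # 3-D coordinates / boundary coordinates
    velocity_dir: Optional[List[float]] = None   # only for a detected aircraft

def parse_return(kind, coords=None, velocity_dir=None):
    if kind == "aircraft":   # other aircraft: 3-D coordinates and velocity direction
        return Detection("aircraft", coords, velocity_dir)
    if kind == "obstacle":   # static obstacle: boundary coordinates
        return Detection("obstacle", coords)
    # nothing detected: the claim says the returned point cloud data are 0
    return Detection("none", [0.0, 0.0, 0.0])
```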
4. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step five, each agent comprises an action network (Actor Network) and a critic network (Critic Network); the critic part of each agent can obtain the action information of all other agents, and centralized training with decentralized execution is performed, that is, during training, actor training is guided by introducing a critic that observes the global state, and during testing, actions are taken using only the actor with local observations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010575719.3A CN111897316B (en) | 2020-06-22 | 2020-06-22 | Multi-aircraft autonomous decision-making method under scene fast-changing condition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111897316A (en) | 2020-11-06
CN111897316B (en) | 2021-05-14
Family
ID=73207769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010575719.3A Active CN111897316B (en) | 2020-06-22 | 2020-06-22 | Multi-aircraft autonomous decision-making method under scene fast-changing condition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897316B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11907335B2 (en) * | 2020-10-16 | 2024-02-20 | Cognitive Space | System and method for facilitating autonomous target selection |
CN112462804B (en) * | 2020-12-24 | 2022-05-10 | 四川大学 | Unmanned aerial vehicle perception and avoidance strategy based on ADS-B and ant colony algorithm |
CN114679757B (en) * | 2020-12-26 | 2023-11-03 | 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) | Cross-zone switching method and device for ultra-high-speed low-vacuum pipeline aircraft |
CN112633415B (en) * | 2021-01-11 | 2023-05-19 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster intelligent task execution method and device based on rule constraint training |
CN113705921B (en) * | 2021-09-03 | 2024-02-27 | 厦门闽江智慧科技有限公司 | Electric vehicle dynamic path planning optimization method based on hybrid charging strategy |
CN114115309B (en) * | 2021-11-24 | 2024-09-06 | 西北工业大学 | Planetary flight obstacle avoidance guidance method based on ARS reinforcement learning algorithm |
CN114115350B (en) * | 2021-12-02 | 2024-05-10 | 清华大学 | Aircraft control method, device and equipment |
CN114237235B (en) * | 2021-12-02 | 2024-01-19 | 之江实验室 | Mobile robot obstacle avoidance method based on deep reinforcement learning |
CN114237293B (en) * | 2021-12-16 | 2023-08-25 | 中国人民解放军海军航空大学 | Deep reinforcement learning formation transformation method and system based on dynamic target allocation |
CN113962031B (en) * | 2021-12-20 | 2022-03-29 | 北京航空航天大学 | Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning |
CN117177275B (en) * | 2023-11-03 | 2024-01-30 | 中国人民解放军国防科技大学 | SCMA-MEC-based Internet of things equipment calculation rate optimization method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109116854A (en) * | 2018-09-16 | 2019-01-01 | A multi-group robot cooperative control method and control system based on reinforcement learning
CN109725532A (en) * | 2018-12-24 | 2019-05-07 | A relative distance control and adaptive correction method applied between multiple agents
CN109992000A (en) * | 2019-04-04 | 2019-07-09 | A multi-UAV collaborative path planning method and device based on hierarchical reinforcement learning
WO2019234702A2 (en) * | 2018-06-08 | 2019-12-12 | Tata Consultancy Services Limited | Actor model based architecture for multi robot systems and optimized task scheduling method thereof |
CN110991545A (en) * | 2019-12-10 | 2020-04-10 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-agent confrontation oriented reinforcement learning training optimization method and device |
CN111045445A (en) * | 2019-10-23 | 2020-04-21 | 浩亚信息科技有限公司 | Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning |
CN111103881A (en) * | 2019-12-25 | 2020-05-05 | 北方工业大学 | Multi-agent formation anti-collision control method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11533593B2 (en) * | 2018-05-01 | 2022-12-20 | New York University | System method and computer-accessible medium for blockchain-based distributed ledger for analyzing and tracking environmental targets |
Non-Patent Citations (2)
Title |
---|
A Satisficing Conflict Resolution Approach for Multiple UAVs; Yumeng Li et al.; IEEE Internet of Things Journal; 2019-04; Vol. 6, No. 2; full text *
Multi-UAV mission planning method with trajectory prediction; Qi Naiming et al.; Journal of Harbin Institute of Technology; 2016-04; Vol. 48, No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111897316A (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111897316B (en) | Multi-aircraft autonomous decision-making method under scene fast-changing condition | |
CN110456823B (en) | Double-layer path planning method aiming at unmanned aerial vehicle calculation and storage capacity limitation | |
Tisdale et al. | Autonomous UAV path planning and estimation | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
CN110703804A (en) | Layering anti-collision control method for fixed-wing unmanned aerial vehicle cluster | |
CN112304314B (en) | Navigation method of distributed multi-robot | |
Li et al. | Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm | |
CN111811511A (en) | Unmanned aerial vehicle cluster real-time track generation method based on dimension reduction decoupling mechanism | |
CN113900449B (en) | Multi-unmanned aerial vehicle track planning method and device, unmanned aerial vehicle and storage medium | |
CN111880574B (en) | Unmanned aerial vehicle collision avoidance method and system | |
Chen et al. | Path planning and cooperative control for multiple UAVs based on consistency theory and Voronoi diagram | |
Bodi et al. | Reinforcement learning based UAV formation control in GPS-denied environment | |
CN114679729B (en) | Unmanned aerial vehicle cooperative multi-target detection method integrating radar communication | |
CN110825112B (en) | Oil field dynamic invasion target tracking system and method based on multiple unmanned aerial vehicles | |
CN116679751A (en) | Multi-aircraft collaborative search method considering flight constraint | |
Huang et al. | Cooperative collision avoidance method for multi-uav based on kalman filter and model predictive control | |
CN114138022A (en) | Distributed formation control method for unmanned aerial vehicle cluster based on elite pigeon swarm intelligence | |
CN117170238B (en) | Heterogeneous unmanned aerial vehicle cluster search algorithm based on collaborative distributed MPC | |
Duoxiu et al. | Proximal policy optimization for multi-rotor uav autonomous guidance, tracking and obstacle avoidance | |
CN116822362B (en) | Unmanned aerial vehicle conflict-free four-dimensional flight path planning method based on particle swarm optimization | |
CN113110593A (en) | Flight formation cooperative self-adaptive control method based on virtual structure and estimation information transmission | |
Yan et al. | Collaborative path planning based on MAXQ hierarchical reinforcement learning for manned/unmanned aerial vehicles | |
Zhang et al. | Survey of safety management approaches to unmanned aerial vehicles and enabling technologies | |
Lu et al. | Dual redundant UAV path planning and mission analysis based on Dubins curves | |
Sahawneh et al. | Path planning in the local-level frame for small unmanned aircraft systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||