CN111897316A - Multi-aircraft autonomous decision-making method under scene fast-changing condition - Google Patents
- Publication number
- CN111897316A CN111897316A CN202010575719.3A CN202010575719A CN111897316A CN 111897316 A CN111897316 A CN 111897316A CN 202010575719 A CN202010575719 A CN 202010575719A CN 111897316 A CN111897316 A CN 111897316A
- Authority
- CN
- China
- Prior art keywords
- aircraft
- distance
- action
- ith
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/0088—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Abstract
The invention discloses a multi-aircraft autonomous decision-making method under scene fast-changing conditions, belonging to the technical field of aircraft. The method comprises the following steps: first, each aircraft carries a laser radar for target detection and identifies static obstacles or other aircraft within its detection range from the returned three-dimensional point cloud data; next, an autonomous conflict resolution model is constructed from the aircraft's three-dimensional point cloud data; the model is solved within a multi-agent reinforcement learning framework to obtain a reward function for selecting actions from the input state; finally, a neural network learning module performs centralized training and decentralized execution based on the reward function, computes all action values available in a given state through the converged neural network, and solves for the multi-agent joint action by combinatorial optimization. When scene information changes, the invention can continue training from the inherited policy via transfer learning, and thus has good transferability.
Description
Technical Field
The invention belongs to the technical field of aircraft, relates to a conflict resolution method, and particularly relates to a multi-aircraft autonomous decision-making method under scene fast-changing conditions.
Background
With the rapid development of aeronautical science and technology, low-altitude small aircraft are widely applied in complex, severe, high-risk operating environments for aerial surveillance, forest rescue, reconnaissance and exploration, military applications, and the like. The problems of path planning and conflict resolution in multi-aircraft autonomous decision-making have therefore attracted wide attention from scholars at home and abroad.
The most important characteristics of the actual low-altitude operating environment are that the scene is complex and highly dynamic, and that dynamic threats with unknown motion characteristics may exist. In many practical tasks the targets of an agent are generally dynamic rather than static, while the regulation and control of existing aircraft mainly rely on pre-planned or fixed action sets, which are difficult to adapt to future complex and dynamic scenes.
Autonomous decision-making for multiple aircraft is a typical multi-agent cooperation problem, and the agents are expected to be able to learn from the environment, i.e., to automatically acquire knowledge, accumulate experience, continuously update and expand their knowledge, and improve performance. Learning capability is the ability of an agent to update knowledge through experimentation, observation, and inference. Only through continuous learning, acquiring knowledge by continuous interaction with the environment, can an agent improve its adaptive capability.
Disclosure of Invention
To address these problems, the invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions, which fully accounts for the dynamics of the scene and improves the learning capability of the multiple aircraft.
The multi-aircraft autonomous decision-making method under scene fast-changing conditions comprises the following steps:
Step 1: in a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor and periodically emits radar echo signals within its detection range.
Each aircraft corresponds to one target, and the initial position of each target is set randomly; N_U and N_T have the same value.
The detection range is defined as follows: each aircraft is regarded as a mass point; the maximum detection distance gives the radius, the horizontal detection range angle is θ_i, and a corresponding angle bounds the vertical detection range.
Step 2: each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
When another aircraft is detected, the returned three-dimensional point cloud data are that aircraft's three-dimensional coordinates and velocity direction; when a static obstacle is detected, the returned data are the obstacle's boundary coordinates; if there is no obstacle, the returned data are 0.
Step 3: for time slot t, construct an autonomous conflict resolution model from the three-dimensional point cloud data of the ith aircraft and the other aircraft.
The autonomous conflict resolution model takes as its objective the shortest distance from each aircraft to its target point; the objective function is:
min Σ_{i=1}^{N_U} d_i, s.t. R_1, R_2, R_3
where d_i denotes the distance between the ith aircraft and its corresponding target point.
The three constraints are as follows:
(1) R_1 is the return function for each aircraft reaching its target position, calculated as R_1 = Σ_{i'=1}^{N_T} S_{i'}, with i' ∈ {1, 2, …, N_T}; S_{i'} indicates the completion of target i': if the target is not completed, S_{i'} = -1; otherwise the target is completed and S_{i'} = 0.
(2) R_2 is the return function ensuring that no aircraft collides with a static obstacle, calculated as P_i ∩ D_m = ∅, where P_i is the path of the ith aircraft and D_m denotes the mth static obstacle; m ∈ [1, N_M], with N_M the total number of static obstacles in the scene.
(3) R_3 is the return function ensuring that no collision occurs between any pair of aircraft, calculated as ‖p_t^i − p_t^j‖ > n_c for all i ≠ j, where p_t^i is the position coordinate of the ith aircraft at the current moment and p_t^j that of the jth aircraft.
Step 4: solve the multi-aircraft autonomous conflict resolution model within the multi-agent reinforcement learning framework to obtain a reward function that selects actions from the input state.
The reward function includes the following components:
(1) The reward function r_a, set so that the path between each aircraft and the initial position of its respective target is shortest.
First, set the initial r_a = 0.
Then, for the ith aircraft X_i with state s_t^i and action a_t^i at time t, compute, after the action is executed, the distance between the aircraft's current position p_t^i and the target position.
Finally, cumulatively sum, over all N_U aircraft, the distances between current position and target position after the selected moves at time t, and update the reward function r_a accordingly.
Therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
(2) The reward function r_b, set for collision detection between aircraft and obstacles.
First, set the initial r_b = 0.
Then, for the ith aircraft X_i, according to its action at time t, compute the distance between the aircraft's current position p_t^i after executing the action and the position p_m of the mth static obstacle within the detection range.
Next, judge whether this distance is less than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, apply a negative penalty value, otherwise apply a penalty value of 0.
For the ith aircraft X_i at time t, this judgement against the minimum safe distance n_o is made for every static obstacle within the detection range, yielding the sum of penalty values for X_i.
Cumulatively sum the penalty values of all N_U aircraft at time t and update the reward function r_b.
Therefore, the closer the aircraft are to obstacles, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision.
(3) The reward function r_c, set for collision detection between aircraft.
First, set the initial r_c = 0.
Then, for the ith aircraft X_i, according to its action at time t, compute the distance between the aircraft's current position p_t^i after executing the action and the current position p_t^j of the jth aircraft within the detection range.
Because the observation of other aircraft is noisy, a delay of one time step is set here.
Next, judge this distance against the collision distance n_c and the proximity-risk distance n_m of the aircraft, where n_c < n_m: if the distance is less than n_c, apply the largest negative penalty value; otherwise, if the distance is less than n_m, apply a milder negative penalty value; if the distance is at least n_m, apply a penalty value of 0.
For the ith aircraft X_i at time t, this judgement against n_c and n_m is made for the distances to all other aircraft, yielding the sum of penalty values for X_i.
Cumulatively sum the penalty values of all N_U aircraft at time t and update the reward function r_c.
Thus, the closer an aircraft is to other aircraft, the smaller the joint revenue the overall multi-aircraft autonomous decision can achieve.
Step 5: the neural network learning module performs centralized training and decentralized execution based on the reward function, computes all action values available in a given state through the converged neural network, and solves for the multi-agent joint action by combinatorial optimization.
The invention has the following advantages:
(1) The multi-aircraft autonomous decision-making method under scene fast-changing conditions has important practical significance, taking as its research background scenes in which the low-altitude airspace is complex and highly dynamic, the operating characteristics of many elements are unknown, the coupling between the airspace environment and traffic objects is complicated, and tasks are complex and fast-changing.
(2) The method not only fully considers the dynamics of the scene but also accounts for incomplete information and non-ideal communication, and provides a method for guiding the autonomous decision-making of aircraft.
Drawings
Fig. 1 is a schematic diagram of the detection range of a laser radar when an aircraft performs collision detection according to the present invention.
FIG. 2 is a diagram of a multi-agent reinforcement learning model according to the present invention.
Fig. 3 is a schematic view of the aircraft safety distance of the present invention.
Fig. 4 is a flowchart of a multi-aircraft autonomous decision method under a scene fast change condition according to the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings, so that those skilled in the art can understand and practice the invention.
The invention provides a multi-aircraft autonomous decision-making method under scene fast-changing conditions, aimed at complex, highly dynamic scenes with the following characteristics: (1) static and dynamic obstacles coexist in the scene, and a target may change dynamically during flight; (2) the perception range of a single unmanned aerial vehicle is limited, so global information cannot be obtained; (3) the unmanned aerial vehicles can communicate with each other to share local airspace information; (4) communication between the unmanned aerial vehicles suffers interference and random loss. The multi-aircraft autonomous decision is decomposed into two sub-problems: (1) path planning; (2) conflict resolution. Both path planning and conflict resolution have been proven to be NP-hard optimization problems, so heuristic algorithms are required for solving. The autonomous decision of multiple aircraft can therefore be completed by division: solve the path planning and conflict resolution sub-problems first, then combine the two solutions as the final solution.
As shown in fig. 4, the aircraft autonomous decision method includes the following steps:
Step 1: in a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor, periodically performs collision detection within its detection range, and emits radar echo signals.
Flight conflict detection adopts non-cooperative threat conflict detection based on a radar system, in which laser radar plays an important role in autonomous navigation. The main performance parameters of a laser radar are the laser wavelength, the detection distance, and the field of view (FOV), which is divided into a horizontal FOV and a vertical FOV. The two most commonly used lidar wavelengths are 905 nm and 1550 nm. A 1550 nm radar sensor can operate at higher power and detect further than a 905 nm one, but weighs more.
The invention sets N_U aircraft and N_T targets; each aircraft corresponds to one target, and the initial position of each target is set randomly; N_U and N_T have the same value. For the ith aircraft X_i at time t, the state is s_t^i and the action is a_t^i. The state is obtained from the three-dimensional point cloud data returned by the airborne laser radar sensor carried by the aircraft, which yields the position information of static obstacles.
As shown in fig. 1, the detection range of the radar is: each aircraft is regarded as a mass point; the maximum detection distance gives the radius, the horizontal FOV detection range angle is θ_i, and a corresponding angle bounds the vertical FOV detection range.
Step 2: each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals.
The aircraft is regarded as a mass point and periodically emits radar echo signals within its detection range. When another aircraft is detected, the returned three-dimensional point cloud data are that aircraft's three-dimensional coordinates and velocity direction; when a static obstacle is detected, the returned data are the obstacle's boundary coordinates; if there is no obstacle, the returned data are 0.
Step 3: for time slot t, establish the autonomous decision model from the three-dimensional point cloud data of the ith aircraft and the other aircraft.
The design of the autonomous decision problem is described from three aspects: observed value, action, and return function.
1) Observed value s_t: at each time t, t = 1, 2, …, T, where T denotes the maximum time for the aircraft to reach the targets. Since the agent in reinforcement learning makes control decisions from the collected current state and the aircraft reward value, an observation s_t is constructed first. The observed state of the ith aircraft X_i at time t is expressed as s_t^i, and the joint state of the multi-agent system composed of all aircraft is expressed as s_t = (s_t^1, …, s_t^{N_U}).
At time slot t, the ith aircraft X_i takes action a_t^i; according to this action, the distance between the aircraft's current position p_t^i after execution and the target position is computed, and whether the current task is finished is judged.
The observation also records, at time t, the current position p_t^i of aircraft X_i after performing its action and the current position p_t^j of the jth aircraft within the detection range, in order to determine whether a conflict between aircraft has occurred; the observation of other aircraft is noisy and has a delay of one time step.
Likewise, at time t, the current position p_t^i of the ith aircraft X_i after performing its action and the position p_m of the mth static obstacle within the detection range are used to determine whether a collision occurs between the aircraft and the obstacle.
2) action at: from the perspective of the DRL mechanism, if the movement of the aircraft is characterized as an action, the action can cause a change in the environment, and the moving distance of the aircraft can determine the energy consumption of the aircraft. Thus representing reinforcement learning actions based on the flight direction acceleration of the aircraft movement model
ρj(t)∈[0,ρmax]Represents the pitch direction velocity, p, received by the jth aircraft at time t as the starting timemaxRepresenting the maximum speed in the pitch direction.
Representing the pitch direction acceleration received by the jth aircraft at time t as the starting time.Represents a maximum acceleration in the pitch direction;represents a minimum acceleration in the pitch direction;
Representing the yaw direction acceleration received by the jth aircraft at time t as the starting time.
Set atThe number of the medium elements is 2 x NUI, the slave agent received action atAnd then, the jth aircraft can be determined to hover at the current position or move to a new position, so that the control of the continuous movement of the aircraft is realized.
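A minimal sketch of how one aircraft's (pitch, yaw) acceleration pair could drive its continuous motion; the state layout, time step `dt`, and speed limit `rho_max` are assumptions for illustration, not the patent's exact movement model:

```python
import math

def apply_action(state, a_pitch, a_yaw, dt=1.0, rho_max=10.0):
    """Advance one aircraft one time step.

    `state` = (x, y, z, rho, psi, phi): position, speed rho in
    [0, rho_max], yaw angle psi, pitch angle phi (all assumed layout).
    a_pitch changes the speed, a_yaw changes the heading; rho = 0
    corresponds to hovering at the current position.
    """
    x, y, z, rho, psi, phi = state
    rho = min(max(rho + a_pitch * dt, 0.0), rho_max)  # clamp speed to [0, rho_max]
    psi += a_yaw * dt                                  # yaw (heading) update
    # Move along the current velocity direction
    x += rho * math.cos(phi) * math.cos(psi) * dt
    y += rho * math.cos(phi) * math.sin(psi) * dt
    z += rho * math.sin(phi) * dt
    return (x, y, z, rho, psi, phi)
```

Stacking one such pair per aircraft gives the 2 × N_U joint action described above.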
3) Return function r_t: the objective of the autonomous decision problem is that the distance from each aircraft to its corresponding target point is shortest, subject to three different constraints (each aircraft must complete its target and must collide neither with obstacles nor with other aircraft). To design the return function, the invention discusses the objective and the constraints of the autonomous risk-avoidance problem separately.
First, the optimization goal of the multi-aircraft autonomous decision is that the path of each aircraft to its goal is shortest; the objective function is expressed as min Σ_{i=1}^{N_U} d_i, where d_i denotes the distance between the ith aircraft and its corresponding target point.
In addition, three constraint conditions are designed; the following constraints must be satisfied:
(1) All targets are accomplished.
R_1 is the return function for each aircraft reaching its target position, calculated as R_1 = Σ_{i'=1}^{N_T} S_{i'}, with i' ∈ {1, 2, …, N_T}; S_{i'} indicates the completion of target i': if the target is not completed, S_{i'} = -1; otherwise the target is completed and S_{i'} = 0.
(2) No collision between an aircraft and an obstacle may occur.
R_2 is the return function ensuring that no aircraft collides with a static obstacle, calculated as P_i ∩ D_m = ∅, where P_i is the path of the ith aircraft, consisting of the flight position coordinates of the aircraft at times t = 1, …, T; D_m denotes the mth static obstacle; m ∈ [1, N_M], with N_M the total number of static obstacles in the scene.
(3) No collision between aircraft may occur.
R_3 is the return function ensuring that no collision occurs between any pair of aircraft, calculated as ‖p_t^i − p_t^j‖ > n_c for all i ≠ j, where p_t^i is the position coordinate of the ith aircraft at the current moment and p_t^j that of the jth aircraft.
Therefore, the multi-aircraft autonomous decision problem turns into a combinatorial optimization problem: the autonomous conflict resolution model takes as its objective the shortest distance from each aircraft to its target point, with objective function
min Σ_{i=1}^{N_U} d_i, s.t. R_1, R_2, R_3.
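Constraint (1) can be illustrated directly, assuming R_1 is the sum of the completion indicators S_i', so that R_1 = 0 exactly when every target is completed:

```python
def reward_targets(completed):
    """R1: sum of completion indicators S_i'.

    S_i' = 0 if target i' is completed and -1 otherwise, so R1 is 0
    exactly when all targets are done and -k when k targets remain.
    `completed` is a list of booleans, one per target.
    """
    return sum(0 if done else -1 for done in completed)
```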
Step 4: solve the multi-aircraft autonomous decision model based on the multi-agent deep deterministic policy gradient (MADDPG) reinforcement learning framework to obtain a reward function that selects actions from the input state.
the specific process is as follows:
1) establishing a multi-agent neural network
The state space and action space of each agent (Agent) are abstracted to coincide exactly with those of the aircraft. The policy of each agent is determined by its parameters; θ = (θ_1, …, θ_{N_U}) collects the neural network parameters of the N_U aircraft. The corresponding policies are μ = (μ_1, …, μ_{N_U}), where μ_i is the policy of aircraft i under neural network parameters θ_i. Each policy is taken to be deterministic, so the action of an agent is completely determined by its policy and its parameters: a_i = μ_i(o_i).
Here a_i is the action of the ith aircraft; o_i represents the observation of the ith aircraft, including the distances between the agent and obstacles, targets, and other agents; θ_i represents the neural network parameters of the ith aircraft.
J(θ_i) denotes the action (actor) network objective function; E_{x,a∼D} denotes the expectation over the random strategy sequence; x denotes the joint observation of the agents; Q_i^μ(x, a_1, …, a_{N_U}) denotes the Q-value function. D denotes the experience pool (Experience Replay Buffer) in MADDPG, which contains the tuples (x, x', a_1, …, a_{N_U}, r_1, …, r_{N_U}).
Here x' represents the joint observation of the agents at the next moment, and r_1, …, r_{N_U} the reward functions of the N_U aircraft. Q_i^μ denotes the action-value function of the critic network policy, realized entirely by a neural network named the critic network, and is updated according to the following objective function:
r_i is the reward function of the ith aircraft; γ ∈ (0, 1) denotes the attenuation (discount) factor; o'_j denotes the observation of the jth aircraft at the next moment; a'_j represents the action of the jth aircraft at the next moment; μ'_j is the policy of the jth aircraft at the next moment, and o_j represents the observation of the jth aircraft. The target critic Q' is identical in structure to Q, but its parameters are updated with a lag behind those of Q. This lagging critic action-value function has better physical meaning and assists the training of the action network, which is updated according to the following formula:
where J represents the action network objective function and S represents a small batch of samples drawn at random.
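The lagging update of the target critic Q' behind Q is commonly realized as a Polyak (soft) update; a minimal sketch over flat parameter lists, with the mixing rate `tau` as an assumed hyperparameter:

```python
def soft_update(target_params, online_params, tau=0.01):
    """Move each lagged (target) parameter a small step toward its
    online counterpart: theta' <- (1 - tau) * theta' + tau * theta.
    With tau << 1 the target network trails the online one slowly,
    which stabilizes the critic's bootstrapped targets."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

In a real implementation the same rule is applied tensor-by-tensor to both the target actor and the target critic.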
The model of the entire design is shown in fig. 2.
2) Reward function design
To satisfy the constraint conditions, the reward function must be designed for MADDPG; as shown in fig. 3, the reward function includes the following components:
(1) The cumulative reward function r_a, set so that the path between each aircraft and the initial position of its respective target is shortest.
First, set the initial r_a = 0.
Then, for the ith aircraft X_i with state s_t^i and action a_t^i at time t, compute, after the action is executed, the distance between the aircraft's current position p_t^i and the target position.
Finally, cumulatively sum, over all N_U aircraft, the distances between current position and target position after the selected moves at time t, and update the reward function r_a accordingly.
Therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy.
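The accumulation of r_a over all N_U aircraft can be sketched as follows; the negative sign is an assumed convention that turns a larger distance sum into a lower (worse) joint reward, matching the ordering of joint strategies just stated:

```python
import math

def reward_path(positions, targets):
    """r_a: negated sum over all aircraft of the distance between the
    current position p_t^i (after the action) and the target position.
    A larger accumulated distance yields a lower joint reward."""
    return -sum(math.dist(p, g) for p, g in zip(positions, targets))
```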
(2) The reward function r_b, set for collision detection between aircraft and obstacles.
To ensure that an aircraft and an obstacle do not collide, collision detection is required. First, set the initial r_b = 0.
Then, for the ith aircraft X_i, according to its action at time t, compute the distance between the aircraft's current position p_t^i after executing the action and the position p_m of the mth static obstacle within the detection range.
Next, judge whether this distance is less than the minimum safe distance n_o between aircraft X_i and a static obstacle; if so, apply a negative penalty value, otherwise apply a penalty value of 0.
For the ith aircraft X_i at time t, this judgement against the minimum safe distance n_o is made for every static obstacle within the detection range, yielding the sum of penalty values for X_i.
Cumulatively sum the penalty values of all N_U aircraft at time t and update the reward function r_b.
Therefore, the closer the aircraft are to obstacles, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision.
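A sketch of the r_b accumulation, assuming a fixed negative penalty per violating aircraft/obstacle pair; the patent fixes only the sign of the penalty, so `n_o` and the magnitude here are illustrative:

```python
import math

def penalty_obstacles(positions, obstacles, n_o=2.0, penalty=-1.0):
    """r_b: for every aircraft, add a fixed penalty for each static
    obstacle closer than the minimum safe distance n_o, and 0
    otherwise, then sum over all N_U aircraft."""
    r_b = 0.0
    for p in positions:
        for q in obstacles:
            if math.dist(p, q) < n_o:
                r_b += penalty
    return r_b
```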
(3) The reward function r_c, set for collision detection between aircraft.
To ensure that no collision occurs between aircraft, collision detection is required. First, set the initial r_c = 0.
Then, for the ith aircraft X_i, according to its action at time t, compute the distance between the aircraft's current position p_t^i after executing the action and the current position p_t^j of the jth aircraft within the detection range.
Because the observation of other aircraft is noisy, a delay of one time step is set here.
Next, judge this distance against the collision distance n_c and the proximity-risk distance n_m of the aircraft, where n_c < n_m: if the distance is less than n_c, apply the largest negative penalty value; otherwise, if the distance is less than n_m, apply a milder negative penalty value; if the distance is at least n_m, apply a penalty value of 0.
For the ith aircraft X_i at time t, this judgement against n_c and n_m is made for the distances to all other aircraft, yielding the sum of penalty values for X_i.
Cumulatively sum the penalty values of all N_U aircraft at time t and update the reward function r_c.
Thus, the closer an aircraft is to other aircraft, the smaller the joint revenue the overall multi-aircraft autonomous decision can achieve.
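A sketch of the tiered r_c accumulation over unordered aircraft pairs; the patent fixes only n_c < n_m and the signs of the penalties, so the thresholds and magnitudes here are illustrative:

```python
import math

def penalty_separation(positions, n_c=1.0, n_m=3.0,
                       p_collide=-10.0, p_near=-1.0):
    """r_c: tiered penalty for each unordered aircraft pair.

    Distances below the collision distance n_c get the large penalty
    p_collide; distances between n_c and the proximity-risk distance
    n_m get the milder p_near; beyond n_m there is no penalty.
    """
    r_c = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d < n_c:
                r_c += p_collide
            elif d < n_m:
                r_c += p_near
    return r_c
```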
Step 5: the neural network learning module performs centralized training and decentralized execution based on the reward function, computes all action values available in a given state through the converged neural network, and solves for the multi-agent joint action by combinatorial optimization.
Each agent contains an action network (Actor Network) and a critic network (Critic Network). The critic of each agent can acquire the action information of all the other agents; training is centralized and execution decentralized, i.e., during training a critic with global observation is introduced to guide actor training, while during testing only the actor with local observation is used to take actions.
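The "centralized training, decentralized execution" split can be sketched with flat Python lists standing in for network tensors: each critic consumes the joint observation plus every agent's action during training, while each actor sees only its own observation at execution time (the function names are illustrative):

```python
def critic_input(observations, actions):
    """Concatenate the joint observation x = (o_1, ..., o_N) with all
    agents' actions -- the input of each centralized critic Q_i during
    training. At execution time only actor_input(i, ...) is available."""
    flat = []
    for o in observations:
        flat.extend(o)
    for a in actions:
        flat.extend(a)
    return flat

def actor_input(i, observations):
    """The ith decentralized actor acts on its local observation only."""
    return list(observations[i])
```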
Claims (4)
1. A multi-aircraft autonomous decision-making method under scene fast-changing conditions, characterized by comprising the following steps:
step 1: in a scene formed by N_U aircraft and N_T targets, each aircraft carries a laser radar sensor and periodically emits radar echo signals within its detection range;
step 2: each aircraft identifies static obstacles or other aircraft within its detection range from the three-dimensional point cloud data returned by the radar echo signals;
step 3: for time slot t, construct an autonomous conflict resolution model from the three-dimensional point cloud data of the ith aircraft and the other aircraft;
the autonomous conflict resolution model takes as its objective the shortest distance from each aircraft to its target point; the objective function is:
min Σ_{i=1}^{N_U} d_i, s.t. R_1, R_2, R_3
where d_i represents the distance between the ith aircraft and its corresponding target point;
the three constraints are as follows:
(1) R_1 is the return function for each aircraft reaching its target position, calculated as R_1 = Σ_{i'=1}^{N_T} S_{i'}, with i' ∈ {1, 2, …, N_T}; S_{i'} indicates the completion of target i': if the target is not completed, S_{i'} = -1; otherwise the target is completed and S_{i'} = 0;
(2) R_2 is the return function ensuring that no aircraft collides with a static obstacle, calculated as P_i ∩ D_m = ∅, where P_i is the path of the ith aircraft and D_m represents the mth static obstacle; m ∈ [1, N_M], with N_M the total number of static obstacles in the scene;
(3) R_3 is the return function ensuring that no collision occurs between any pair of aircraft, calculated as ‖p_t^i − p_t^j‖ > n_c for all i ≠ j, where p_t^i is the position coordinate of the ith aircraft at the current moment and p_t^j that of the jth aircraft;
step 4: solve the multi-aircraft autonomous conflict resolution model within the multi-agent reinforcement learning framework to obtain a reward function that selects actions from the input state;
the reward function includes the following components:
(1) the reward function r_a, set so that the path between each aircraft and the initial position of its respective target is shortest:
first, set the initial r_a = 0;
then, for the ith aircraft X_i with state s_t^i and action a_t^i at time t, compute, after the action is executed, the distance between the aircraft's current position p_t^i and the target position;
finally, cumulatively sum, over all N_U aircraft, the distances between current position and target position after the selected moves at time t, and update the reward function r_a accordingly;
therefore, the larger the accumulated sum of distances, the worse the joint strategy; conversely, the smaller the sum, the better the joint strategy;
(2) reward function r set for collision detection of aircraft and obstacleb;
First, an initial r is setb=0;
Then, the ith aircraft X is calculatediAccording to the operation at time tComputingAfter performing the action, the aircraft XiCurrent position ofAnd the position p of the mth static obstacle in the detection rangemThe distance between, expressed as:
further, the distance is determinedWhether or not less than aircraft XiMinimum safe distance n from static obstacleoIf so, setting a penalty valueOtherwiseSetting penalty values
For the i-th aircraft X_i at time t, compare the distances between the aircraft and all static obstacles within the detection range against the minimum safe distance n_o, and obtain the sum of the penalty values;
Cumulatively compute the sum of the penalty values of all N_U aircraft at time t, and update the reward function r_b accordingly;
Therefore, the closer an aircraft is to an obstacle, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision;
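A minimal sketch of the obstacle-penalty step above (the penalty magnitude is an assumption; the patent specifies only that a penalty applies when an aircraft is closer than the minimum safe distance to a static obstacle, and zero otherwise):

```python
import numpy as np

def obstacle_penalty(positions, obstacles, n_o, penalty=-1.0):
    """Sketch of the reward r_b (penalty value is an assumed default).

    positions: (N_U, 3) positions p_i^t after the actions at time t.
    obstacles: (N_M, 3) positions p_m of static obstacles in range.
    n_o:       minimum safe distance to a static obstacle.
    Each aircraft-obstacle pair closer than n_o contributes `penalty`.
    """
    r_b = 0.0
    for p in positions:                            # each aircraft position
        d = np.linalg.norm(obstacles - p, axis=1)  # d_{i,m}^t for all m
        r_b += penalty * int((d < n_o).sum())      # sum of penalty values
    return r_b
```

An aircraft at the origin with obstacles at distances 0.5 and 5 and n_o = 1 incurs exactly one penalty, giving r_b = -1.0.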
(3) the reward function r_c set for collision detection between aircraft;
First, set the initial value r_c = 0;
Then, for the i-th aircraft X_i, according to the action a_i^t at time t, compute, after the action is executed, the distance d_{i,j}^t between the aircraft's current position p_i^t and the current position p_j^t of the j-th aircraft within the detection range, expressed as:

d_{i,j}^t = || p_i^t - p_j^t ||_2

where the observations of other aircraft are noisy and delayed by one time step;
Further, compare the distance d_{i,j}^t against the collision distance n_c and the proximity risk distance n_m of the aircraft, where n_c < n_m: if d_{i,j}^t < n_c, set a large negative penalty value; if n_c ≤ d_{i,j}^t < n_m, set a smaller negative penalty value; and if d_{i,j}^t ≥ n_m, set the penalty value to zero;
For the i-th aircraft X_i at time t, compare the distances between the aircraft and all other aircraft against the collision distance n_c and the proximity risk distance n_m, and obtain the sum of the penalty values;
Cumulatively compute the sum of the penalty values of all N_U aircraft at time t, and update the reward function r_c accordingly;
Therefore, the closer an aircraft is to other aircraft, the smaller the joint revenue obtained by the overall multi-aircraft autonomous decision;
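The two-threshold separation penalty above can be sketched as follows (the penalty magnitudes are assumptions; the patent requires only that the collision penalty exceed the proximity-risk penalty, with n_c < n_m):

```python
import numpy as np

def separation_penalty(positions, n_c, n_m, p_collide=-10.0, p_risk=-1.0):
    """Sketch of the reward r_c (penalty magnitudes are assumed defaults).

    positions: (N_U, 3) positions p_i^t of all aircraft at time t.
    For each ordered pair (i, j), i != j:
      d < n_c        -> p_collide  (collision)
      n_c <= d < n_m -> p_risk     (proximity risk)
      d >= n_m       -> 0          (safe separation)
    """
    r_c = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = np.linalg.norm(positions[i] - positions[j])  # d_{i,j}^t
            if d < n_c:
                r_c += p_collide
            elif d < n_m:
                r_c += p_risk
    return r_c
```

With two aircraft 0.5 apart, n_c = 1 and n_m = 2, both ordered pairs violate the collision distance, giving r_c = -20.0 under the assumed defaults.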
Step five: the neural network learning module performs centralized training and decentralized execution based on the reward functions, computes all action values that can be taken in a given state through the converged neural network, and solves the multi-agent behavior actions by joint optimization.
2. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step one, each aircraft corresponds to one target, and the initial position of each target is set randomly.
3. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step two, when another aircraft is detected, the returned three-dimensional point cloud data are the three-dimensional coordinates and velocity direction of that aircraft; when a static obstacle is detected, the returned three-dimensional point cloud data are the boundary coordinates of the obstacle; and if no obstacle is present, the returned three-dimensional point cloud data are 0.
4. The multi-aircraft autonomous decision-making method under scene fast-changing conditions according to claim 1, wherein in step five, each agent comprises an action network (Actor Network) and a critic network (Critic Network). The Critic of each agent can obtain the action information of all other agents, enabling centralized training and decentralized execution: during training, a Critic with global observation is introduced to guide the training of the Actor, while during testing, only the Actor with local observation is used to take actions.
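A minimal numpy sketch of the centralized-training, decentralized-execution structure described in claim 4 (the class names, linear policies, random weights, and dimensions are illustrative assumptions; a real implementation would use trained neural networks in a MADDPG-style framework):

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Decentralized actor: maps one agent's LOCAL observation to an
    action (a fixed random linear policy here, purely illustrative)."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(size=(act_dim, obs_dim)) * 0.1

    def act(self, obs):
        return np.tanh(self.W @ obs)  # bounded action from local obs only

class Critic:
    """Centralized critic: scores the joint state-action, i.e. the
    concatenated observations and actions of ALL agents. It is used
    only during training to guide the actors; at execution time each
    actor acts on its own observation alone."""
    def __init__(self, joint_dim):
        self.w = rng.normal(size=joint_dim) * 0.1

    def value(self, all_obs, all_acts):
        joint = np.concatenate(all_obs + all_acts)  # global information
        return float(self.w @ joint)

# Two agents with 4-dim observations and 2-dim actions (assumed sizes).
actors = [Actor(4, 2) for _ in range(2)]
critic = Critic(joint_dim=2 * 4 + 2 * 2)

obs = [rng.normal(size=4) for _ in range(2)]
acts = [a.act(o) for a, o in zip(actors, obs)]  # execution: local obs only
q = critic.value(obs, acts)                     # training: global info
```

The key design point is the asymmetry: the critic sees everything during training, while each deployed actor needs only its own sensor observation, which is what makes decentralized execution possible.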
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010575719.3A CN111897316B (en) | 2020-06-22 | 2020-06-22 | Multi-aircraft autonomous decision-making method under scene fast-changing condition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111897316A true CN111897316A (en) | 2020-11-06 |
CN111897316B CN111897316B (en) | 2021-05-14 |
Family
ID=73207769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010575719.3A Active CN111897316B (en) | 2020-06-22 | 2020-06-22 | Multi-aircraft autonomous decision-making method under scene fast-changing condition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897316B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112462804A (en) * | 2020-12-24 | 2021-03-09 | 四川大学 | Unmanned aerial vehicle perception and avoidance strategy based on ADS-B and ant colony algorithm |
CN112633415A (en) * | 2021-01-11 | 2021-04-09 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster intelligent task execution method and device based on rule constraint training |
CN113705921A (en) * | 2021-09-03 | 2021-11-26 | 厦门闽江智慧科技有限公司 | Electric vehicle dynamic path planning optimization solving method based on hybrid charging strategy |
CN113962031A (en) * | 2021-12-20 | 2022-01-21 | 北京航空航天大学 | Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning |
CN114115350A (en) * | 2021-12-02 | 2022-03-01 | 清华大学 | Aircraft control method, device and equipment |
CN114237293A (en) * | 2021-12-16 | 2022-03-25 | 中国人民解放军海军航空大学 | Deep reinforcement learning formation transformation method and system based on dynamic target allocation |
CN114237235A (en) * | 2021-12-02 | 2022-03-25 | 之江实验室 | Mobile robot obstacle avoidance method based on deep reinforcement learning |
CN114679757A (en) * | 2020-12-26 | 2022-06-28 | 中国航天科工飞航技术研究院(中国航天海鹰机电技术研究院) | Ultra-high-speed low-vacuum pipeline aircraft handover switching method and device |
CN117177275A (en) * | 2023-11-03 | 2023-12-05 | 中国人民解放军国防科技大学 | SCMA-MEC-based Internet of things equipment calculation rate optimization method |
US11907335B2 (en) * | 2020-10-16 | 2024-02-20 | Cognitive Space | System and method for facilitating autonomous target selection |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109116854A (en) * | 2018-09-16 | 2019-01-01 | 南京大学 | A kind of robot cooperated control method of multiple groups based on intensified learning and control system |
CN109725532A (en) * | 2018-12-24 | 2019-05-07 | 杭州电子科技大学 | One kind being applied to relative distance control and adaptive corrective method between multiple agent |
CN109992000A (en) * | 2019-04-04 | 2019-07-09 | 北京航空航天大学 | A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning |
US20190342731A1 (en) * | 2018-05-01 | 2019-11-07 | New York University | System method and computer-accessible medium for blockchain-based distributed ledger for analyzing and tracking environmental targets |
WO2019234702A2 (en) * | 2018-06-08 | 2019-12-12 | Tata Consultancy Services Limited | Actor model based architecture for multi robot systems and optimized task scheduling method thereof |
CN110991545A (en) * | 2019-12-10 | 2020-04-10 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-agent confrontation oriented reinforcement learning training optimization method and device |
CN111045445A (en) * | 2019-10-23 | 2020-04-21 | 浩亚信息科技有限公司 | Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning |
CN111103881A (en) * | 2019-12-25 | 2020-05-05 | 北方工业大学 | Multi-agent formation anti-collision control method and system |
Non-Patent Citations (2)
Title |
---|
YUMENG LI et al.: "A Satisficing Conflict Resolution Approach for Multiple UAVs", IEEE Internet of Things Journal |
QI Naiming et al.: "Multi-UAV mission planning method based on trajectory prediction", Journal of Harbin Institute of Technology |
Also Published As
Publication number | Publication date |
---|---|
CN111897316B (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111897316B (en) | Multi-aircraft autonomous decision-making method under scene fast-changing condition | |
CN110456823B (en) | Double-layer path planning method aiming at unmanned aerial vehicle calculation and storage capacity limitation | |
Tisdale et al. | Autonomous UAV path planning and estimation | |
CN110703804A (en) | Layering anti-collision control method for fixed-wing unmanned aerial vehicle cluster | |
KR20190023633A (en) | Wide area autonomus search method and system using multi UAVs | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
Wang et al. | Virtual reality technology of multi-UAV earthquake disaster path optimization
Chen et al. | Path planning and cooperative control for multiple UAVs based on consistency theory and Voronoi diagram | |
CN112923925B (en) | Dual-mode multi-unmanned aerial vehicle collaborative track planning method for hovering and tracking ground target | |
Li et al. | Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm | |
Liu | A progressive motion-planning algorithm and traffic flow analysis for high-density 2D traffic | |
CN111880574B (en) | Unmanned aerial vehicle collision avoidance method and system | |
CN110825112B (en) | Oil field dynamic invasion target tracking system and method based on multiple unmanned aerial vehicles | |
CN114679729A (en) | Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method | |
Bodi et al. | Reinforcement learning based UAV formation control in GPS-denied environment | |
CN114138022A (en) | Distributed formation control method for unmanned aerial vehicle cluster based on elite pigeon swarm intelligence | |
Huang et al. | Cooperative collision avoidance method for multi-UAV based on Kalman filter and model predictive control | |
CN113110593A (en) | Flight formation cooperative self-adaptive control method based on virtual structure and estimation information transmission | |
Yan et al. | Collaborative path planning based on MAXQ hierarchical reinforcement learning for manned/unmanned aerial vehicles | |
CN113900449B (en) | Multi-unmanned aerial vehicle track planning method and device, unmanned aerial vehicle and storage medium | |
CN116822362A (en) | Unmanned aerial vehicle conflict-free four-dimensional flight path planning method based on particle swarm optimization | |
Zhang et al. | Survey of safety management approaches to unmanned aerial vehicles and enabling technologies | |
Duoxiu et al. | Proximal policy optimization for multi-rotor UAV autonomous guidance, tracking and obstacle avoidance | |
Liu et al. | Multi-agent collaborative adaptive cruise control based on reinforcement learning | |
Lu et al. | Dual Redundant UAV Path Planning and Mission Analysis Based on Dubins Curves |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||