CN110084414A - Air traffic control anti-collision method based on K-time control deep reinforcement learning - Google Patents

Air traffic control anti-collision method based on K-time control deep reinforcement learning Download PDF

Info

Publication number
CN110084414A
Authority
CN
China
Prior art keywords
control
airplane
neural network
reinforcement learning
sector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910311467.0A
Other languages
Chinese (zh)
Other versions
CN110084414B (en
Inventor
李辉 (Li Hui)
王壮 (Wang Zhuang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU RONGAO TECHNOLOGY Co Ltd
Original Assignee
CHENGDU RONGAO TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU RONGAO TECHNOLOGY Co Ltd filed Critical CHENGDU RONGAO TECHNOLOGY Co Ltd
Priority to CN201910311467.0A priority Critical patent/CN110084414B/en
Publication of CN110084414A publication Critical patent/CN110084414A/en
Application granted granted Critical
Publication of CN110084414B publication Critical patent/CN110084414B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047Optimisation of routes or paths, e.g. travelling salesman problem
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an air traffic control anti-collision method based on K-time control deep reinforcement learning, comprising the following steps: first, the number of airplanes in the sector scene is set, along with the control number K used in the anti-collision procedure; then, in training mode, K controls are performed per episode: in the first K−1 controls, the next location point is determined by the neural network through two-dimensional normal sampling and the neural network parameters are updated by the reinforcement learning method, while in the K-th control the destination is taken as the next location point; this cycle repeats until training of the neural network is complete. Finally, in application mode, the trained neural network yields the shortest conflict-free path. The method of the invention can be applied in an existing air traffic control system (ATCS) to obtain the shortest path to the destination without conflicting with other airplanes in the sector, and has practical significance for air traffic control path planning.

Description

Air traffic control anti-collision method based on K-time control deep reinforcement learning
Technical Field
The invention relates to the field of air traffic management, in particular to an air traffic control anti-collision method based on K-time control deep reinforcement learning.
Background
In recent years, civil aviation has developed rapidly, and this continued growth has brought serious air traffic congestion and great pressure on air traffic controllers. When an airplane flies from one sector to another, its flight path must be planned and correct guidance given so as to avoid conflicts with the airplanes already in the sector. Existing algorithms can generate an optimal or suboptimal flight path and guide the aircraft, but their computational efficiency is low and cannot meet the real-time requirements of actual air traffic control, so further research is still needed. Deep reinforcement learning executes efficiently and is flexible to use; suitably improved, it can be applied in an air traffic control system to quickly produce a guidance track.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an air traffic control anti-collision method based on K-time control deep reinforcement learning, which enables an airplane to enter a sector and arrive at its destination without colliding with the airplanes already in the sector, and can quickly produce multiple schemes for an air traffic controller to select.
In order to realize the purpose, the invention adopts the following technical scheme:
an air traffic control anti-collision method based on K-time control deep reinforcement learning comprises the following steps:
(1) numbering the airplanes already in the sector, and generating a coordinate matrix P covering the period from the current moment until the airplanes fly out of the sector, according to their existing flight plans and the time step;
(2) training a deep neural network with the K-time control deep reinforcement learning method, and generating a path for the controlled airplane according to its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm is computed as follows: set a control number K; construct a deep neural network whose input is the current position of the controlled airplane and the coordinate matrix P of the existing airplanes, and whose output is the polar coordinates of the controlled airplane's next position point; if the current control is not the K-th, obtain the polar coordinates of the next position point by the two-dimensional normal distribution method and update the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control, take the airplane's destination as the next position point, finish this training episode, and begin the next one;
(3) after extensive training, the deep neural network acquires guidance capability and can, from the input position of the controlled airplane and the coordinate matrix of the existing airplanes, quickly generate a shortest path that reaches the destination without conflicting with other airplanes;
(4) in practical use, several deep neural networks with different K values can be trained, so that a guidance path can be generated quickly for the air traffic controller according to the specific problem.
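As a rough illustration of step (2), one training episode of the K-time control scheme might look like the following Python sketch (the `policy` function, the polar-step geometry, and all names are simplified assumptions for illustration, not the patented implementation):

```python
import math
import random

def train_episode(policy, K, start, destination):
    """One illustrative training episode: the first K-1 position points are
    sampled from the policy's two-dimensional normal distribution over
    (polar radius, polar angle); the K-th point is the destination itself."""
    pos = start
    path = [pos]
    for k in range(1, K + 1):
        if k < K:
            # Exploratory control: sample the next point around the
            # network's predicted mean radius and angle.
            mu_r, sd_r, mu_a, sd_a = policy(pos)
            r = random.gauss(mu_r, sd_r)
            a = random.gauss(mu_a, sd_a)
            pos = (pos[0] + r * math.cos(a), pos[1] + r * math.sin(a))
        else:
            # K-th control: fly straight to the destination, ending the episode.
            pos = destination
        path.append(pos)
    return path
```

In the full method, the guidance result of each episode would additionally drive a reinforcement learning update of the network parameters before the next episode begins.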
As a preferred technical solution, in step (1), the coordinate matrix P of the airplanes in the sector contains not only the airplanes' current coordinates but also their future coordinates according to the flight plan.
As a preferred technical solution, in step (2), the number of controls is adjusted through the parameter K and can be set flexibly in air traffic control guidance; the polar angle and polar radius of the next position point are selected through a two-dimensional normal distribution, with the point-selection formula:

ρ ~ N(μ_ρ, σ_ρ²), θ ~ N(μ_θ, σ_θ²)

where μ_ρ and σ_ρ are the mean and standard deviation of the normal distribution of the polar radius, and μ_θ and σ_θ are the mean and standard deviation of the normal distribution of the polar angle; this point-selection method satisfies the exploration requirement of the reinforcement learning training process;
adopting an actor-critic dual neural network structure, the critic network update formula is:

δ_t = R_t + γ V(S_{t+1}, w) − V(S_t, w),  w ← w + α δ_t ∇_w V(S_t, w)

and the actor network update formula is:

η ← η + α δ_t ∇_η ln π(A_t | S_t, η)

where w and η are the critic and actor network parameters, α is the learning rate of the neural network, R_t is the reinforcement learning return function, V(S_t, w) is the state value function at time t, and γ is the discount factor.
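As a minimal numerical sketch of the exploratory point selection and the one-step temporal-difference update described above (a tabular stand-in with illustrative function names, assuming the standard actor-critic rule rather than reproducing the patented network):

```python
import random

def sample_waypoint(mu_rho, sd_rho, mu_theta, sd_theta):
    """Draw the polar radius and polar angle of the next position point from
    independent normal distributions, providing the exploration needed
    during reinforcement learning training."""
    return random.gauss(mu_rho, sd_rho), random.gauss(mu_theta, sd_theta)

def td_error(reward, v_next, v_now, gamma):
    """One-step TD error: delta_t = R_t + gamma * V(S_{t+1}) - V(S_t)."""
    return reward + gamma * v_next - v_now

def critic_update(v_now, delta, alpha):
    """Tabular analogue of w <- w + alpha * delta * grad_w V(S_t, w);
    for a single table entry the gradient is 1."""
    return v_now + alpha * delta
```

The actor parameters would be moved in the same direction, scaled by the log-probability gradient of the sampled action.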
As a preferred technical solution, in step (3), owing to the feature-recognition capability of the neural network, K non-conflicting position points can be generated quickly from the state input, and the airplane is guided to fly through the K control points in sequence, forming a conflict-free shortest path.
As a preferred technical solution, in step (4), several alternative schemes can be generated simultaneously for the air traffic controller to select flexibly according to the air situation.
Compared with the prior art, the invention has the following advantages and effects:
(1) Compared with traditional methods, the method has higher computational efficiency and can generate an optimal path within 200 ms.
(2) The invention improves deep reinforcement learning so that the number of controls is selectable, and a reasonable number of controls can be chosen according to the actual air situation.
(3) The invention applies the air traffic control anti-collision method based on K-time control deep reinforcement learning to the air traffic management system, enabling an airplane to enter a sector and reach its destination without colliding with the airplanes already in the sector; it can quickly produce multiple schemes for the air traffic controller to select, and has practical significance for air traffic control path planning.
Drawings
Fig. 1 is a flowchart of the air traffic control anti-collision method based on K-time control deep reinforcement learning according to this embodiment;
fig. 2 is a schematic diagram of in-sector air traffic management for the method according to this embodiment;
fig. 3 is a schematic diagram of the K controls of the method according to this embodiment;
fig. 4 is a structure diagram of the actor neural network of the method according to this embodiment;
fig. 5 is a flight path diagram between two points of the method according to this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it.
An air traffic control anti-collision method based on K-time control deep reinforcement learning, shown in fig. 1, comprises the following steps:
(1) numbering the airplanes already in the sector, and generating a coordinate matrix P covering the period from the current moment until the airplanes fly out of the sector, according to their existing flight plans and the time step;
(2) training a deep neural network with the K-time control deep reinforcement learning method, and generating a path for the controlled airplane according to its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm is computed as follows: set a control number K; construct a deep neural network whose input is the current position of the controlled airplane and the coordinate matrix P of the existing airplanes, and whose output is the polar coordinates of the controlled airplane's next position point; if the current control is not the K-th, obtain the polar coordinates of the next position point by the two-dimensional normal distribution method and update the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control, take the airplane's destination as the next position point, finish this training episode, and begin the next one;
(3) after extensive training, the deep neural network acquires guidance capability and can, from the input position of the controlled airplane and the coordinate matrix of the existing airplanes, quickly generate a shortest path that reaches the destination without conflicting with other airplanes;
(4) in practical use, several deep neural networks with different K values can be trained, so that a guidance path can be generated quickly for the air traffic controller according to the specific problem.
In this embodiment, an airplane flying along a set flight path already exists in the sector, and a controlled airplane flies into the sector; the air traffic control anti-collision method based on K-time control deep reinforcement learning enables the controlled airplane to enter the sector and arrive at its destination without colliding with the airplane already in the sector;
as shown in fig. 2, existing airplanes in a sector are numbered, and a coordinate matrix P from the current moment to the moment when the airplane flies out of the sector is generated according to the existing flight plan of the existing airplanes and the time step;
as shown in fig. 3, there are four airplanes in the sector, and the coordinate matrix P of the airplanes in the sector not only contains the coordinates of the current airplane, but also includes the future coordinates according to the flight plan.
In this embodiment, the number of controls is adjusted through the parameter K and can be set flexibly in air traffic control guidance; as shown in fig. 3, the value of K is 3;
in this embodiment, as shown in fig. 4, the actor neural network is composed of three layers of fully connected networks, and the output is the normal distribution mean and standard deviation of the polar angle and the polar diameter of the next position point.
In this embodiment, owing to the feature-recognition capability of the neural network, K non-conflicting position points can be generated quickly from the state input, and the airplane is guided to fly through the K control points in sequence; the flight trajectory between two control points is shown in fig. 5, forming a conflict-free shortest path.
In this embodiment, a collision avoidance solution can be generated within 200 ms using the present method; five different solutions can be generated within one second for the air traffic controller to select, which is clearly superior to existing methods that take several seconds, or even tens of seconds, to generate a single solution.

Claims (5)

1. An air traffic control anti-collision method based on K-time control deep reinforcement learning, characterized by comprising the following steps:
(1) numbering the airplanes already in the sector, and generating a coordinate matrix P covering the period from the current moment until the airplanes fly out of the sector, according to their existing flight plans and the time step;
(2) training a deep neural network with the K-time control deep reinforcement learning method, and generating a path for the controlled airplane according to its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm is computed as follows: set a control number K; construct a deep neural network whose input is the current position of the controlled airplane and the coordinate matrix P of the existing airplanes, and whose output is the polar coordinates of the controlled airplane's next position point; if the current control is not the K-th, obtain the polar coordinates of the next position point by the two-dimensional normal distribution method and update the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control, take the airplane's destination as the next position point, finish this training episode, and begin the next one;
(3) after extensive training, the deep neural network acquires guidance capability and can, from the input position of the controlled airplane and the coordinate matrix of the existing airplanes, quickly generate a shortest path that reaches the destination without conflicting with other airplanes;
(4) in practical use, several deep neural networks with different K values can be trained, so that a guidance path can be generated quickly for the air traffic controller according to the specific problem.
2. The air traffic control anti-collision method based on K-time control deep reinforcement learning according to claim 1, characterized in that in step (1) the coordinate matrix P of the airplanes in the sector contains not only the airplanes' current coordinates but also their future coordinates according to the flight plan.
3. The air traffic control anti-collision method based on K-time control deep reinforcement learning according to claim 1, characterized in that in step (2) the number of controls is adjusted through the parameter K and can be set flexibly in air traffic control guidance; the polar angle and polar radius of the next position point are selected through a two-dimensional normal distribution, with the point-selection formula:

ρ ~ N(μ_ρ, σ_ρ²), θ ~ N(μ_θ, σ_θ²)

where μ_ρ and σ_ρ are the mean and standard deviation of the normal distribution of the polar radius, and μ_θ and σ_θ are the mean and standard deviation of the normal distribution of the polar angle; this point-selection method satisfies the exploration requirement of the reinforcement learning training process;
adopting an actor-critic dual neural network structure, the critic network update formula is:

δ_t = R_t + γ V(S_{t+1}, w) − V(S_t, w),  w ← w + α δ_t ∇_w V(S_t, w)

and the actor network update formula is:

η ← η + α δ_t ∇_η ln π(A_t | S_t, η)

where w and η are the critic and actor network parameters, α is the learning rate of the neural network, R_t is the reinforcement learning return function, V(S_t, w) is the state value function at time t, and γ is the discount factor.
4. The air traffic control anti-collision method based on K-time control deep reinforcement learning according to claim 1, characterized in that in step (3), owing to the feature-recognition capability of the neural network, K non-conflicting position points are generated quickly from the state input, and the airplane is guided to fly through the K control points in sequence, forming a conflict-free shortest path.
5. The air traffic control anti-collision method based on K-time control deep reinforcement learning according to claim 1, characterized in that in step (4) several alternative schemes can be generated simultaneously for the air traffic controller to select flexibly according to the air situation.
CN201910311467.0A 2019-04-18 2019-04-18 Air traffic control anti-collision method based on K-time control deep reinforcement learning Active CN110084414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311467.0A CN110084414B (en) 2019-04-18 2019-04-18 Air traffic control anti-collision method based on K-time control deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910311467.0A CN110084414B (en) 2019-04-18 2019-04-18 Air traffic control anti-collision method based on K-time control deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110084414A true CN110084414A (en) 2019-08-02
CN110084414B CN110084414B (en) 2020-03-06

Family

ID=67415491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311467.0A Active CN110084414B (en) 2019-04-18 2019-04-18 Air traffic control anti-collision method based on K-time control deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110084414B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882047A (en) * 2020-09-28 2020-11-03 四川大学 Rapid empty pipe anti-collision method based on reinforcement learning and linear programming
CN113393495A (en) * 2021-06-21 2021-09-14 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN113962031A (en) * 2021-12-20 2022-01-21 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning
CN114141062A (en) * 2021-11-30 2022-03-04 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530704A (en) * 2013-10-16 2014-01-22 南京航空航天大学 Predicating system and method for air dynamic traffic volume in terminal airspace
KR20170002191A (en) * 2015-06-29 2017-01-06 인하공업전문대학산학협력단 Collision avoidance control method for unmanned air vehicle
CN106601033A (en) * 2017-02-28 2017-04-26 中国人民解放军空军装备研究院雷达与电子对抗研究所 Air traffic control mid-term conflict detection method and device
CN106814744A (en) * 2017-03-14 2017-06-09 吉林化工学院 A kind of UAV Flight Control System and method
CN109597839A (en) * 2018-12-04 2019-04-09 中国航空无线电电子研究所 A kind of data digging method based on the avionics posture of operation


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882047A (en) * 2020-09-28 2020-11-03 四川大学 Rapid empty pipe anti-collision method based on reinforcement learning and linear programming
CN111882047B (en) * 2020-09-28 2021-01-15 四川大学 Rapid empty pipe anti-collision method based on reinforcement learning and linear programming
CN113393495A (en) * 2021-06-21 2021-09-14 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN113393495B (en) * 2021-06-21 2022-02-01 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN114141062A (en) * 2021-11-30 2022-03-04 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning
CN114141062B (en) * 2021-11-30 2022-11-01 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning
CN113962031A (en) * 2021-12-20 2022-01-21 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning
CN113962031B (en) * 2021-12-20 2022-03-29 北京航空航天大学 Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning

Also Published As

Publication number Publication date
CN110084414B (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN110084414B (en) Air traffic control anti-collision method based on K-time control deep reinforcement learning
CN109508035B (en) Multi-region hierarchical unmanned aerial vehicle formation path planning method based on distributed control
CN108829140B (en) Multi-unmanned aerial vehicle cooperative target searching method based on multi-colony ant colony algorithm
CN105892480A (en) Self-organizing method for cooperative scouting and hitting task of heterogeneous multi-unmanned-aerial-vehicle system
CN111882047B (en) Rapid air traffic control anti-collision method based on reinforcement learning and linear programming
Xia et al. Cooperative task assignment and track planning for multi-UAV attack mobile targets
CN105353766A (en) Distributed fault-tolerant management method of multi-UAV formation structure
CN112489498A (en) Fine route change planning method for route traffic
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN105953800A (en) Route planning grid space partitioning method for unmanned aerial vehicle
CN105185163A (en) Flight path selection method, flight path selection device, aircraft and air traffic management system
CN111121784B (en) Unmanned reconnaissance aircraft route planning method
CN105718997A (en) Hybrid multi-aircraft conflict resolution method based on artificial potential field method and ant colony algorithm
CN109002056A (en) A kind of large size fixed-wing unmanned plane formation method
CN103578299B (en) A method for simulating an aircraft process
CN109788618A (en) Lamp light control method, device, system and the medium of vector aircraft ground taxi
CN113268081A (en) Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN114020009A (en) Terrain penetration planning method for small-sized fixed-wing unmanned aerial vehicle
CN114003059A (en) UAV path planning method based on deep reinforcement learning under kinematic constraint condition
CN111157002B (en) Aircraft 3D path planning method based on multi-agent evolutionary algorithm
CN113283727B (en) Airport taxiway scheduling method based on quantum heuristic algorithm
CN117387635B (en) Unmanned aerial vehicle navigation method based on deep reinforcement learning and PID controller
Lee et al. Predictive control for soaring of unpowered autonomous UAVs
CN113220008A (en) Collaborative dynamic path planning method for multi-Mars aircraft
CN116415480B (en) IPSO-based off-road planning method for aircraft offshore platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant