CN110084414B - Air traffic control anti-collision method based on K-time control deep reinforcement learning - Google Patents

Air traffic control anti-collision method based on K-time control deep reinforcement learning

Info

Publication number
CN110084414B
Authority
CN
China
Prior art keywords
control
airplane
reinforcement learning
neural network
sector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910311467.0A
Other languages
Chinese (zh)
Other versions
CN110084414A (en)
Inventor
李辉 (Li Hui)
王壮 (Wang Zhuang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU RONGAO TECHNOLOGY Co Ltd
Original Assignee
CHENGDU RONGAO TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU RONGAO TECHNOLOGY Co Ltd filed Critical CHENGDU RONGAO TECHNOLOGY Co Ltd
Priority to CN201910311467.0A
Publication of CN110084414A
Application granted
Publication of CN110084414B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 - Optimisation of routes or paths, e.g. travelling salesman problem
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 - Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an air traffic control anti-collision method based on K-time control deep reinforcement learning, which comprises the following steps: first, set the number of airplanes in the sector for the use scenario and set the number of control actions K in the anti-collision process; then perform K control actions in training mode: in the first K-1 control actions, determine the next position point from the neural network output by sampling a two-dimensional normal distribution and update the neural network parameters by a reinforcement learning method, while in the K-th control action the destination is taken as the next position point; repeat these steps to complete the training of the neural network; finally, in application mode, the trained neural network yields the conflict-free shortest path. The method can be applied to existing air traffic management systems, obtains the shortest path to the destination without conflicting with the other airplanes in the sector, and has practical significance for air traffic route planning.

Description

Air traffic control anti-collision method based on K-time control deep reinforcement learning
Technical Field
The invention relates to the field of air traffic management, in particular to an air traffic control anti-collision method based on K-time control deep reinforcement learning.
Background
In recent years, civil aviation has developed rapidly, and this continuous growth brings serious air traffic congestion and great pressure on air traffic controllers. When an airplane flies from one sector to another, its flight path must be planned and correct guidance given so as to avoid conflicts with the aircraft already in the sector. Existing algorithms can generate an optimal or suboptimal flight path and guide the aircraft, but their computational efficiency is low and cannot meet the real-time requirements of actual air traffic control, so further research is still needed. Deep reinforcement learning executes efficiently and is flexible to use; suitably improved, it can be applied in an air traffic control system to quickly produce a guidance track.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an air traffic control anti-collision method based on K-time control deep reinforcement learning, which enables an airplane to enter a sector and reach its destination without colliding with the aircraft already in the sector, and can quickly produce multiple schemes for an air traffic controller to select from.
To realize this purpose, the invention adopts the following technical scheme:
an air traffic control anti-collision method based on K-time control deep reinforcement learning comprises the following steps:
(1) number the existing airplanes in the sector and, from their current flight plans and the time step, generate a coordinate matrix P covering the time from the current moment until each airplane leaves the sector;
(2) train a deep neural network with the K-time control deep reinforcement learning method and generate a path for the controlled airplane from its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm proceeds as follows: set the number of control actions K; construct a deep neural network whose input is the current position of the controlled airplane together with the coordinate matrix P of the existing airplanes, and whose output is the polar coordinate of the controlled airplane's next position point, $(\rho, \varphi)$;
if the current control action is not the K-th, obtain the polar coordinates of the next position point by sampling a two-dimensional normal distribution and update the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control action, take the airplane's destination as the next position point, finish this training episode, and begin the next one (a minimal sketch of this loop is given after step (4) below);
(3) after extensive training, the deep neural network acquires guidance capability and can quickly generate, from the input position of the controlled airplane and the coordinate matrix of the existing airplanes, a shortest path that reaches the destination without conflicting with other airplanes;
(4) in practical use, several deep neural networks with different K values can be trained so that a guidance path can be generated quickly for the air traffic controller according to the specific problem.
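The following Python sketch illustrates the episode structure of steps (2)-(4): the first K-1 waypoints are sampled from the network's two-dimensional normal distribution and the K-th control action goes straight to the destination. All names here (policy, train_episode) and the placeholder outputs are hypothetical illustrations, not the patent's own code.

```python
# Hypothetical sketch of the K-time control training episode (not the
# patent's own code): the first K-1 waypoints are sampled from the
# two-dimensional normal distribution parameterized by the network,
# and the K-th control action goes directly to the destination.
import numpy as np

K = 3  # number of control actions per episode (the parameter K)

def policy(state):
    """Stand-in for the deep neural network: maps the state (controlled
    airplane position plus coordinate matrix P) to the means and standard
    deviations of the polar radius and polar angle of the next waypoint."""
    mu_rho, sigma_rho = 10.0, 2.0   # placeholder outputs for illustration
    mu_phi, sigma_phi = 0.0, 0.3
    return mu_rho, sigma_rho, mu_phi, sigma_phi

def train_episode(start, destination, P):
    pos = np.asarray(start, dtype=float)
    for k in range(1, K + 1):
        if k < K:
            # Control actions 1 .. K-1: sample the next position point.
            state = np.concatenate([pos, P.ravel()])
            mu_rho, sigma_rho, mu_phi, sigma_phi = policy(state)
            rho = np.random.normal(mu_rho, sigma_rho)
            phi = np.random.normal(mu_phi, sigma_phi)
            pos = pos + rho * np.array([np.cos(phi), np.sin(phi)])
            # ...update the network parameters from the guidance result
            # (actor-critic update; see the formulas further below).
        else:
            # Control action K: take the destination as the next point
            # and finish this training episode.
            pos = np.asarray(destination, dtype=float)
    return pos
```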
As a preferred technical solution, in step (1) the coordinate matrix P of the existing airplanes in the sector contains not only the current airplane coordinates but also the future coordinates given by the flight plan; one possible construction is sketched below.
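A minimal sketch of building P, assuming each flight plan is a list of timed waypoints that can be linearly interpolated; the function name and the plan format are assumptions made for illustration.

```python
# Hypothetical construction of the coordinate matrix P. The flight-plan
# format (a list of (time, x, y) waypoints per airplane) and linear
# interpolation between waypoints are assumptions made for illustration.
import numpy as np

def build_coordinate_matrix(flight_plans, dt, horizon):
    """Return a (num_aircraft, steps, 2) array holding each airplane's
    planned 2-D position every dt seconds from now until `horizon`."""
    steps = int(horizon / dt)
    P = np.zeros((len(flight_plans), steps, 2))
    for i, plan in enumerate(flight_plans):       # plan: [(t, x, y), ...]
        times = [p[0] for p in plan]
        xs = [p[1] for p in plan]
        ys = [p[2] for p in plan]
        for s in range(steps):
            t = s * dt
            P[i, s, 0] = np.interp(t, times, xs)  # interpolate along plan
            P[i, s, 1] = np.interp(t, times, ys)
    return P

# Example: two airplanes sampled every 10 s over a 60 s horizon.
P = build_coordinate_matrix(
    [[(0, 0.0, 0.0), (60, 30.0, 10.0)],
     [(0, 50.0, 0.0), (60, 20.0, 25.0)]], dt=10, horizon=60)
```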
As a preferred technical solution, in step (2) the number of control actions is adjusted through the parameter K and can be set flexibly in air traffic control guidance; the polar angle and polar radius of the next position point are selected through a two-dimensional normal distribution, with the point-selection formula:

$$\rho \sim N(\mu_\rho, \sigma_\rho^2), \qquad \varphi \sim N(\mu_\varphi, \sigma_\varphi^2)$$

where $\mu_\rho, \sigma_\rho$ are the mean and standard deviation of the normal distribution of the polar radius and $\mu_\varphi, \sigma_\varphi$ are the mean and standard deviation of the normal distribution of the polar angle; this point-selection method satisfies the exploration requirement of the reinforcement learning training process.

An actor-critic dual neural network structure is adopted. The update formula of the critic neural network is:

$$w \leftarrow w + \alpha_w \, \delta \, \nabla_w V(S_t, w)$$

and the update formula of the actor neural network is:

$$\theta \leftarrow \theta + \alpha_\theta \, \delta \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)$$

where $\alpha_w, \alpha_\theta$ are the learning rates of the neural networks, $\delta = R_t + \gamma V(S_{t+1}, w) - V(S_t, w)$ is the temporal-difference error, $R_t$ is the reinforcement learning reward function, $V(S_t, w)$ is the state-value function at time t, and $\gamma$ is the discount factor.
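A numerical sketch of these updates follows, assuming linear approximators and a Gaussian policy over the polar radius alone purely to keep the example short; all names and the fixed-sigma treatment are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical one-step actor-critic update matching the formulas above.
# Linear approximators and a Gaussian policy over the polar radius alone
# are assumed purely to keep the sketch short and runnable.
import numpy as np

def V(w, s):
    return float(w @ s)  # state-value function V(S_t, w)

def actor_critic_step(theta, w, s, rho, R, s_next,
                      alpha_w=1e-3, alpha_theta=1e-3,
                      gamma=0.99, sigma=1.0):
    # TD error: delta = R_t + gamma * V(S_{t+1}, w) - V(S_t, w)
    delta = R + gamma * V(w, s_next) - V(w, s)
    # Critic: w <- w + alpha_w * delta * grad_w V(S_t, w); the gradient
    # of a linear value function is simply the state s.
    w = w + alpha_w * delta * s
    # Actor: theta <- theta + alpha_theta * delta * grad_theta ln pi.
    # For a Gaussian policy with mean mu = theta @ s and fixed sigma:
    # grad_theta ln pi = (rho - mu) / sigma**2 * s
    mu = float(theta @ s)
    theta = theta + alpha_theta * delta * (rho - mu) / sigma**2 * s
    return theta, w
```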
As a preferred technical solution, in step (3), owing to the feature-recognition capability of the neural network, K non-conflicting position points can be generated quickly from the state input, and the aircraft is guided to fly through the K control points in sequence, forming a conflict-free shortest path; a separation-check sketch is given below.
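The patent does not spell out its conflict test; the following sketch shows one plausible check that a candidate K-point path keeps a minimum separation from the traffic in P. The constant speed, the 5-unit minimum separation, and the function name are assumptions.

```python
# Hypothetical separation check for a candidate path. The constant speed,
# the 5-unit minimum separation, and the function name are assumptions;
# the patent does not specify its conflict test in this detail.
import numpy as np

def path_is_conflict_free(waypoints, P, dt, speed, min_sep=5.0):
    """waypoints: (K+1, 2) points starting at the current position;
    P: (num_aircraft, steps, 2) coordinate matrix of existing traffic."""
    pos = np.asarray(waypoints[0], dtype=float)
    t_idx = 0
    for target in waypoints[1:]:
        target = np.asarray(target, dtype=float)
        while np.linalg.norm(target - pos) > speed * dt:
            # advance one time step toward the next control point
            step = (target - pos) / np.linalg.norm(target - pos) * speed * dt
            pos = pos + step
            t_idx += 1
            if t_idx >= P.shape[1]:
                return True   # existing traffic has left the sector
            # minimum distance to all existing airplanes at this time step
            if np.min(np.linalg.norm(P[:, t_idx] - pos, axis=1)) < min_sep:
                return False
        pos = target
    return True
```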
As a preferred technical solution, in step (4) a plurality of alternatives can be generated at the same time for the air traffic controller to flexibly select from according to the air traffic situation.
Compared with the prior art, the invention has the following advantages and effects:
(1) Compared with traditional methods, the method has higher computational efficiency and can generate the optimal path within 200 ms.
(2) The invention improves deep reinforcement learning so that the number of control actions K is selectable, and a reasonable value can be chosen according to the actual air traffic situation.
(3) The invention applies the air traffic control anti-collision method based on K-time control deep reinforcement learning to the air traffic management system, enabling an airplane to enter the sector and reach its destination without colliding with the existing airplanes in the sector; it can quickly produce multiple schemes for the air traffic controller to select from, and has practical significance for air traffic route planning.
Drawings
Fig. 1 is a flowchart of the air traffic control anti-collision method based on K-time control deep reinforcement learning according to this embodiment;
Fig. 2 is a schematic diagram of air traffic control inside the sector according to this embodiment;
Fig. 3 is a schematic diagram of the K control actions according to this embodiment;
Fig. 4 is a diagram of the actor neural network structure according to this embodiment;
Fig. 5 is a flight path diagram between two control points according to this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
An air traffic control anti-collision method based on K-time control deep reinforcement learning is shown in Fig. 1 and comprises the following steps:
(1) number the existing airplanes in the sector and, from their current flight plans and the time step, generate a coordinate matrix P covering the time from the current moment until each airplane leaves the sector;
(2) train a deep neural network with the K-time control deep reinforcement learning method and generate a path for the controlled airplane from its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm proceeds as follows: set the number of control actions K; construct a deep neural network whose input is the current position of the controlled airplane together with the coordinate matrix P of the existing airplanes, and whose output is the polar coordinate of the controlled airplane's next position point, $(\rho, \varphi)$;
if the current control action is not the K-th, obtain the polar coordinates of the next position point by sampling a two-dimensional normal distribution and update the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control action, take the airplane's destination as the next position point, finish this training episode, and begin the next one;
(3) after extensive training, the deep neural network acquires guidance capability and can quickly generate, from the input position of the controlled airplane and the coordinate matrix of the existing airplanes, a shortest path that reaches the destination without conflicting with other airplanes;
(4) in practical use, several deep neural networks with different K values can be trained so that a guidance path can be generated quickly for the air traffic controller according to the specific problem.
In this embodiment, airplanes flying along set flight paths already exist in the sector and a controlled airplane is to fly into it; the air traffic control anti-collision method based on K-time control deep reinforcement learning enables the controlled airplane to enter the sector and arrive at its destination without colliding with the existing airplanes in the sector;
As shown in Fig. 2, the existing airplanes in the sector are numbered, and a coordinate matrix P from the current moment to the moment each airplane flies out of the sector is generated from their flight plans and the time step;
As shown in Fig. 3, there are four airplanes in the sector, and their coordinate matrix P contains not only their current coordinates but also the future coordinates given by their flight plans.
In this embodiment the number of control actions is adjusted through the parameter K and can be set flexibly in air traffic control guidance; as shown in Fig. 3, K is 3;
in this embodiment, as shown in fig. 4, the actor neural network is composed of three layers of fully connected networks, and the output is the normal distribution mean and standard deviation of the polar angle and the polar diameter of the next position point.
In this embodiment, owing to the feature-recognition capability of the neural network, K non-conflicting position points can be generated quickly from the state input and the aircraft is guided through the K control points in sequence; the flight trajectory between two control points is shown in Fig. 5, and the result is a conflict-free shortest path.
In this embodiment, a collision avoidance solution can be generated within 200 ms using the present method; five different solutions can be generated within one second for the air traffic controller to select from, which is clearly more efficient than existing methods that take several seconds or even tens of seconds to generate a single solution.

Claims (5)

1. An air traffic control anti-collision method based on K-time control deep reinforcement learning, characterized by comprising the following steps:
(1) numbering the existing airplanes in the sector and, from their current flight plans and the time step, generating a coordinate matrix P covering the time from the current moment until each airplane leaves the sector;
(2) training a deep neural network with the K-time control deep reinforcement learning method and generating a path for the controlled airplane from its current position and the coordinate matrix P of the airplanes in the sector;
the K-time control deep reinforcement learning algorithm proceeds as follows: setting the number of control actions K; constructing a deep neural network whose input is the current position of the controlled airplane together with the coordinate matrix P of the existing airplanes, and whose output is the polar coordinate of the controlled airplane's next position point, $(\rho, \varphi)$;
if the current control action is not the K-th, obtaining the polar coordinates of the next position point by sampling a two-dimensional normal distribution and updating the deep neural network parameters by the reinforcement learning method according to the guidance result; if it is the K-th control action, taking the airplane's destination as the next position point, finishing this training episode, and beginning the next one;
(3) after extensive training, quickly generating for the controlled airplane, from its input position and the coordinate matrix of the existing airplanes, a shortest path that reaches the destination without conflicting with other airplanes;
(4) in actual use, training several deep neural networks with different K values according to the actual air traffic situation, and quickly generating a guidance path for the air traffic controller for the specific problem.
2. The air traffic control anti-collision method based on K-time control deep reinforcement learning of claim 1, characterized in that in step (1) the coordinate matrix P of the existing airplanes in the sector contains not only the current airplane coordinates but also the future coordinates given by the flight plan.
3. The air traffic control anti-collision method based on K-time control deep reinforcement learning of claim 1, characterized in that in step (2) the number of control actions is adjusted through the parameter K and is set flexibly in air traffic control guidance; and the polar angle and polar radius of the next position point are selected through a two-dimensional normal distribution, with the point-selection formula:

$$\rho \sim N(\mu_\rho, \sigma_\rho^2), \qquad \varphi \sim N(\mu_\varphi, \sigma_\varphi^2)$$

where $\mu_\rho, \sigma_\rho$ represent the mean and standard deviation of the normal distribution of the polar radius, and $\mu_\varphi, \sigma_\varphi$ the mean and standard deviation of the normal distribution of the polar angle;

an actor-critic dual neural network structure is adopted, the update formula of the critic neural network being:

$$w \leftarrow w + \alpha_w \, \delta \, \nabla_w V(S_t, w)$$

and the update formula of the actor neural network being:

$$\theta \leftarrow \theta + \alpha_\theta \, \delta \, \nabla_\theta \ln \pi(A_t \mid S_t, \theta)$$

where $\alpha_w, \alpha_\theta$ are the learning rates of the neural networks, $\delta = R_t + \gamma V(S_{t+1}, w) - V(S_t, w)$, $R_t$ is the reinforcement learning reward function, $V(S_t, w)$ is the state-value function at time t, and $\gamma$ is the discount factor.
4. The air traffic control anti-collision method based on K-time control deep reinforcement learning of claim 1, characterized in that in step (3), owing to the feature-recognition capability of the neural network, K non-conflicting position points are generated quickly from the state input, and the airplane is guided through the K control points in sequence to form a conflict-free shortest path.
5. The air traffic control anti-collision method based on K-time control deep reinforcement learning of claim 1, characterized in that in step (4) a plurality of alternatives is generated simultaneously for the air traffic controller to flexibly select from according to the air traffic situation.
CN201910311467.0A 2019-04-18 2019-04-18 Air traffic control anti-collision method based on K-time control deep reinforcement learning Active CN110084414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910311467.0A CN110084414B (en) Air traffic control anti-collision method based on K-time control deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910311467.0A CN110084414B (en) Air traffic control anti-collision method based on K-time control deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110084414A CN110084414A (en) 2019-08-02
CN110084414B (en) 2020-03-06

Family

ID=67415491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910311467.0A Active CN110084414B (en) Air traffic control anti-collision method based on K-time control deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110084414B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882047B * 2020-09-28 2021-01-15 Sichuan University Rapid air traffic control anti-collision method based on reinforcement learning and linear programming
CN113393495B * 2021-06-21 2022-02-01 Jinan University High-altitude parabolic track identification method based on reinforcement learning
CN114141062B * 2021-11-30 2022-11-01 The 28th Research Institute of China Electronics Technology Group Corporation Aircraft interval management decision method based on deep reinforcement learning
CN113962031B * 2021-12-20 2022-03-29 Beihang University Heterogeneous platform conflict resolution method based on graph neural network reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530704B * 2013-10-16 2016-06-29 Nanjing University of Aeronautics and Astronautics Terminal airspace dynamic air traffic volume prediction system and method
KR101747393B1 * 2015-06-29 2017-06-15 Inha Technical College Industry-Academic Cooperation Foundation Collision avoidance control method for unmanned air vehicle
CN106601033B * 2017-02-28 2018-05-08 Radar and Electronic Countermeasure Research Institute, PLA Air Force Equipment Research Academy Detection method and device for mid-term conflicts in air traffic control
CN106814744A * 2017-03-14 2017-06-09 Jilin Institute of Chemical Technology UAV flight control system and method
CN109597839B * 2018-12-04 2022-11-04 China National Aeronautical Radio Electronics Research Institute Data mining method based on avionics combat situation

Also Published As

Publication number Publication date
CN110084414A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN110084414B (en) Air traffic control anti-collision method based on K-time control deep reinforcement learning
CN109933086B (en) Unmanned aerial vehicle environment perception and autonomous obstacle avoidance method based on deep Q learning
CN105353766B (en) Distributed fault-tolerant management method for multi-UAV formation structure
CN110673637A (en) Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN105953800B (en) Grid space division method for unmanned aerial vehicle trajectory planning
CN111882047B (en) Rapid air traffic control anti-collision method based on reinforcement learning and linear programming
JP2020201958A (en) Three-dimensional aircraft autonomous navigation under restriction
CN111026157B (en) Intelligent aircraft guiding method based on reward remodeling reinforcement learning
CN111121784B (en) Unmanned reconnaissance aircraft route planning method
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN109240335B (en) Aerospace vehicle approach landing guidance method
CN105185163A (en) Flight path selection method, flight path selection device, aircraft and air traffic management system
CN104850009A (en) Coordination control method for multi-unmanned aerial vehicle team based on predation escape pigeon optimization
CN103578299B (en) Method for simulating an aircraft process
CN113296537A (en) Electric power unmanned aerial vehicle inspection method and system based on electric power tower model matching
CN114879716B (en) Law enforcement unmanned aerial vehicle path planning method for countering low-altitude airspace aircraft
CN115454115A (en) Rotor unmanned aerial vehicle path planning method based on hybrid wolf-particle swarm algorithm
CN115903888A (en) Rotor unmanned aerial vehicle autonomous path planning method based on longicorn swarm algorithm
Wang et al. Design of agent training environment for aircraft landing guidance based on deep reinforcement learning
CN105759630B (en) Aircraft 4D trajectory simulation system and simulation method based on fuzzy adaptive PID control
CN113283727B (en) Airport taxiway scheduling method based on quantum heuristic algorithm
CN114679729A (en) Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
CN114020009A (en) Terrain penetration planning method for small-sized fixed-wing unmanned aerial vehicle
Zhang et al. UAV path planning based on receding horizon control with adaptive strategy
Lee et al. Predictive control for soaring of unpowered autonomous UAVs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant