CN112818599A - Air control method based on reinforcement learning and four-dimensional track - Google Patents


Info

Publication number
CN112818599A
CN112818599A (application CN202110134760.1A; granted publication CN112818599B)
Authority
CN
China
Prior art keywords
airplane
point
speed
dimensional
course
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110134760.1A
Other languages
Chinese (zh)
Other versions
CN112818599B (en)
Inventor
俎文强
季玉龙
何扬
黄操
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202110134760.1A
Publication of CN112818599A
Application granted
Publication of CN112818599B
Expired - Fee Related
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an air control method based on reinforcement learning and four-dimensional trajectories. The method first establishes aircraft aerodynamic performance models for different aircraft types; collects four-dimensional trajectory data of different aircraft types on different routes according to the aerodynamic performance models; and generates a route-and-type four-dimensional trajectory model through data playback. Finally, a neural network is built on a reinforcement learning algorithm and trained so that the aircraft follows the four-dimensional trajectory, forming a nested reinforcement learning model in which a speed agent is nested inside a heading agent: route selection is achieved by choosing the aircraft's target heading, and arrival-time control is achieved by choosing the aircraft's target speed, so that the aircraft follows the four-dimensional trajectory model at the specified time, speed, heading and altitude. The invention provides a feasible solution to the problems currently faced by airports, such as heavy traffic, complex aircraft scheduling and difficult air traffic control.

Description

Air control method based on reinforcement learning and four-dimensional track
Technical Field
The invention relates to the technical field of intelligent air traffic control, in particular to an air traffic control method based on reinforcement learning and four-dimensional tracks.
Background
A new generation of air traffic control should be intelligent. High-density traffic and large numbers of aircraft pose significant challenges to air traffic controllers (ATCos), who therefore need automated approaches to reduce complexity, particularly during landing (arrival) and takeoff. One straightforward way to automate the air traffic control problem is to have artificial-intelligence ATCos control aircraft to fly along computed 4D trajectories.
The European air traffic authority has identified data-driven trajectory prediction, in particular 4D trajectories that are typically predicted using aircraft aerodynamic performance models, as one of the key pillars of future air traffic management. This underlines the importance of air traffic control methods based on trajectories and aircraft performance models.
Methods based on trajectories or aircraft performance models have been studied extensively in the field of air traffic control. Klomp proposed a conceptual decision-support tool for 4D trajectory management in 2019, aiming to overcome these problems by directly visualizing the solution space of possible actions; the feasibility of the concept was verified through a preliminary evaluation of a partial implementation of the solution-space representation. Jacco et al. proposed the BlueSky project in 2016, which investigated the feasibility of a fully open-source, open-data approach to air traffic simulation; one of its main contributions is high fidelity, e.g. aircraft performance is modeled on the actual aerodynamic performance of the aircraft.
Marc Britain's 2018 research on automated air traffic control proposed a deep reinforcement learning method that uses an air traffic control simulator created by NASA as the environment, provides tactical decision support for air traffic controllers, selects a route and changes the speed of each aircraft, and addresses the sequencing and separation problems of autonomous air traffic control. They designed a nested agent structure in which a master agent takes an action (changing the route) and a nested agent is responsible for speed control, which works around the fact that the problem cannot be posed as a typical single-agent environment because of its non-Markovian nature. The nested agent decouples the action sets for changing routes and changing speeds. Their results show that the reward oscillates frequently throughout training but increases overall. However, their approach is not applicable in all cases. In addition, their study used NASA33 as the simulator, considered only aircraft spawned at fixed locations and moving on a limited set of paths, and did not consider the influence of the aircraft aerodynamic performance package on the flight path. They employed a DQN-based deep nested-agent approach, a value-based reinforcement learning method suited to discrete environments but not to continuous ones.
Vonk explored in 2019 the possibility of applying reinforcement learning techniques to the sequencing and spacing of aircraft in air traffic control. The experiment aimed to learn to navigate to the FAF point and arrive at the correct time, simulating interaction with arriving agents. However, the results were not stable. The limitation of this approach is that the aircraft are trained only with heading instructions at constant speed, regardless of speed factors; the trajectory finally chosen by the AI is unknown, and the direction of arrival cannot be controlled.
As for recent research advances, several researchers have proposed nested approaches to reinforcement learning. Surioyo Ghosh proposed an intelligent air traffic control method based on a multi-agent reinforcement learning algorithm in 2020; its main idea is to train a single master neural network to handle the interaction effects among multiple agents. They identified an effective learning paradigm for multi-agent reinforcement learning, but their main research direction was air traffic collision detection and avoidance. Their methods are not applicable to the field of four-dimensional-trajectory-based air traffic control, because they do not take into account the time constraint for reaching the target, a condition that four-dimensional-trajectory-based air traffic control must consider and rely on.
In summary, the problems of the prior art are as follows:
(1) In the prior art, solving the four-dimensional-trajectory-based air traffic control problem with conventional reinforcement learning methods runs into the sparse-reward problem, and handling sparse rewards is one of the difficulties; in addition, the design of the reward function is a difficult point when training multi-objective agents.
(2) In the prior art, most reinforcement-learning-based algorithms in the air traffic control field resemble aircraft collision-avoidance algorithms. Such algorithms help research in a specific area, but they are not broadly applicable air traffic control methods, whereas an intelligent air traffic control method based on four-dimensional trajectories is one of the fundamental, broadly applicable methods.
(3) In the prior art, air traffic control methods based on reinforcement learning and four-dimensional trajectories have major limitations, such as poor stability, low accuracy and many restrictive conditions. In addition, owing to limits on algorithm accuracy and complexity, multiple factors cannot be considered simultaneously: most methods consider only one influencing factor, such as the aircraft heading angle or the aircraft speed, and therefore do not yet meet the requirements for practical use.
Moreover, for the reward-function design problem of multi-objective agents, hand-designing the reward function leads to the following issues: 1. the reward is abstract and hard to express as a formula; 2. there are many parameters and the design difficulty is high; 3. the resulting reward function performs poorly.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an air traffic control method based on reinforcement learning and four-dimensional trajectories, which provides a feasible solution to the problems of heavy traffic, complex aircraft scheduling and difficult air traffic control currently faced by airports. The technical scheme is as follows:
an air control method based on reinforcement learning and four-dimensional track comprises the following steps:
S1: establishing aircraft aerodynamic performance models for different aircraft types by modeling the engine performance of each type;
S2: collecting four-dimensional trajectory data of different aircraft types on different routes according to the aircraft aerodynamic performance models, and generating a route-and-type four-dimensional trajectory model through data playback;
S3: building a neural network based on a reinforcement learning algorithm and training the aircraft to follow the four-dimensional trajectory; constructing a nested reinforcement learning model in which a speed agent is nested inside a heading agent, so that route selection is achieved by choosing the aircraft's target heading and arrival-time control is achieved by choosing the aircraft's target speed, whereby the aircraft follows the four-dimensional trajectory model at the specified time, speed, heading and altitude.
Further, the specific process of S1 is as follows: defining key position points carrying aircraft motion-state information; selecting an aircraft of a specific type in a flight simulation system equipped with the aircraft aerodynamic performance model to fly a simulated flight along the route summarized from the specified position points; recording, at fixed time intervals, information including flight time, the six degrees of freedom of the aircraft and environmental factors; and storing the information in a recording file.
Further, the specific process of S2 is as follows:
S21: collecting the track points that meet the conditions to form a track point set G, and mapping each track point onto the route to obtain the set G' of discrete track-point mapping points on the route;
G = {g_i, i = 1, 2, 3, ..., n}    (1)
G' = {g'_i, i = 1, 2, 3, ..., n}    (2)
where g_i is a track point meeting the conditions, g'_i is the mapping point of track point g_i on the route, and n is the number of samples;
S22: calculating the distance s_i from each mapping point g'_i to the start of its leg, obtaining the sample set W' of discrete track-point mapping points on the route in terms of distance and speed;
W' = {(s_i, v_i), i = 1, 2, ..., n}    (3)
where s_i is the distance from the sampling point to the start of the route, and v_i is a one-dimensional output vector denoting the speed of the aircraft at the position a distance s_i from the start of the route;
s23: for the collected sample set W', LSSVM in machine learning is selected, and each sample set is usedDistance xi from sample point to respective hyperplaneiRepresents the empirical risk of LSSVM, and the least empirical risk of training is
Figure BDA0002922997920000031
Minimum, its mathematical model is:
Figure BDA0002922997920000032
wherein w is viAbout siA linear parameter of (d); b is a linear offset;
according to the principle of minimizing the structural risk, the LSSVM needs to ensure the distance maximization of two classification hyperplanes, and the solved mathematical model is a compromise between empirical risk and structural risk, namely
Figure BDA0002922997920000033
Where C is a penalty factor and the distance ξ from a sample point to its hyperplaneiIs a training error;
s33: to solve this optimization problem, Lagrange's function is introduced:
Figure BDA0002922997920000041
wherein alpha isiN is Lagrange multiplier, e is unit vector;
Figure BDA0002922997920000042
representation wsiw/|w|;
The following relationship is obtained from the KKT condition:
Figure BDA0002922997920000043
kernel function
Figure BDA0002922997920000044
sjIs a navigation point mapping point g'jDistance to the starting point of each leg; then the solution form of equation (7) is converted into:
Figure BDA0002922997920000045
wherein Q is an element KijK × k order kernel matrix of (1), I is the identity matrix, and vector e ═ 1, …,1]TThe vector α ═ α1,…,αn]TVector v ═ v1,…,vn]T
Solving formula (8) to obtain alphaiAnd substituting the value of b into the formula (6) to obtain the chaotic time series regression model of the LSSVM, wherein the chaotic time series regression model of the LSSVM is as follows:
Figure BDA0002922997920000046
the speed value of each position point s on the corresponding route is as follows:
Figure BDA0002922997920000047
After the s-v mapping of the route is obtained, the route-and-type four-dimensional trajectory model is derived.
Further, mapping each track point onto the route in S21 comprises:
straight-leg data mapping: drawing a perpendicular from each track point to the straight leg l; the intersection with the leg is the mapping point corresponding to that track point;
arc-leg data mapping: connecting each track point to the centre of the arc leg; the intersection of the resulting line with the arc is the mapping point corresponding to that track point.
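The two mappings can be illustrated with the following sketch in planar x/y coordinates (an illustrative simplification; the coordinate handling and function names are assumptions of this sketch, not part of the method described above):

```python
# Illustrative sketch (not from the patent text): projecting recorded track points
# onto a straight leg and onto an arc leg, using planar x/y coordinates.
import numpy as np

def map_to_straight_leg(p, a, b):
    """Project point p onto the straight leg a->b; return (mapped point, distance from a)."""
    ab = b - a
    t = np.dot(p - a, ab) / np.dot(ab, ab)
    t = np.clip(t, 0.0, 1.0)           # keep the mapping on the leg itself (assumed)
    q = a + t * ab                      # foot of the perpendicular
    return q, np.linalg.norm(q - a)

def map_to_arc_leg(p, center, radius, start):
    """Intersect the ray center->p with the arc; return (mapped point, arc length from start),
    assuming a counter-clockwise arc beginning at the start point."""
    u = (p - center) / np.linalg.norm(p - center)
    q = center + radius * u             # intersection of the ray with the circle
    a0 = np.arctan2(*(start - center)[::-1])
    a1 = np.arctan2(*(q - center)[::-1])
    arc = radius * ((a1 - a0) % (2 * np.pi))
    return q, arc

p = np.array([3.0, 2.0])
q, s = map_to_straight_leg(p, np.array([0.0, 0.0]), np.array([10.0, 0.0]))
print(q, s)   # mapped point on the leg and its distance s_i to the leg start
```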
Further, the S3 specifically includes:
S31: setting up an experimental environment in a simulation system, determining the type of the training aircraft, the aircraft spawn position and the simulation speed, and initializing the environment;
S32: building the reinforcement learning algorithm on the basis of the PPO algorithm:
(1) setting a state space:
there are two agents in the reinforcement learning experiment: a speed agent that selects the speed and a heading agent that changes the heading;
the state space of the heading agent is set as: [Δlat, Δlon, tarhdg, hdg];
where Δlat is the difference between the target latitude and the aircraft latitude, Δlon is the difference between the target longitude and the aircraft longitude, tarhdg denotes the target heading, and hdg denotes the aircraft heading;
the state space of the speed agent is set as: [Δlat, Δlon, tarhdg, hdg, cas, time];
where cas denotes the calibrated airspeed of the aircraft and time denotes the time remaining to reach the target;
(2) setting an action space:
defining the action space of the heading agent: A_t = [0, hdg, 360]
where the minimum heading is 0 degrees, the maximum heading is 360 degrees, and the action space is a distribution from 0 to 360;
defining the action space of the speed agent: A_t = [v_min, v_(t-1), v_max]
where v_min is the minimum allowable calibrated airspeed, v_max is the maximum allowable calibrated airspeed, and the action space is a distribution from 0 to 1000;
(3) setting a reward function:
the reward function of the heading agent is used to guide the agent to select a heading, and is expressed as:
R = α_d·d + α_h·Δhdg    (11)
where d is the distance from the current position of the aircraft to the target position, Δhdg is the current heading minus the target heading, and α_d and α_h are the coefficients for distance and heading respectively;
the reward function of the speed agent is used to change the speed of the aircraft so that it arrives at the target position at the correct time, and is expressed as:
Δd = α_d·(d − d')    (12)
where d' is the current speed of the aircraft multiplied by the delay time;
the advantage function is defined as the actual Q value minus the estimated Q value; the ratio function is defined as the difference between the two probability distributions after importance-sampling processing;
the loss function is defined such that the policy network updates the policy by maximizing the advantage function;
S33: training the nested reinforcement learning model in which heading and speed are selected separately:
(1) the heading agent is taken as the master agent with the speed agent nested inside it; the action of selecting the heading is taken by the master agent, and the action of controlling and changing the speed is realized by the nested speed agent; the state space of the master agent is [Δlat, Δlon, tarhdg, hdg], and the state of the nested agent is [Δd];
(2) the master agent and the nested speed agent of the nested reinforcement learning model have the same neural network structure, namely an actor-critic (AC) structure; for the critic (evaluation) network, the advantage function is defined as:
A(S_t, a; θ_c) = R_t + γ·V(S_{t+1}; θ_c) − V(S_t; θ_c)    (13)
where θ_c are the parameters of the critic network (Critic Network) matrix, R_t is the instant reward, V(S_{t+1}; θ_c) is the state-value function of the next state, V(S_t; θ_c) is the state-value function of the current state, and γ is a value between 0 and 1 representing the future discount factor: the further a reward lies in the future, the less it is taken into account;
using least squares, the update formula for parameter θ_c is:
θ_c ← θ_c + α·∇_{θ_c} A²(S_t, a; θ_c)    (14)
where α is the learning rate defined for the critic network, θ_c are the parameters of the critic-network matrix, and ∇_{θ_c} A²(S_t, a; θ_c) denotes the update step of parameter θ_c;
for the policy network (Policy Network), a policy-gradient method is adopted, with π(a|S_t, θ_p) denoting the probability of selecting action a in state S_t; the update formula for the policy-network parameters θ_p is:
θ_p ← θ_p + α·∇_{θ_p} log π(a|S_t; θ_p)·A(S_t, a; θ_c)    (15)
where α is the learning rate defined for the policy network, the same as the learning rate in equation (14), θ_p are the parameters of the policy-network matrix, and ∇_{θ_p} log π(a|S_t; θ_p)·A(S_t, a; θ_c) denotes the update step of parameter θ_p;
finally, the randomly sampled data are optimized with a failed-experience-replay method, which improves the convergence direction of the neural network and solves the sparse-reward problem.
Further, in S31, initializing the environment comprises: randomly generating several navigation points in the landing direction of the airport and randomly generating a delay-time sequence, so that the training aircraft lands through the navigation points in the correct time order; the aircraft is spawned at a random position in a designated area with a random heading, and its speed and altitude are set.
Further, step S3 is followed by:
S4: updating the four-dimensional trajectory model according to the simulation time of the simulation system, so that each four-dimensional trajectory point carries a time tag as its time identifier; when a new aircraft selects a four-dimensional trajectory point, judging from the point's time tag whether it is already occupied, and if so, giving up the current four-dimensional trajectory point and reselecting.
Further, in the route-and-type four-dimensional trajectory model established in S2, each four-dimensional trajectory point is either generated on a single route independently or distributed over different routes that share an intersecting leg, and four-dimensional trajectory points distributed on different routes become the same four-dimensional trajectory point on the intersecting leg.
The invention has the beneficial effects that:
(1) The present invention has advantages that differ from the air traffic control algorithms produced over the last two decades. By studying mainstream air traffic control algorithms and analysing their advantages and disadvantages under current traffic conditions, an air traffic control method based on reinforcement learning and four-dimensional trajectories was finally chosen. It combines the advantages of non-reinforcement-learning and reinforcement-learning-based air traffic control, innovates on that basis, and adopts a four-dimensional trajectory model to realize and simplify air traffic control under large-scale traffic. Experimental results in a flight simulation verification system show that the algorithm is suitable for air traffic control under heavy traffic and can be integrated into a related project engine or framework.
(2) In the database-construction stage, the invention generates the aircraft aerodynamic performance models by modeling the engines of different aircraft types. The simulation system that controls the aircraft motion is built on these aerodynamic performance models, so the aircraft moves and is simulated according to its real motion process, which effectively strengthens the reliability of the simulation results.
(3) In the data-processing stage, flight motion data of different aircraft types on different routes are collected to obtain the route-and-type four-dimensional trajectory model. The collected data include the aircraft's latitude and longitude, speed, altitude, heading angle, roll angle and pitch angle. Generating the four-dimensional trajectory model through data collection and data playback solves the problem that the aircraft state cannot be calculated because the speed is difficult to predict.
(4) In the core-algorithm design stage, a reinforcement learning algorithm is introduced with the four-dimensional trajectory as the target: by adjusting the heading, speed and altitude of the aircraft, the aircraft holds the required heading and follows the four-dimensional trajectory on time. In the training stage, the four-dimensional trajectory and the aircraft spawn attitude are set so that the aircraft appears within a suitable range, for example 20-50 km from a four-dimensional trajectory point, and the idea of imitation learning with expert data is used to guide and accelerate the convergence of the algorithm.
Drawings
Fig. 1 is a flowchart of an air control method based on reinforcement learning and four-dimensional trajectory according to an embodiment of the present invention.
FIG. 2 is a diagram of selecting the effective data range for straight-leg flight in the 4D-trajectory-based air traffic control method of the invention.
FIG. 3 is a diagram of selecting the effective data range for arc-leg flight in the 4D-trajectory-based air traffic control method of the invention.
FIG. 4 is a geometric schematic diagram of an LSSVM algorithm of the 4D track-based air traffic control method of the invention.
Fig. 5 is a schematic diagram of the distance between the end point and the target point in 200 experimental results provided by the example of the present invention.
Fig. 6 is a schematic diagram of the track angle to the end point in 200 experimental results provided by the example of the present invention.
FIG. 7 is a schematic diagram of the delay time of the end point in 200 experiments according to the embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments. The invention draws on the idea of inverse reinforcement learning: a suitable evaluation (critic) network and policy network are pre-trained with expert data in advance, and the reward function is fitted according to the cumulative-return formula. Fitting the reward function with an inverse reinforcement learning algorithm avoids the incomplete design considerations and poor convergence that arise when the reward function is designed subjectively.
Regarding the sparse-reward problem in agent training: sparse rewards slow down the convergence of the algorithm and may even trap it in local optima. The present invention employs failed experience replay (HER) to avoid sparse-reward situations. Reward sparsity means that when an agent trains in a large space it rarely reaches the target, so its learning efficiency is low, ever more training is needed, and the training effect deteriorates. The failed-experience-replay method effectively solves the sparse-reward problem; its idea is to modify the target value of each piece of data so that the modified data become effective data that reach the target.
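The relabelling idea described above can be sketched as follows (an illustrative sketch only; the transition format and the reward_fn interface are assumptions): after a failed episode, the stored goal of each transition is replaced by a state that was actually reached, so that the relabelled data count as effective data reaching the target.

```python
# Sketch of the failed-experience-replay idea: relabel a failed episode with a goal
# that was actually achieved and recompute the rewards accordingly.
def relabel_failed_episode(episode, reward_fn):
    """episode: list of (state, action, reward, next_state, goal) tuples (assumed format)."""
    achieved_goal = episode[-1][3]                 # the final reached state becomes the new goal
    relabelled = []
    for state, action, _, next_state, _ in episode:
        new_reward = reward_fn(next_state, achieved_goal)   # recompute reward w.r.t. the new goal
        relabelled.append((state, action, new_reward, next_state, achieved_goal))
    return relabelled

# toy usage: 1-D states, reward 0 when the goal is reached, -1 otherwise
episode = [(0, +1, -1, 1, 5), (1, +1, -1, 2, 5), (2, +1, -1, 3, 5)]
print(relabel_failed_episode(episode, lambda s, g: 0 if s == g else -1))
```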
For the multi-policy-network output problem of multi-objective agents, the invention adopts a nested reinforcement learning method comprising two agents that control the heading and the speed of the aircraft respectively. The policy network of the master agent is a heading-control network whose output is the probability distribution of the aircraft's target heading; route selection is achieved by choosing the target heading. The policy network of the nested agent is a speed-control network whose output is the probability distribution of the aircraft's target speed; arrival-time control is achieved by choosing the target speed.
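One decision step of this nested structure can be sketched as follows; the DummyPolicy class and the dictionary-based aircraft state are stand-ins for the trained policy networks and the simulator state, not part of the invention's implementation:

```python
# Sketch of one nested decision step: the master (heading) policy selects the route,
# then the nested (speed) policy controls the arrival time.
import random

class DummyPolicy:
    def __init__(self, low, high):
        self.low, self.high = low, high
    def sample(self, state):
        return random.uniform(self.low, self.high)   # placeholder for the network output

def nested_decision_step(heading_policy, speed_policy, ac, target, time_remaining):
    # master (heading) agent observation: [Δlat, Δlon, tarhdg, hdg]
    hdg_state = [target["lat"] - ac["lat"], target["lon"] - ac["lon"],
                 target["hdg"], ac["hdg"]]
    target_heading = heading_policy.sample(hdg_state)        # selects the route
    # nested (speed) agent observation adds calibrated airspeed and remaining time
    spd_state = hdg_state + [ac["cas"], time_remaining]
    target_cas = speed_policy.sample(spd_state)               # controls the arrival time
    return target_heading, target_cas

ac = {"lat": 30.2, "lon": 104.1, "hdg": 10.0, "cas": 450.0}
target = {"lat": 30.6, "lon": 103.9, "hdg": 20.0}
print(nested_decision_step(DummyPolicy(0, 360), DummyPolicy(200, 600), ac, target, 120.0))
```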
The invention realizes intelligent control of air traffic with a four-dimensional trajectory model and a reinforcement learning algorithm. The flow of the method is shown in FIG. 1 and specifically comprises the following steps:
Step one: establishing aircraft aerodynamic performance models for different aircraft types by modeling the engine performance and other characteristics of each type.
The modeling method provided in the embodiment of the invention mainly targets the aerodynamic performance models of fixed-wing aircraft and helicopters, and mainly models performance characteristics such as speed, acceleration and climb rate.
Step two: according to the aircraft aerodynamic performance models, four-dimensional trajectory data of different aircraft types on different routes are collected, and the route four-dimensional trajectory model is then generated through data playback.
The speed models of different aircraft types provided in the embodiment differ, so different route four-dimensional trajectory models can be generated for different aircraft types on the same route. Six-degree-of-freedom information and time information of the aircraft are collected every 1 s, and the route four-dimensional trajectory model is generated through data playback.
Step three: based on a reinforcement learning algorithm, a neural network is built and trained so that the aircraft follows the four-dimensional trajectory; the trained neural network is then used to make the aircraft follow the four-dimensional trajectory at the specified time, speed, heading and altitude.
The reinforcement learning algorithm provided in the embodiment takes the four-dimensional trajectory to be followed as the target, builds a multilayer neural network with a stochastic policy and gradient descent, and constructs the agent. The stochastic policy is then updated through the agent's sampling and the gradient descent of the neural network.
Step four: for the problem that the four-dimensional trajectory points of several routes intersect and overlap, a time attribute is set for each four-dimensional trajectory point, four-dimensional trajectories with the same time tag merge into the same four-dimensional trajectory at the intersection, and the updating of the four-dimensional trajectory model and its conflict-avoidance algorithm are realized according to the running time of the four-dimensional trajectory system and the time tags of the four-dimensional trajectories.
The application principle of the present invention is further explained with reference to the following specific embodiments;
example 1: four-dimensional track air control implementation process and analysis
(I) Four-dimensional trajectory data acquisition
(1) Flight simulation system
The invention selects a flight simulation system equipped with an aircraft aerodynamic performance model for the experiments. The flight of the aircraft used for simulation training is subject to the constraints of the aerodynamic performance model, i.e., aircraft performance constraints such as engine performance and aircraft weight. An aircraft trained in such a flight simulation system matches the flight behaviour of a real aircraft more closely to a certain extent, and the training results are more suitable for application in a real flight environment.
(2) Course four-dimensional trajectory data acquisition
A flight route typically has several key position points that carry the aircraft's motion-state information such as heading, altitude and speed. From these key position points a route can be summarized. Because the specific routes differ between aircraft types, each aircraft type corresponds to its own route.
After the key position points are defined, an aircraft of one type is selected in the flight simulation system with the aircraft aerodynamic performance model to fly a simulated flight along the route summarized from the specified position points; information such as flight time, the six degrees of freedom of the aircraft and environmental factors is recorded at fixed time intervals and stored in a recording file.
(II) establishing a route
The qualified discrete track points are mapped onto the route to form the discrete track points on the route.
The collected set of track points meeting the conditions is:
G = {g_i, i = 1, 2, 3, ..., n}    (1)
the straight-line course data map is shown in FIG. 2.
The method comprises the following steps: each track point is perpendicular to the route l and intersects with the route, and the attribute of the track point is the attribute of the intersection on the route, so that all points in the rectangle are mapped onto the route to form a discrete track point set on the route.
The arc course data map is shown in FIG. 3.
The method comprises the following steps: each track point is connected with the center of a circle to form an intersection point of a straight line and an arc line, and the attribute of the track point is the attribute of the intersection point on the arc line, so that all points in the sector are mapped onto the arc line to form a discrete track point set on the arc line.
The set of discrete track points on the route obtained after data mapping is:
G' = {g'_i, i = 1, 2, 3, ..., n}    (2)
As shown in FIG. 2, the distance s_i from each discrete point g_i to the route origin E can be obtained from the distance formula. Similarly, as shown in FIG. 3, the distance s'_i from each discrete point g'_i to the arc origin B can be calculated. The set of discrete track points on the route in terms of distance and speed is:
W' = {(s_i, v_i), i = 1, 2, ..., n}    (3)
where s_i is the distance from the sampling point to the starting point of the route, v_i is a one-dimensional output vector denoting the speed of the aircraft at the position s_i, and n is the number of samples.
For the collected sample set W', the relation between s_i and v_i is nonlinear, so a simple linear fit cannot describe the aircraft's speed behaviour in the air well; to solve this problem, the LSSVM (Least Squares Support Vector Machine) method from machine learning is selected.
The LSSVM is an improvement of the SVM (support vector machine); introducing a least-squares loss function and equality constraints consumes fewer resources and makes the solution faster. Compared with the SVM, the empirical risk of the LSSVM is expressed as the sum of squares of the distances ξ_i from each sample point to the hyperplane, where ξ_i denotes the point-to-plane distance.
Training minimizes the empirical risk
min Σ_{i=1}^{n} ξ_i²
(i.e., the sum of the squared distances from each sampling point to the hyperplane is minimal), and its mathematical model is:
min_{w,b,ξ} (1/2) Σ_{i=1}^{n} ξ_i²   s.t.   v_i = w·s_i + b + ξ_i,  i = 1, ..., n    (4)
According to the structural-risk-minimization principle, the LSSVM must also ensure maximization of the margin between the two classification hyperplanes, so the mathematical model actually solved is a compromise between empirical risk and structural risk:
min_{w,b,ξ} (1/2)‖w‖² + (C/2) Σ_{i=1}^{n} ξ_i²   s.t.   v_i = w·φ(s_i) + b + ξ_i,  i = 1, ..., n    (5)
where C is a penalty factor, ξ_i is the training error, and φ(·) denotes the feature mapping associated with the kernel function below. To solve this optimization problem, the Lagrange function is introduced:
L(w, b, ξ, α) = (1/2)‖w‖² + (C/2) Σ_{i=1}^{n} ξ_i² − Σ_{i=1}^{n} α_i [w·φ(s_i) + b + ξ_i − v_i]    (6)
where α_i, i = 1, ..., n are the Lagrange multipliers; under the KKT conditions the following relations are obtained:
∂L/∂w = 0 ⇒ w = Σ_{i=1}^{n} α_i·φ(s_i);  ∂L/∂b = 0 ⇒ Σ_{i=1}^{n} α_i = 0;  ∂L/∂ξ_i = 0 ⇒ α_i = C·ξ_i;  ∂L/∂α_i = 0 ⇒ w·φ(s_i) + b + ξ_i − v_i = 0    (7)
with the kernel function
K_ij = K(s_i, s_j) = φ(s_i)·φ(s_j)
The solution of equation (7) can then be written in the form:
[ 0    e^T        ] [ b ]   [ 0 ]
[ e    Q + C⁻¹·I  ] [ α ] = [ v ]    (8)
where Q is the n × n kernel matrix with elements K_ij, I is the identity matrix, e = [1, ..., 1]^T, α = [α_1, ..., α_n]^T and v = [v_1, ..., v_n]^T. Solving equation (8) yields the values of α_i and b, and substituting them back gives the chaotic time-series regression model of the LSSVM:
f(s) = Σ_{i=1}^{n} α_i·K(s, s_i) + b    (9)
The speed value at each position point s on the corresponding route is then:
v(s) = Σ_{i=1}^{n} α_i·K(s, s_i) + b    (10)
After the s-v mapping of the route is obtained, the route model is summarized.
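As an illustration of equations (8) to (10), the following sketch fits the s-v samples with an LSSVM-style solver in numpy; the RBF kernel, the regularization values and the toy data are assumptions made only for this example and are not fixed by the description above:

```python
# Sketch of the LSSVM regression of speed v against leg distance s (equations (8)-(10)).
import numpy as np

def rbf_kernel(s1, s2, sigma=2000.0):
    # assumed RBF kernel; the description does not fix the kernel choice
    return np.exp(-((s1[:, None] - s2[None, :]) ** 2) / (2 * sigma ** 2))

def lssvm_fit(s, v, C=10.0, sigma=2000.0):
    n = len(s)
    Q = rbf_kernel(s, s, sigma)                  # kernel matrix K_ij
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                               # e^T
    A[1:, 0] = 1.0                               # e
    A[1:, 1:] = Q + np.eye(n) / C                # Q + C^-1 I
    rhs = np.concatenate(([0.0], v))
    sol = np.linalg.solve(A, rhs)                # solve the linear system of equation (8)
    b, alpha = sol[0], sol[1:]
    return alpha, b

def lssvm_predict(s_query, s_train, alpha, b, sigma=2000.0):
    return rbf_kernel(np.atleast_1d(s_query), s_train, sigma) @ alpha + b   # equation (10)

# toy data: distance along the leg (m) vs. recorded speed (km/h), assumed for illustration
s_train = np.linspace(0, 20000, 50)
v_train = 500 - 0.01 * s_train + 5 * np.sin(s_train / 3000)
alpha, b = lssvm_fit(s_train, v_train)
print(lssvm_predict(12000.0, s_train, alpha, b))
```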
(III) reinforcement learning algorithm
After the four-dimensional trajectory of the route has been calculated, how to allocate and schedule aircraft to join the route becomes the more critical problem. Applying a reinforcement learning algorithm solves this problem, which is hard to decide directly. Using reinforcement learning for aircraft scheduling can be divided into two stages:
(1) experimental training phase
The core of the reinforcement learning algorithm is the neural network; the finally trained neural network can be used directly in the flight simulation environment for aircraft deployment.
The reinforcement learning experiment can be divided into two main parts: the environment and the algorithm.
The environment defines the aircraft's state, motion model, aerodynamic constraints, reward function, training target, and so on. The aircraft state is a list comprising the aircraft's latitude and longitude, heading, altitude, speed, the state of the target point, etc. The aircraft outputs an action value from the currently trained neural network and the agent's input state list; after the constraints of the aerodynamic performance model are applied, the action value becomes actions the aircraft can execute, such as heading and speed, from which the next state is computed, completing one step of an episode.
The most important part of the reinforcement learning algorithm is the agent's learning process, i.e., the updating of the neural network. In the main loop of the experiment, the current reward is calculated from the current aircraft state at each step, and the neural network is updated in the direction of increasing reward; this updating of the neural network is the aircraft's learning process.
The method comprises the following specific steps:
1) Setting up the experimental environment in the simulation system
The experimental environment is based mainly on the BlueSky simulation environment. BlueSky is an open air traffic control simulator using the OpenAP aircraft performance model.
The BlueSky simulator provides a plug-in function module, a simple and extensible tool for communicating and interacting with the server that helps to call the functions of the flight-control module. The reinforcement learning experiment environment is therefore built in the plug-in module.
The specific experimental environment settings are as follows:
We choose F16 as the training aircraft type; the aircraft spawn location lies between latitude 30 and 38 and longitude 103 and 106; to handle the large amount of sampled data, the maximum simulation speed is used during training.
The experiment simulates the landing process of an aircraft at the ZUUU airport. At the start of training, the environment is initialized as follows:
The system randomly generates 3 to 5 navigation points at reasonable positions in the ZUUU landing direction and, at the same time, randomly generates a reasonable delay-time sequence so that the training aircraft lands through the navigation points in the correct time order. The delay-time sequence, for example 65, 180, 252, ... (in seconds), is random but reasonable; it must be neither too long nor too short, otherwise the required aircraft speed becomes unreasonably low or high. The aircraft spawns at a random position in the airspace south of ZUUU with a random heading, a speed of 500 km/h and an altitude of 500 m. The AI controller is not involved in adjusting the aircraft altitude; the basic "ALT" command of BlueSky is used to change the altitude and enable the landing simulation.
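The initialization described above can be sketched as follows; the helper names and any numeric ranges not fixed in the text (waypoint offsets, delay increments) are assumptions of this sketch:

```python
# Illustrative environment-reset sketch: random waypoints in the landing direction,
# a random delay sequence, and a randomly spawned training aircraft.
import random

def reset_environment():
    n_wpt = random.randint(3, 5)                       # 3-5 navigation points
    waypoints = [
        # latitude/longitude offsets south of the airport are assumed values
        (30.4 + 0.05 * i + random.uniform(-0.01, 0.01),
         104.0 + random.uniform(-0.02, 0.02))
        for i in range(n_wpt)
    ]
    delays, t = [], 0.0
    for _ in range(n_wpt):
        t += random.uniform(60.0, 120.0)               # assumed increments, keeps speeds reasonable
        delays.append(round(t))                        # e.g. 65, 180, 252, ... seconds
    aircraft = {
        "type": "F16",
        "lat": random.uniform(30.0, 38.0),             # spawn area given in the text
        "lon": random.uniform(103.0, 106.0),
        "hdg": random.uniform(0.0, 360.0),             # random heading
        "spd_kmh": 500.0,                              # initial speed from the text
        "alt_m": 500.0,                                # initial altitude from the text
    }
    return waypoints, delays, aircraft

print(reset_environment())
```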
2) Building reinforced learning algorithm
The motion of the aircraft is continuous motion in a sparse space, so a stochastic policy is generally chosen for updating the neural network; this effectively avoids the parameter-tuning burden and the tendency to fall into local optima that a deterministic policy brings. The algorithm uses the PPO (Proximal Policy Optimization) algorithm as its prototype and comprises a policy network and an evaluation network (a state-value-function network).
State space:
In the reinforcement learning environment every possible state can affect the result of the experiment, so when designing the state space all parameters that may influence the results must be considered. Our experimental goal is to reach the target location (latitude, longitude, heading) within a certain time. There are two agents in the experiment: the speed agent selects the speed and the heading agent changes the heading, so a state space has to be designed for each of the two models.
For the heading agent, the goal of the model is to reach the target location (latitude, longitude, altitude) with the target heading, so we design a state space comprising [Δlat, Δlon, tarhdg, hdg]. Δlat is the difference between the target latitude and the aircraft latitude, and Δlon is the difference between the target longitude and the aircraft longitude. tarhdg denotes the target heading and hdg the aircraft heading. cas denotes the calibrated airspeed of the aircraft.
For the speed agent, the goal of the model is to reach the target location (latitude, longitude, altitude, heading) at a given time, so the state space considered first is [Δlat, Δlon, tarhdg, hdg, cas, time]. However, we finally found that the state space of the speed agent may depend only on a distance increment: one distance is the distance from the aircraft to the target location, and the other is the delay time multiplied by cas. Compared with the heading agent, this model has the additional state parameter time, which denotes the time remaining to reach the target.
Action space:
In the nested reinforcement learning algorithm there are two models that output actions, namely heading and speed. The action space of each model is defined as follows:
Action space of the heading agent: A_t = [0, hdg, 360]
where the minimum heading is 0 degrees, the maximum heading is 360 degrees, and the action space is a distribution from 0 to 360.
Action space of the speed agent: A_t = [v_min, v_(t-1), v_max]
where v_min is the minimum allowable calibrated airspeed, v_max is the maximum allowable calibrated airspeed, and the action space is a distribution from 0 to 1000.
The reward function:
The aim of the experiment is to reach a defined state (latitude and longitude, heading, speed) within a defined time, inside a fairly large time frame. Two agents calculate rewards separately. The input state of the first agent is [Δlat, Δlon, tarhdg, hdg] and its output is the heading, which is used to select the route. The input state of the nested agent is [Δlat, Δlon, tarhdg, hdg, cas, time] and its output is cas, which is used to control the arrival time and speed. There are therefore two different reward functions in the experiment.
Theoretically, a reinforcement learning reward can be summarized as ||current state − target state||, i.e., a norm or abstract distance between the current state and the target. This is an abstract notion, and the reward function should be designed according to the specific situation.
The reward function of the first agent guides the agent to select a heading; in other words, it selects the route. Based on the input state, we propose the following reward function:
R = α_d·d + α_h·Δhdg    (11)
where d is the distance (m) from the current position of the aircraft to the target position, Δhdg equals the current heading of the aircraft minus the target heading, and α_d and α_h are the coefficients for distance and heading.
The goal of the second agent is to change the speed of the aircraft so that it arrives at the target location at the correct time:
Δd = α_d·(d − d')    (12)
where d' is the current speed of the aircraft multiplied by the delay time.
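The two reward terms of equations (11) and (12) can be sketched as follows; the coefficient values and the wrapping of the heading error are assumptions of this sketch, since α_d and α_h are left unspecified above:

```python
# Sketch of the two reward terms, equations (11) and (12).
ALPHA_D, ALPHA_H = -0.001, -0.01   # assumed values (negative: smaller error, larger reward)

def heading_reward(dist_to_target_m, hdg_deg, target_hdg_deg):
    # wrapping the heading error to [-180, 180) and taking its magnitude is an assumption
    delta_hdg = (hdg_deg - target_hdg_deg + 180.0) % 360.0 - 180.0
    return ALPHA_D * dist_to_target_m + ALPHA_H * abs(delta_hdg)     # equation (11)

def speed_reward(dist_to_target_m, speed_mps, delay_time_s):
    d_prime = speed_mps * delay_time_s            # d': current speed times the delay time
    return ALPHA_D * (dist_to_target_m - d_prime)                    # equation (12)

print(heading_reward(12000.0, 95.0, 90.0), speed_reward(12000.0, 130.0, 90.0))
```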
3) Carry out training
Reaching a target location (latitude, longitude, heading) at the correct time is a difficult task. Traditional reinforcement learning methods and algorithms use a single network structure and can hardly solve the air traffic control task completely. A single model might cope with a single task, for example reaching a certain state at constant speed without a time requirement, or reaching a certain position at variable speed without a heading requirement. We therefore propose nested reinforcement learning to handle the air traffic control task.
Based on the PPO algorithm and drawing on the nested reinforcement learning method proposed by Marc in 2018, we design a nested reinforcement learning model in which heading and speed are selected separately. With the nested model we can train agents that provide more than one action. We use one master agent (selecting the heading) with a second agent (selecting the speed) nested inside it. The master agent takes an action (selects the heading), and the nested model then controls the action of changing the speed. One important difference between the master agent and the nested agent is the state space: the state space of the master agent is [Δlat, Δlon, tarhdg, hdg], while the state of the nested agent is [Δd]. The design details and training process of the two models are described separately below.
The master agent and the nested speed agent of the nested reinforcement learning model have the same neural network structure, namely an actor-critic (AC) structure. For the critic (evaluation) network, the advantage function is defined as:
A(S_t, a; θ_c) = R_t + γ·V(S_{t+1}; θ_c) − V(S_t; θ_c)    (13)
where θ_c are the parameters of the critic network (Critic Network) matrix, R_t is the instant reward, V(S_{t+1}; θ_c) is the state-value function of the next state, V(S_t; θ_c) is the state-value function of the current state, and γ is a value between 0 and 1 representing the future discount factor: the further a reward lies in the future, the less it is taken into account.
Using least squares, the update formula for parameter θ_c is:
θ_c ← θ_c + α·∇_{θ_c} A²(S_t, a; θ_c)    (14)
where α is the learning rate defined for the critic network, θ_c are the parameters of the critic-network matrix, and ∇_{θ_c} A²(S_t, a; θ_c) denotes the update step of parameter θ_c.
For the policy network (Policy Network), a policy-gradient method is adopted, with π(a|S_t, θ_p) denoting the probability of selecting action a in state S_t. The update formula for the policy-network parameters θ_p is:
θ_p ← θ_p + α·∇_{θ_p} log π(a|S_t; θ_p)·A(S_t, a; θ_c)    (15)
where α is the learning rate defined for the policy network, the same as the learning rate in equation (14), θ_p are the parameters of the policy-network matrix, and ∇_{θ_p} log π(a|S_t; θ_p)·A(S_t, a; θ_c) denotes the update step of parameter θ_p.
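A minimal sketch of equations (13) to (15) as one plain actor-critic gradient step is given below; the network sizes, the optimizer, the discretization of the heading action and the omission of the PPO clipping and ratio terms are all assumptions of this sketch, not the trained networks of the embodiment:

```python
# Minimal PyTorch sketch: advantage from the critic (13), a least-squares critic
# update (14), and a policy-gradient actor update (15).
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
actor  = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 36))  # 36 heading bins (assumed)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(actor.parameters(),  lr=1e-4)
gamma = 0.99

def update(state, action, reward, next_state):
    s, s2 = torch.as_tensor(state), torch.as_tensor(next_state)
    # equation (13): A = R_t + gamma * V(S_{t+1}) - V(S_t)
    advantage = reward + gamma * critic(s2).detach() - critic(s)
    # equation (14): least-squares critic update (minimise A^2)
    opt_c.zero_grad(); (advantage ** 2).mean().backward(); opt_c.step()
    # equation (15): policy-gradient actor update (maximise log pi(a|s) * A)
    logp = torch.log_softmax(actor(s), dim=-1)[action]
    opt_a.zero_grad(); (-(logp * advantage.detach())).mean().backward(); opt_a.step()

update([0.1, -0.2, 90.0, 85.0], 9, -1.0, [0.08, -0.15, 90.0, 86.0])
```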
Finally, the randomly sampled data are optimized with the failed-experience-replay method, which improves the convergence direction of the neural network and solves the sparse-reward problem.
How the agent explores the environment is crucial to the experiment: the sampling quality during training directly affects the convergence efficiency and the convergence result of the algorithm. The best outcome one can hope for is that every sample is good data that reaches the target. The randomly sampled data are therefore optimized with the HER method, giving the neural network a better convergence direction and solving the sparse-reward problem. The nested agent has the same network structure as the master agent and differs from it only in its hyper-parameters.
Simulation with the simulation system:
The PPO algorithm can reuse sampled data in an off-policy manner, and the state of the neural network can be stored and loaded in real time.
In the flight simulation system, the trained neural network is called to dispatch the aircraft; target state information is transmitted in real time, and the aircraft automatically joins the four-dimensional trajectory.
(IV) Updating of the four-dimensional trajectory model and the conflict-avoidance algorithm of the four-dimensional trajectory model
The four-dimensional trajectory model is updated according to the simulation time of the simulation system, and each four-dimensional trajectory point carries a time tag as its identifier. In the simulation system, for each four-dimensional trajectory, the point with the earliest time tag disappears first at the end of the trajectory, and a new four-dimensional trajectory point is generated at the start of the trajectory after a certain simulation time.
A conflict in the four-dimensional trajectory model means that, for a four-dimensional trajectory point that is occupied or about to be occupied by one aircraft, a new aircraft chooses to occupy the same point. The conflict-avoidance algorithm works as follows: when a new aircraft chooses to occupy a four-dimensional trajectory point, the time tag of that point is checked to see whether it is already occupied; if it is, the current four-dimensional trajectory point is given up and another one is selected.
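The occupancy check can be sketched as follows; the data structures and method names are illustrative only:

```python
# Sketch of the conflict-avoidance rule: a 4D trajectory point is keyed by its time
# tag, and a new aircraft may only claim it if that time tag is still free.
class FourDTrajectory:
    def __init__(self, points):
        # points: list of dicts like {"lat":..., "lon":..., "alt":..., "time_tag":...}
        self.points = points
        self.occupied = {}                      # time_tag -> aircraft id

    def try_claim(self, point_index, aircraft_id):
        tag = self.points[point_index]["time_tag"]
        if tag in self.occupied:                # already taken: give up and reselect
            return False
        self.occupied[tag] = aircraft_id
        return True

traj = FourDTrajectory([{"lat": 30.5, "lon": 104.0, "alt": 500.0, "time_tag": 120}])
print(traj.try_claim(0, "AC001"), traj.try_claim(0, "AC002"))   # True False
```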
(V) results
To verify the effectiveness and applicability of the algorithm, a set of evaluation criteria for the algorithm was designed. The experiments analyse and compare the performance of the algorithm from the three angles of distance, angle and time (see FIGS. 5 to 7 for the results of 200 experiments), and the stability and accuracy of the algorithm are further verified against the established evaluation criteria and the corresponding index requirements.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. An air control method based on reinforcement learning and four-dimensional trajectories, characterized by comprising the following steps:
S1: establishing aircraft aerodynamic performance models for different aircraft types by modeling the engine performance of each type;
S2: collecting four-dimensional trajectory data of different aircraft types on different routes according to the aircraft aerodynamic performance models, and generating a route-and-type four-dimensional trajectory model through data playback;
S3: building a neural network based on a reinforcement learning algorithm and training the aircraft to follow the four-dimensional trajectory; constructing a nested reinforcement learning model in which a speed agent is nested inside a heading agent, so that route selection is achieved by choosing the aircraft's target heading and arrival-time control is achieved by choosing the aircraft's target speed, whereby the aircraft follows the four-dimensional trajectory model at the specified time, speed, heading and altitude.
2. The air control method based on reinforcement learning and four-dimensional trajectories according to claim 1, wherein the specific process of S1 is as follows: defining key position points carrying aircraft motion-state information; selecting an aircraft of a specific type in a flight simulation system equipped with the aircraft aerodynamic performance model to fly a simulated flight along the route summarized from the specified position points; recording, at fixed time intervals, information including flight time, the six degrees of freedom of the aircraft and environmental factors; and storing the information in a recording file.
3. The air control method based on reinforcement learning and four-dimensional trajectory according to claim 1, wherein the specific process of S2 is as follows:
S21: collecting the track points that meet the conditions to form a track point set G, and mapping each track point onto the route to obtain the set G' of discrete track-point mapping points on the route;
G = {g_i, i = 1, 2, 3, ..., n}    (1)
G' = {g'_i, i = 1, 2, 3, ..., n}    (2)
where g_i is a track point meeting the conditions, g'_i is the mapping point of track point g_i on the route, and n is the number of samples;
S22: calculating the distance s_i from each mapping point g'_i to the start of its leg, obtaining the sample set W' of discrete track-point mapping points on the route in terms of distance and speed;
W' = {(s_i, v_i), i = 1, 2, ..., n}    (3)
where s_i is the distance from the sampling point to the start of the route, and v_i is a one-dimensional output vector denoting the speed of the aircraft at the position a distance s_i from the start of the route;
s23: for the collected sample set W', an LSSVM in machine learning is selected, and the distance xi between each sample point and each hyperplane is usediRepresents the empirical risk of LSSVM, and the least empirical risk of training is
Figure FDA0002922997910000011
Minimum, its mathematical model is:
Figure FDA0002922997910000021
wherein w is viAbout siA linear parameter of (d); b is a linear offset;
according to the principle of minimizing the structural risk, the LSSVM needs to ensure the distance maximization of two classification hyperplanes, and the solved mathematical model is a compromise between empirical risk and structural risk, namely
Figure FDA0002922997910000022
Where C is a penalty factor and the distance ξ from a sample point to its hyperplaneiIs a training error;
s33: to solve this optimization problem, Lagrange's function is introduced:
Figure FDA0002922997910000023
wherein alpha isiN is Lagrange multiplier, e is unit vector;
Figure FDA0002922997910000024
representation wsiw/|w|;
The following relationship is obtained from the KKT condition:
Figure FDA0002922997910000025
kernel function
Figure FDA0002922997910000026
sjIs a navigation point mapping point g'jDistance to the starting point of each leg; then the solution form of equation (7) is converted into:
Figure FDA0002922997910000027
wherein Q is an element KijK × k order kernel matrix of (1), I is the identity matrix, and vector e ═ 1, …,1]TThe vector α ═ α1,…,αn]TVector v ═ v1,…,vn]T
Solving formula (8) to obtain alphaiAnd substituting the value of b into the formula (6) to obtain the chaotic time series regression model of the LSSVM, wherein the chaotic time series regression model of the LSSVM is as follows:
Figure FDA0002922997910000028
the speed value of each position point s on the corresponding route is as follows:
Figure FDA0002922997910000031
After the s-v mapping of the route is obtained, the route-and-type four-dimensional trajectory model is derived.
4. The air control method based on reinforcement learning and four-dimensional trajectory according to claim 3, wherein mapping each track point onto the route in S21 comprises:
straight-leg data mapping: drawing a perpendicular from each track point to the straight leg l; the intersection of this perpendicular with the leg is the mapping point corresponding to that track point;
arc-leg data mapping: connecting each track point with the center of the circle of the arc leg; the intersection of the resulting line with the arc is the mapping point corresponding to that track point.
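As an illustration only, a minimal geometric sketch of the two mapping rules of claim 4, written for a local flat x/y frame (an assumption; real code would first project latitude/longitude into such a frame):

    import numpy as np

    def map_to_straight_leg(p, a, b):
        """Foot of the perpendicular from track point p onto the straight leg a -> b."""
        p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
        ab = b - a
        t = np.dot(p - a, ab) / np.dot(ab, ab)
        t = np.clip(t, 0.0, 1.0)        # keep the mapped point on the leg itself
        return a + t * ab

    def map_to_arc_leg(p, center, radius):
        """Intersection of the line (center -> track point p) with the arc of given radius."""
        p, center = np.asarray(p, float), np.asarray(center, float)
        direction = p - center
        return center + radius * direction / np.linalg.norm(direction)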
5. The air control method based on reinforcement learning and four-dimensional trajectory according to claim 1, wherein S3 specifically comprises:
S31: setting up the experimental environment in the simulation system, determining the type of the training aircraft, the spawn position of the aircraft and the simulation speed, and initializing the environment;
S32: building the reinforcement learning algorithm based on the PPO algorithm:
(1) setting the state spaces:
there are two agents in the reinforcement learning experiment: a speed agent that selects the speed and a heading agent that changes the heading;
the state space of the heading agent is set as: [Δlat, Δlon, tarhdg, hdg];
where Δlat is the difference between the target latitude and the aircraft latitude, Δlon is the difference between the target longitude and the aircraft longitude, tarhdg denotes the target heading, and hdg denotes the current heading of the aircraft;
the state space of the speed agent is set as: [Δlat, Δlon, tarhdg, hdg, cas, time];
where cas denotes the calibrated airspeed of the aircraft and time denotes the remaining time to the target;
(2) setting the action spaces:
the action space of the heading agent is defined as: A_t = [0, hdg, 360]
where the minimum heading is 0 degrees, the maximum heading is 360 degrees, and the action space is a distribution over 0 to 360 degrees;
the action space of the speed agent is defined as: A_t = [v_min, v_{t-1}, v_max]
where v_min is the minimum allowable calibrated speed, v_max is the maximum allowable calibrated speed, and the action space is a distribution over 0 to 1000;
(3) setting the reward functions:
the reward function of the heading agent is used to guide the agent in selecting the heading and is expressed as:

R = α_d · d + α_h · Δhdg   (11)

where d is the distance from the current position of the aircraft to the target position, Δhdg is the current heading minus the target heading, and α_d and α_h are the coefficients of the distance and heading terms respectively;
the reward function of the speed agent is used to change the speed of the aircraft so that it arrives at the target position at the correct time, and is expressed as:

Δd = α_d · (d − d′)   (12)

where d′ is the current speed of the aircraft multiplied by the delay time;
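As an illustration only, a minimal Python sketch of the two reward functions (11) and (12); the coefficient values are assumptions, and wrapping the heading error to [0, 180] degrees is an implementation choice not stated in the claim:

    ALPHA_D = -0.001   # distance coefficient (assumed; negative so closer is better)
    ALPHA_H = -0.01    # heading coefficient (assumed)

    def heading_reward(dist_to_target_m, current_hdg_deg, target_hdg_deg):
        """R = alpha_d * d + alpha_h * dhdg, eq. (11); dhdg wrapped to [0, 180]."""
        dhdg = abs((current_hdg_deg - target_hdg_deg + 180.0) % 360.0 - 180.0)
        return ALPHA_D * dist_to_target_m + ALPHA_H * dhdg

    def speed_reward(dist_to_target_m, current_speed_mps, delay_time_s):
        """delta_d = alpha_d * (d - d'), eq. (12), with d' = current speed * delay time."""
        d_prime = current_speed_mps * delay_time_s
        return ALPHA_D * (dist_to_target_m - d_prime)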
S33: training the nested reinforcement learning model in which heading and speed are selected separately:
(1) the heading agent serves as the main agent, with the speed agent nested inside it; the action of selecting the heading is taken by the main agent, while the action of controlling and changing the speed is realized by the nested speed agent; the state space of the main agent is [Δlat, Δlon, tarhdg, hdg], and the state of the nested agent is [Δd];
(2) the main agent and the nested speed agent of the nested reinforcement learning model share the same neural network structure, namely the actor-critic (AC) structure; for the evaluation (critic) network, the advantage function is defined as:

A_t = R_t + γ·V(S_{t+1}; θ_c) − V(S_t; θ_c)   (13)

where θ_c are the parameters of the evaluation network's neural network matrix, R_t is the immediate reward, V(S_{t+1}; θ_c) is the state value function of the next state, V(S_t; θ_c) is the state value function of the current state, and γ is a value between 0 and 1 representing the future discount factor: the further a reward lies from the present, the less it is taken into account;
using the least squares criterion (gradient descent on the squared advantage), the update formula of parameter θ_c is:

θ_c ← θ_c − α·∇_{θ_c} A_t²   (14)

where α is the learning rate defined for the evaluation network, θ_c are the parameters of the evaluation network's neural network matrix, and ∇_{θ_c} A_t² gives the step size of the parameter update;
for the policy network, the policy gradient method is adopted, where π(a|S_t; θ_p) denotes the probability of selecting action a in state S_t; the update formula of the policy network parameters θ_p is:

θ_p ← θ_p + α·∇_{θ_p} log π(a|S_t; θ_p) · A_t   (15)

where α is the learning rate defined for the policy network, equal to the learning rate in equation (14), θ_p are the parameters of the policy network's neural network matrix, and ∇_{θ_p} log π(a|S_t; θ_p) · A_t gives the step size of the parameter update;
and finally, a failure experience replay method is adopted to optimize the randomly sampled data, thereby improving the convergence direction of the neural network and alleviating the sparse reward problem.
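As an illustration only, a minimal PyTorch sketch of one actor-critic update step implementing equations (13)-(15); it uses a plain advantage actor-critic step rather than the full PPO clipped objective, shares one network body between the policy and value heads, and all sizes and hyper-parameters are assumptions:

    import torch
    import torch.nn as nn

    class ACNet(nn.Module):
        """AC structure: a policy head over discrete actions and a value head."""
        def __init__(self, obs_dim, n_actions, hidden=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
            self.policy = nn.Linear(hidden, n_actions)   # pi(a | S_t; theta_p)
            self.value = nn.Linear(hidden, 1)            # V(S_t; theta_c)

        def forward(self, obs):
            h = self.body(obs)
            return torch.distributions.Categorical(logits=self.policy(h)), self.value(h)

    def ac_update(net, opt, s_t, a_t, r_t, s_next, gamma=0.99):
        dist, v_t = net(s_t)
        with torch.no_grad():
            _, v_next = net(s_next)
        advantage = r_t + gamma * v_next - v_t                           # eq. (13)
        critic_loss = advantage.pow(2).mean()                            # least squares, eq. (14)
        actor_loss = -(dist.log_prob(a_t) * advantage.detach()).mean()   # policy gradient, eq. (15)
        opt.zero_grad()
        (critic_loss + actor_loss).backward()
        opt.step()

    # Example: heading agent with a 4-dim state and a discretised heading action set (assumed)
    net = ACNet(obs_dim=4, n_actions=36)
    opt = torch.optim.Adam(net.parameters(), lr=3e-4)
    s = torch.randn(1, 4); a = torch.tensor([7]); r = torch.tensor([[0.1]]); s2 = torch.randn(1, 4)
    ac_update(net, opt, s, a, r, s2)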
6. The air control method based on reinforcement learning and four-dimensional trajectory according to claim 5, wherein in step S31, initializing the environment comprises: randomly generating several waypoints in the landing direction of the airport and randomly generating a delay time sequence, so that the training aircraft passes the waypoints and lands in the correct time order; the aircraft spawns at a random position within a designated area with a random heading, and its speed and altitude are set to initial values.
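As an illustration only, a minimal Python sketch of the environment initialization described in claim 6; the coordinate ranges, waypoint count and initial speed/altitude values are assumptions:

    import random

    def reset_environment(n_waypoints=3):
        # Waypoints along the airport landing direction, plus a random delay
        # (arrival-time) sequence that fixes the order the aircraft must fly them in.
        waypoints = [{"lat": 30.50 + 0.02 * i + random.uniform(-0.005, 0.005),
                      "lon": 104.00 + 0.02 * i + random.uniform(-0.005, 0.005)}
                     for i in range(n_waypoints)]
        delays = sorted(random.uniform(60, 600) for _ in range(n_waypoints))

        # The aircraft spawns at a random position in a designated area with a
        # random heading; calibrated speed and altitude are set to fixed values.
        aircraft = {"lat": random.uniform(30.6, 30.8),
                    "lon": random.uniform(104.1, 104.3),
                    "hdg": random.uniform(0.0, 360.0),
                    "cas": 180.0,      # knots, assumed
                    "alt": 3000.0}     # metres, assumed
        return waypoints, delays, aircraft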
7. The air control method based on reinforcement learning and four-dimensional trajectory according to claim 1, further comprising, after S3:
S4: updating the four-dimensional track model according to the simulation time of the simulation system, so that each four-dimensional track point carries a time tag as its time identifier; when a new aircraft selects a four-dimensional track point according to its time tag, judging whether the time tag of that track point is already occupied, and if so, abandoning the current four-dimensional track point and reselecting.
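As an illustration only, a minimal Python sketch of the time-tag occupancy check of S4; the track-point identifiers and the reservation table are assumptions:

    class TrackPointRegistry:
        def __init__(self):
            self.occupied = {}   # (trackpoint_id, time_tag) -> aircraft_id

        def try_occupy(self, trackpoint_id, time_tag, aircraft_id):
            """Reserve a 4D track point for its time tag; refuse if already taken."""
            key = (trackpoint_id, time_tag)
            if key in self.occupied:
                return False          # occupied: the aircraft must reselect
            self.occupied[key] = aircraft_id
            return True

    registry = TrackPointRegistry()
    assert registry.try_occupy("WP07", 1622541600, "CCA1234")
    assert not registry.try_occupy("WP07", 1622541600, "CSN5678")  # conflict -> reselect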
8. The air control method based on reinforcement learning and four-dimensional track according to claim 1, wherein in the route-aircraft-type four-dimensional track model established in S2, each four-dimensional track point is generated on a single route or distributed across different routes that intersect, and four-dimensional track points distributed on different routes coincide into the same four-dimensional track point at the intersection of the routes.
CN202110134760.1A 2021-01-29 2021-01-29 Air control method based on reinforcement learning and four-dimensional track Expired - Fee Related CN112818599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110134760.1A CN112818599B (en) 2021-01-29 2021-01-29 Air control method based on reinforcement learning and four-dimensional track

Publications (2)

Publication Number Publication Date
CN112818599A true CN112818599A (en) 2021-05-18
CN112818599B CN112818599B (en) 2022-06-14

Family

ID=75860960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110134760.1A Expired - Fee Related CN112818599B (en) 2021-01-29 2021-01-29 Air control method based on reinforcement learning and four-dimensional track

Country Status (1)

Country Link
CN (1) CN112818599B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221469A (en) * 2021-06-04 2021-08-06 上海天壤智能科技有限公司 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator
CN113393495A (en) * 2021-06-21 2021-09-14 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN114115304A (en) * 2021-10-26 2022-03-01 南京航空航天大学 Aircraft four-dimensional climbing track planning method and system
CN114141062A (en) * 2021-11-30 2022-03-04 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning
CN115524964A (en) * 2022-08-12 2022-12-27 中山大学 Rocket landing real-time robust guidance method and system based on reinforcement learning
CN115691231A (en) * 2023-01-03 2023-02-03 中国电子科技集团公司第二十八研究所 Method and system for simulation deduction and conflict resolution by using air plan

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2430278A (en) * 2004-04-29 2007-03-21 Blaga N Iordanova Global neural network for conflict resolution of flights
CN101692315A (en) * 2009-09-25 2010-04-07 民航总局空管局技术中心 Method for analyzing high precision 4D flight trajectory of airplane based on real-time radar data
CN106340209A (en) * 2015-01-07 2017-01-18 江苏理工学院 Control method of air traffic control system for 4D trajectory-based operation
US10037704B1 (en) * 2017-02-01 2018-07-31 David Myr Automatic real-time air traffic control system and method for maximizing landings / takeoffs capacity of the airport and minimizing aircrafts landing times
CN109542876A (en) * 2018-11-20 2019-03-29 南京莱斯信息技术股份有限公司 Extracting method based on Hadoop data mining aircraft experience locus model key factor
CN110930770A (en) * 2019-11-06 2020-03-27 南京莱斯信息技术股份有限公司 Four-dimensional track prediction method based on control intention and airplane performance model
CN110806759A (en) * 2019-11-12 2020-02-18 清华大学 Aircraft route tracking method based on deep reinforcement learning

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
LAN MA et al.: "A Hybrid CNN-LSTM Model for Aircraft 4D Trajectory Prediction", IEEE Access *
PAVEEN JUNTAMA et al.: "A Distributed Metaheuristic Approach for Complexity Reduction in Air Traffic for Strategic 4D Trajectory Optimization", In Proceedings of the 2020 International Conference on Artificial Intelligence and *
ZHI-JUN WU et al.: "A 4D Trajectory Prediction Model Based on the BP Neural Network", Journal of Intelligent System *
JI YULONG et al.: "Instrument Landing Simulation System for Aircraft", Journal of System Simulation *
SONG GE: "Terrain Modeling Method for Flight Simulation Based on Tessellation Shading", Advanced Engineering Sciences *
JIANG BO et al.: "Waypoint Flight Conflict Resolution Based on Deep Reinforcement Learning", Aeronautical Computing Technique *
WANG MIN et al.: "Design of a Flight Data Processing System Based on Cluster Shared Storage", Informatization Research *
XU WENJUN: "Research on Air Traffic Control Automation Systems and Data Fusion Methods", China Master's Theses Full-text Database, Engineering Science and Technology II *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221469A (en) * 2021-06-04 2021-08-06 上海天壤智能科技有限公司 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator
CN113393495A (en) * 2021-06-21 2021-09-14 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN113393495B (en) * 2021-06-21 2022-02-01 暨南大学 High-altitude parabolic track identification method based on reinforcement learning
CN114115304A (en) * 2021-10-26 2022-03-01 南京航空航天大学 Aircraft four-dimensional climbing track planning method and system
CN114141062A (en) * 2021-11-30 2022-03-04 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning
CN114141062B (en) * 2021-11-30 2022-11-01 中国电子科技集团公司第二十八研究所 Aircraft interval management decision method based on deep reinforcement learning
CN115524964A (en) * 2022-08-12 2022-12-27 中山大学 Rocket landing real-time robust guidance method and system based on reinforcement learning
CN115691231A (en) * 2023-01-03 2023-02-03 中国电子科技集团公司第二十八研究所 Method and system for simulation deduction and conflict resolution by using air plan

Also Published As

Publication number Publication date
CN112818599B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN112818599B (en) Air control method based on reinforcement learning and four-dimensional track
Zeng et al. A deep learning approach for aircraft trajectory prediction in terminal airspace
CN100591900C (en) Flight control system having a three control loop design
US20180261101A1 (en) Apparatus to generate aircraft intent and related methods
Razzaghi et al. A survey on reinforcement learning in aviation applications
Zhang et al. 3D path planning and real-time collision resolution of multirotor drone operations in complex urban low-altitude airspace
Dong et al. Deep learning in aircraft design, dynamics, and control: Review and prospects
Swierstra et al. Common trajectory prediction capability for decision support tools
Brittain et al. Autonomous separation assurance with deep multi-agent reinforcement learning
Dong et al. Study on the resolution of multi-aircraft flight conflicts based on an IDQN
Pham et al. A generative adversarial imitation learning approach for realistic aircraft taxi-speed modeling
Rodriguez-Sanz et al. 4D-trajectory time windows: definition and uncertainty management
Zijian et al. Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
De Marco et al. A deep reinforcement learning control approach for high-performance aircraft
Başpınar et al. Optimization-based autonomous air traffic control for airspace capacity improvement
Li et al. A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation
Xie et al. Long and short term maneuver trajectory prediction of UCAV based on deep learning
Jiang et al. A deep reinforcement learning strategy for UAV autonomous landing on a platform
Başpinar et al. Mission planning and control of multi-aircraft systems with signal temporal logic specifications
CN113093568A (en) Airplane automatic driving operation simulation method based on long-time and short-time memory network
Keong et al. Reinforcement learning for autonomous aircraft avoidance
Lee et al. Predicting interactions between agents in agent-based modeling and simulation of sociotechnical systems
Zhu et al. Multi-constrained intelligent gliding guidance via optimal control and DQN
Xu et al. Reinforcement learning for autonomous morphing control and cooperative operations of UAV cluster
Konyak et al. A demonstration of an aircraft intent interchange specification for facilitating trajectory-based operations in the national airspace system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220614