CN114141062B - Aircraft interval management decision method based on deep reinforcement learning - Google Patents

Aircraft interval management decision method based on deep reinforcement learning Download PDF

Info

Publication number
CN114141062B
Authority
CN
China
Prior art keywords
flight
network
action
current
flights
Prior art date
Legal status
Active
Application number
CN202111443511.7A
Other languages
Chinese (zh)
Other versions
CN114141062A (en
Inventor
刘泽原
徐秋程
丁辉
史艳阳
吴靓浩
张婧婷
陈飞飞
徐珂
谈青青
Current Assignee
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202111443511.7A priority Critical patent/CN114141062B/en
Publication of CN114141062A publication Critical patent/CN114141062A/en
Application granted granted Critical
Publication of CN114141062B publication Critical patent/CN114141062B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G08G 5/04: Traffic control systems for aircraft, e.g. air-traffic control [ATC]; anti-collision systems
    • G08G 5/0073: Traffic control systems for aircraft; surveillance aids
    • G06N 3/044: Computing arrangements based on biological models; neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N 3/049: Computing arrangements based on biological models; neural networks; temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084: Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides an aircraft interval management decision method based on deep reinforcement learning that achieves end-to-end control directly from input to output. The method applies deep reinforcement learning to the aviation field: a deep recurrent Q-network predicts and judges the traffic situation in the terminal area, and a trained flight speed regulation strategy enables autonomous maintenance of safe separation between flights in the terminal area. The method is used to deploy aircraft in the terminal area under busy operating conditions, achieving conflict resolution and sustained conflict-free operation of busy sectors, relieving the controller's separation decision workload in complex operating scenarios, and improving sector control efficiency and safety assurance capability.

Description

Aircraft interval management decision method based on deep reinforcement learning
Technical Field
The invention relates to the field of civil aviation air traffic control, in particular to an aircraft interval management decision method based on deep reinforcement learning.
Background
With the rapid development of air transportation, demand for daily travel and cargo transport has grown quickly, and a busy airport now handles more than 1,000 flights per day. Under such busy conditions, some efficiency must be sacrificed to keep aircraft safely separated, which brings other effects such as longer average flight times, flight paths deviating from the standard arrival and departure routes, and increased controller workload. Existing control systems can only detect short-term conflicts; they can neither predict medium- and long-term conflicts nor offer separation maintenance decision suggestions to controllers.
Reinforcement learning is an important branch of machine learning; its essence is to describe and solve the problem of an agent learning a policy that maximizes reward or achieves a specific goal while interacting with an environment. With the development of deep learning, reinforcement learning can use a neural network to extract and learn feature knowledge directly from raw input data and then learn a control policy from the extracted features with a conventional reinforcement learning algorithm, without manual or heuristic feature engineering. Such reinforcement learning combined with deep learning is called deep reinforcement learning. Using deep reinforcement learning to realize autonomous separation maintenance between aircraft therefore has practical significance for improving sector operation efficiency and reducing potential conflicts.
Disclosure of Invention
The purpose of the invention is as follows: to provide control decision suggestions for controllers, realize automatic aircraft conflict resolution and autonomous maintenance of safe flight separation, and improve sector operation efficiency in a high-density operation state.
Based on the current operational control automation system, the method acquires basic situation information such as the position, speed and heading of flights in a sector and, using information such as the sector's standard arrival and departure procedures and the aircraft wake-turbulence separation categories, issues speed and altitude adjustment instructions to flights with potential conflicts, thereby eliminating the potential conflicts, keeping the sector running normally under high-density operation, and reducing the controller's workload under such conditions.
To achieve this purpose, the invention provides an aircraft interval management decision method based on deep reinforcement learning that analyzes the situation in the current sector, provides speed regulation suggestions for flights that may conflict, and resolves the potential conflicts.
The invention comprises the following steps:
step 1: defining action and state space of a deep reinforcement learning environment for aircraft flight command;
step 2: constructing an interval fine decision deep reinforcement learning network of the aircraft;
step 3: training an interval fine decision deep reinforcement learning network of the aircraft;
step 4: realizing fine management of the aircraft interval through the aircraft interval fine decision deep reinforcement learning network.
The step 1 comprises the following steps:
the method comprises the steps that two deep reinforcement learning intelligent bodies are used for selecting an intelligent body and an action selecting intelligent body for a flight respectively, wherein the state space of the flight selecting intelligent body is position information, course information and model information of all controllable flights in a current sector, and the action space is that less than or equal to two flights are selected from all controllable flights in the current sector for maneuvering; the state space of the action selection agent selects the standby movable flight selected by the agent for the flight, the position information, the course information and the model information of the three flights closest to the standby movable flight, and the distance from the standby movable flight, and the action space is the maneuvering action of the current flight at the next moment.
The step 2 comprises the following steps:
the aircraft interval fine decision depth reinforcement learning network comprises a flight selection intelligent agent and an action selection intelligent agent, wherein the flight selection intelligent agent comprises a Target value calculation network Target Q and an action value calculation network Eval Q; the Target Q network is used for training the Eval Q network and judging the output of the Eval Q network;
the action selection intelligent body comprises a Target value calculation network Target Q, an action value calculation network EvalQ and two long-short term memory neural networks LSTM, wherein the Eval Q network is used for receiving the positions, the speeds and the heights of the standby movable flight and three flights nearest to the standby movable flight selected by the flight selection intelligent body at the current moment, outputting the action values Q of all optional actions of the standby movable flight at the current moment, selecting the action with the highest action value to execute, the Target Q network is used for training the Eval Q network and judging the output of the Eval Q network, and the LSTM network is used at the last ends of the Target Q network and the Eval Q network and is used for processing time sequence data generated by flight after the flight enters a sector to judge the future movement trend of the Target flight.
The step 3 comprises the following steps:
step 3.1: initializing the parameters of the deep reinforcement learning algorithm, including the total number of training rounds E, the feature dimension n_s of the state space, the dimension n_a of the action space, the step size α of each parameter update, the attenuation factor γ of the action value function, the action exploration rate ε, and the weight parameters of the Eval Q networks of the flight selection agent and the action selection agent; initializing the weight parameters of each Target Q network to be the same as its Eval Q network; initializing the soft update step size τ of the Target Q network, the number m of batch training samples, the size d of the experience replay pool, the replay data volume d_start at which training starts, and the Target Q network update round number c; initializing the simulation environment state;
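For concreteness, the hyperparameters listed in step 3.1 could be collected in a configuration object like the sketch below; every numeric value shown is a placeholder assumption, since the patent does not disclose the actual settings.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    """Hyperparameters of step 3.1; all values below are placeholders."""
    episodes: int = 10_000        # E, total training rounds
    state_dim: int = 64           # n_s, state feature dimension
    action_dim: int = 10          # n_a, action space dimension
    lr: float = 1e-3              # alpha, update step size
    gamma: float = 0.99           # attenuation (discount) factor
    epsilon: float = 0.99         # initial action exploration rate
    tau: float = 0.01             # soft update step of the Target Q network
    batch_size: int = 64          # m, batch training sample count
    replay_size: int = 100_000    # d, experience replay pool capacity
    replay_start: int = 1_000     # d_start, data volume before training starts
    target_update_every: int = 10 # c, rounds between Target Q updates
```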
step 3.2: receiving real-time ADS-B (automatic dependent surveillance-broadcast) data, screening all controllable flights in the current time period, and acquiring longitude, latitude, altitude, speed, heading and aircraft type information from the ADS-B data; normalizing the longitude, latitude, altitude, speed and heading information by scaling the data to the interval [0,1] to obtain normalized features, encoding the aircraft type information as a one-hot feature vector, and concatenating the normalized features with the one-hot feature vector to form the feature vector s_t^fca of the environment at the current moment;
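A minimal sketch of the normalization and one-hot encoding in step 3.2 follows; the value ranges used for scaling are assumed sector-specific bounds, not values given by the invention.

```python
import numpy as np

def normalize(value, lo, hi):
    """Min-max scaling to [0, 1]; lo/hi are assumed physical bounds."""
    return (value - lo) / (hi - lo)

def build_feature_vector(flight, type_vocab):
    """Normalized continuous features concatenated with a one-hot aircraft
    type code, as described in step 3.2. Bounds are illustrative assumptions."""
    continuous = np.array([
        normalize(flight["lon"],   118.0,  120.0),   # assumed sector longitude range
        normalize(flight["lat"],    31.0,   33.0),   # assumed sector latitude range
        normalize(flight["alt"],     0.0, 6000.0),   # metres
        normalize(flight["speed"],   0.0,  250.0),   # m/s
        normalize(flight["heading"], 0.0,  360.0),
    ])
    one_hot = np.zeros(len(type_vocab))
    one_hot[type_vocab.index(flight["type"])] = 1.0
    return np.concatenate([continuous, one_hot])
```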
Step 3.3: feature vector of current time of using environment
Figure BDA0003384363000000032
Obtaining flight selection agent action as input of Eval Q network of flight selection agent
Figure BDA0003384363000000033
Feature vector of environment at current moment
Figure BDA0003384363000000034
And
Figure BDA0003384363000000035
stitching to form a new eigenvector
Figure BDA0003384363000000036
And inputting the motion selection intelligent agent into the Eval Q network to obtain the motion of the motion selection intelligent agent
Figure BDA0003384363000000037
step 3.4: executing a_t^faa in the simulation environment, obtaining the new longitude, latitude, altitude, heading and speed information from the ADS-B data after waiting for the flight to execute the action, and forming the feature vector s_{t+1} of the next moment by the method of step 3.2;
step 3.5: calculating the reward function r_fca of the flight selection agent and the inbound-flight altitude descent reward function r_faa of the action selection agent, judging whether the current training round needs to end to obtain the end identifier is_end_t, and storing the experience tuples (s_t^fca, a_t^fca, r_fca, s_{t+1}^fca, is_end_t) and (s_t^faa, a_t^faa, r_faa, s_{t+1}^faa, is_end_t) into the respective experience replay pools;
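The experience replay pools used in step 3.5 can be sketched as a simple bounded buffer; the uniform sampling and deque-based eviction below are standard choices assumed for illustration.

```python
import random
from collections import deque

class ReplayPool:
    """Minimal experience replay pool: stores (s, a, r, s_next, is_end)
    tuples, drops the oldest data when full, and samples uniformly."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest entries evicted first

    def add(self, state, action, reward, next_state, is_end):
        self.buffer.append((state, action, reward, next_state, is_end))

    def sample(self, m):
        return random.sample(self.buffer, m)

    def __len__(self):
        return len(self.buffer)

# Each agent keeps its own pool; training starts once len(pool) >= d_start.
```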
step 3.6: if the current simulation round is not finished, repeating steps 3.2 to 3.8, and if the current simulation round is finished, starting the next round of simulation;
step 3.7: when the amount of data in the experience replay pool is greater than or equal to d_start, starting the training process;
step 3.8: if the current number of training rounds is an integer multiple of c, updating the weight parameters of the Target Q network with the soft update strategy.
Step 3.2 comprises: using s_t^fca as the input of the Eval Q network of the flight selection agent and outputting the action value set Q_t corresponding to all actions.
Step 3.3 includes: inputting the feature vector s_t^fca of the environment at the current moment into the Eval Q network of the flight selection agent and calculating the action value set Q_t of the currently selectable flights by forward propagation, while generating a random number n_random in the interval [0,1]. Here ε ∈ (0,1) is the action exploration rate; its initial value is 0.99, it is multiplied by a decay factor of 0.95 at the end of each training round, and once ε < 0.1 it is set to 0. If n_random < ε, an action is selected at random from the action space, i.e. one flight or two flights are randomly selected as the target flights; otherwise the flight in the sector with the largest action value is selected as the target flight and stored as a_t^fca. The feature vector s_t^fca of the environment at the current moment is then concatenated with a_t^fca to form the new feature vector s_t^faa and input into the Eval Q network of the action selection agent, where the value of each maneuver is calculated by forward propagation and a random number n_random in [0,1] is generated. If n_random < ε, a maneuver is selected at random from the action space; otherwise the maneuver with the largest action value is selected as the current maneuver and stored as a_t^faa.
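The ε-greedy selection and the exploration schedule just described (initial ε of 0.99, multiplied by 0.95 after each round, forced to 0 once it falls below 0.1) can be sketched as follows; the function names are illustrative.

```python
import random
import numpy as np

def epsilon_greedy(q_values, epsilon, n_actions):
    """Pick a random action with probability epsilon, otherwise the argmax."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(q_values))

def decay_epsilon(epsilon, decay=0.95, floor=0.1):
    """Schedule of step 3.3: start at 0.99, multiply by 0.95 after each
    training round, and switch to 0 once epsilon falls below 0.1."""
    epsilon *= decay
    return 0.0 if epsilon < floor else epsilon
```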
Step 3.5 comprises: calculating the reward function r_fca of the flight selection agent and the inbound-flight altitude descent reward function r_faa of the action selection agent with the following formulas:

r_fca = 1000 / dis_flights + 10000 / dis_airport        (1)

[Formula (2), which defines r_faa, is given in the original only as an image.]
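A sketch of the reward computation follows. Formula (1) is implemented as written; since formula (2) is not reproduced in the text, the r_faa function below is a hypothetical stand-in, and the reading of dis_flights and dis_airport as the separation to the nearest aircraft and the distance to the airport is likewise an assumption.

```python
def flight_selection_reward(dis_flights, dis_airport):
    """Formula (1): r_fca = 1000 / dis_flights + 10000 / dis_airport.
    dis_flights and dis_airport are assumed to be the distance to the
    nearest aircraft and to the airport, in consistent units."""
    return 1000.0 / dis_flights + 10000.0 / dis_airport

def inbound_descent_reward(altitude, target_altitude):
    """Hypothetical stand-in for r_faa (formula (2) is only available as an
    image in the source): a shaping term that rewards reducing the altitude
    error of an inbound flight."""
    return -abs(altitude - target_altitude) / 1000.0
```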
Step 3.5 further comprises: judging whether the current training round needs to end. For flight i, x_i, y_i, h_i respectively represent the flight's current longitude, latitude and altitude, and x_i^target, y_i^target, h_i^target respectively represent the longitude, latitude and altitude of its destination point. The round ends if one of the following four conditions occurs:
(1) all flights have arrived at their destination points (formula (3));
(2) a flight has exceeded the sector boundary, where sector represents the sector longitude/latitude boundary and h_min, h_max respectively represent the lower and upper bounds of the sector altitude (formula (4));
(3) the sector handover condition is not satisfied (formula (5));
(4) two flights have collided, where δ represents the distance safety threshold (formula (6)).
If one of the above four conditions occurs, the end identifier is_end_t variable is stored as the true value True; otherwise it is stored as the false value False.
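The four end-of-round conditions can be checked as in the sketch below. Because formulas (3) to (6) are only available as images, the arrival tolerance, the sector-containment test (a sector object with a contains() method) and the pairwise conflict test are assumptions.

```python
import math

def round_finished(flights, sector, h_min, h_max, delta, arrival_tol):
    """End-of-round test covering the four cases of step 3.5; numeric
    tolerances and the sector.contains() interface are assumed."""
    # (1) every flight has reached its destination point
    all_arrived = all(
        math.dist((f["lon"], f["lat"], f["alt"]),
                  (f["target_lon"], f["target_lat"], f["target_alt"])) < arrival_tol
        for f in flights)

    # (2) some flight has left the sector laterally or vertically
    out_of_sector = any(
        not sector.contains(f["lon"], f["lat"]) or not (h_min <= f["alt"] <= h_max)
        for f in flights)

    # (3) sector handover condition not satisfied (flag assumed to be provided)
    handover_failed = any(not f.get("handover_ok", True) for f in flights)

    # (4) two flights closer than the distance safety threshold delta
    conflict = any(
        math.dist((a["lon"], a["lat"]), (b["lon"], b["lat"])) < delta
        for i, a in enumerate(flights) for b in flights[i + 1:])

    return all_arrived or out_of_sector or handover_failed or conflict
```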
The step 4 comprises the following steps:
step 4.1: establishing network connection with a control automation system, interacting with the control automation system in a message middleware communication mode, extracting current control sector structure information, selecting a corresponding trained model according to the current control sector structure information, and reloading trained model parameters;
step 4.2: receiving the longitude, latitude, altitude, speed, heading and aircraft type information of the flights in the current sector from the control automation system, normalizing these data and concatenating them into a flight feature vector used as the input of the flight selection agent; the flight selection agent selects the flight to be adjusted according to the current situation, the selection is concatenated with the flight information feature vector as the input of the action selection agent, and the action selection agent selects the maneuver action to be executed by the current flight and generates the corresponding control instruction;
step 4.3: synthesizing the control instruction into control voice, pushing it into the control automation system through message middleware, and displaying it on the control interface; monitoring the pilot's execution in real time through the situation returned by the control automation system, resending the instruction when it is executed inconsistently or not executed, and cancelling command of the flight after it has been guided to the handover point.
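Putting step 4 together, the online decision loop could look like the sketch below. The get_sector_flights, push_instruction and choose interfaces are hypothetical placeholders, not the control automation system's real API, and the four-second polling interval follows the detailed embodiment later in this document.

```python
import time

def decision_loop(flight_selector, action_selector, atc_client):
    """Online use of the trained agents (step 4); all interface names here
    are assumed placeholders."""
    while True:
        flights = atc_client.get_sector_flights()              # lon/lat/alt/speed/heading/type
        if flights:
            candidate = flight_selector.choose(flights)         # flight that should maneuver
            maneuver = action_selector.choose(flights, candidate)  # e.g. "descend", "decelerate"
            instruction = f"{candidate['callsign']}: {maneuver}"
            atc_client.push_instruction(instruction)            # synthesized into controller voice
        time.sleep(4)                                           # surveillance data arrives every ~4 s
```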
The invention has the following beneficial effects:
1. improving sector operation efficiency under busy conditions
Taking current approach control as an example, when the number of flights in a sector reaches a certain value and a potential conflict cannot be resolved, the controller will usually resort to holding patterns, and in extreme cases multiple holds can occur. The method can resolve the potential conflicts without resorting to holding, which improves sector capacity and operating efficiency, reduces the average flight time of flights, saves fuel, and lightens the controller's workload.
2. Reducing controller workload
The method provides control decision suggestions to the controller, who only needs to judge, according to the current situation in the sector, whether a suggestion should be executed, thereby reducing the controller's workload in busy operating conditions.
3. Improving 4D trajectory prediction accuracy and facilitating continuous descent/continuous climb operations
At present, one reason that 4D trajectory prediction is inaccurate is that the path of a flight in the approach sector is affected by factors such as the control strategy, so the flight cannot strictly follow the standard arrival and departure procedures and its path within the approach sector cannot be predicted. With this method, aircraft can avoid conflicts while following the standard arrival and departure procedures in the approach sector. Furthermore, when the standard arrival and departure procedures are designed with continuous descent/continuous climb in mind, the method facilitates the operation of continuous descent/continuous climb procedures.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of a method for fine management of aircraft separation based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a network architecture.
Fig. 3 is a structural diagram of the Nanjing approach AP01 sector.
Fig. 4 is the command altitude profile.
Detailed Description
The application scenario of the method is the approach control positions of a terminal control area, which must have access to ADS-B data; the method is trained specifically for each control area structure and runway operating mode. When these factors change, for example when the position configuration changes, the sector structure is adjusted, or the runway operating mode is switched, the trained model corresponding to the new configuration must be loaded before computation. The invention comprises the following steps:
1. defining action and state space of a deep reinforcement learning environment for aircraft flight command;
the invention uses two deep reinforcement learning agents, which respectively select agents for flights and agents for actions. The state space of the flight selection agent is position information (longitude, latitude and height), course information and machine type information of all controllable flights in the current sector, and the action space is that less than or equal to two flights are selected from all controllable flights in the current sector for maneuvering; the state space of the action selection agent selects the standby flight selected by the agent for the flight and the position information (longitude, latitude and height) of three flights nearest to the flight, the course information, the model information and the distance from the flight, and the action space is the maneuvering action of the next moment of the current flight, such as descending and ascending of the altitude, acceleration and deceleration of the speed and the like.
2. Method for constructing interval fine decision deep reinforcement learning network of aircraft
The deep reinforcement learning network comprises a flight selection agent and an action selection agent. The flight selection agent consists of a Target Q network and an Eval Q network; the Eval Q network receives the environment state at the current moment and outputs the action values Q of all selectable actions at the current moment, and the action with the highest value is selected for execution, while the Target Q network is used to train the Eval Q network and to evaluate its output. The action selection agent consists of a Target Q network, an Eval Q network and two LSTM networks; its Eval Q network receives the information of the candidate flight chosen by the flight selection agent at the current moment and the positions, speeds and altitudes of the three flights nearest to it, and outputs the action values Q of all selectable actions of that flight at the current moment, the action with the highest value being selected for execution; the Target Q network is used to train the Eval Q network and to evaluate its output, and the LSTM networks sit at the final stage of both the Target Q and Eval Q networks to process time-series data and judge the future movement trend of the target flight.
3. Deep reinforcement learning network for training interval fine decision of aircraft
The training steps of the aircraft interval fine decision deep reinforcement learning network are as follows
Step 3.1: Initializing the parameters of the deep reinforcement learning algorithm, including the total number of training rounds E, the feature dimension n_s of the state space, the dimension n_a of the action space, the step size α of each parameter update, the attenuation factor γ of the action value function, the action exploration rate ε and the weight parameters of the Eval Q network; initializing the weight parameters of the Target Q network to be the same as the Eval Q network; initializing the soft update step size τ of the Target Q network, the number m of batch training samples, the size d of the experience replay pool, the replay data volume d_start at which training starts, and the Target Q network update round number c. The simulation environment state is initialized.
Step 3.2: Processing the input data: receiving real-time ADS-B data, screening all controllable flights in the current time period, and acquiring longitude, latitude, altitude, speed, heading, aircraft type and other information from the ADS-B data. The longitude, latitude, altitude, speed and heading information are normalized according to their possible maximum and minimum values, scaling the data to the interval [0,1]; the aircraft type information is one-hot encoded, and the normalized features are concatenated with the one-hot feature vector to form the feature vector s_t^fca of the environment at the current moment.
Step 3.3: Inputting the feature vector of the environment at the current moment into the Eval Q network of the flight selection agent and calculating the action value of each currently selectable flight by forward propagation, while generating a random number n_random in the interval [0,1]. If n_random < ε, an action is selected at random from the action space, i.e. one or two flights are randomly chosen as the target flights; otherwise the flight in the sector with the largest action value is selected as the target flight and stored as a_t^fca. The environment feature vector s_t^fca is concatenated with a_t^fca to form a new feature vector s_t^faa and input into the Eval Q network of the action selection agent, where the value of each maneuver is calculated by forward propagation, while generating a random number n_random in [0,1]. If n_random < ε, a maneuver is selected at random from the action space; otherwise the maneuver with the largest action value is selected as the current maneuver and stored as a_t^faa.
Step 3.4: Applying a_t^faa in the simulation environment; after the flight has executed the action, the new longitude, latitude, altitude, heading, speed and other data are acquired from the ADS-B data to form the feature vector s_{t+1}^fca, and the feature vector s_{t+1}^faa is generated according to step 3.2.
step 3.5: training reinforcement learning network calculates flight selection reward according to formulas (1) and (2)
Figure BDA0003384363000000085
And action selection rewards
Figure BDA0003384363000000086
rfca=1000/disflights+10000/disairport (1)
Figure BDA0003384363000000087
Step 3.6: Judging whether the current training round needs to end. For flight i, x_i, y_i, h_i represent the current longitude, latitude and altitude, and x_i^target, y_i^target, h_i^target represent the longitude, latitude and altitude of the destination point. The round ends if all flights have arrived at their destination points (formula (3)); or a flight has exceeded the sector boundary, i.e. the longitude/latitude boundary or the altitude boundary (formula (4)); or the sector handover condition is not satisfied (formula (5)); or two flights have come into conflict (formula (6)). If one of the above four conditions occurs, the is_end_t variable is stored as True; otherwise it is stored as False.
Step 3.7: Storing (s_t^fca, a_t^fca, r_fca, s_{t+1}^fca, is_end_t) and (s_t^faa, a_t^faa, r_faa, s_{t+1}^faa, is_end_t) into the respective experience replay pools; while the amount of data in the experience replay pools is less than d_start, steps 3.2 to 3.8 are repeated.
Step 3.7.1: When the amount of data in the experience replay pool is greater than d_start, m samples are sampled from the experience replay set.
Step 3.7.2: The target Q value of each sample is calculated as shown in formula (7); when the amount of data in the experience replay pool is greater than d, the oldest data is deleted with each new addition:

y_j = r_j,  if is_end_j is True
y_j = r_j + γ·Q'(s_{j+1}, argmax_a Q(s_{j+1}, a; w); w'),  otherwise        (7)

Step 3.7.3: The mean square error loss of all samples is calculated, and the weight parameters of the Eval Q network are updated by back-propagating the gradient; the loss is calculated as:

loss = (1/m) · Σ_{j=1..m} (y_j - Q(s_j, a_j; w))²        (8)
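A PyTorch sketch of one training step implementing the target of formula (7) and the mean square error loss of formula (8) follows. It assumes that eval_q and target_q map a batch of states to Q-values and that the batch tensors are already stacked; these, and the tensor shapes, are implementation assumptions.

```python
import torch
import torch.nn.functional as F

def dqn_update(eval_q, target_q, optimizer, batch, gamma):
    """One training step: the Target Q network evaluates the action chosen
    by the Eval Q network, as described for formula (7)."""
    states, actions, rewards, next_states, is_end = batch   # pre-stacked tensors

    with torch.no_grad():
        next_actions = eval_q(next_states).argmax(dim=1, keepdim=True)          # Eval Q picks
        next_values = target_q(next_states).gather(1, next_actions).squeeze(1)  # Target Q evaluates
        y = rewards + gamma * next_values * (1.0 - is_end)  # y_j = r_j when the episode ended

    q_taken = eval_q(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_taken, y)                            # formula (8): mean squared error

    optimizer.zero_grad()
    loss.backward()                                          # back-propagate to update Eval Q
    optimizer.step()
    return loss.item()
```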
step 3.8: and if the current training round number is an integral multiple of c, updating the weight parameters of the flight selection agent and the action selection agent Target Q network by adopting a soft updating strategy, wherein w' represents the weight parameters of the Target Q network, and w represents the weight parameters of the Eval Q network. The formula for the soft update strategy is as follows.
w′=τw+(1-τ)w′ (9)
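Formula (9) corresponds to the following parameter-wise soft update (PyTorch modules are assumed):

```python
def soft_update(target_q, eval_q, tau):
    """Formula (9): w' <- tau * w + (1 - tau) * w', applied to every
    parameter of the Target Q network."""
    for w_prime, w in zip(target_q.parameters(), eval_q.parameters()):
        w_prime.data.copy_(tau * w.data + (1.0 - tau) * w_prime.data)
```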
4. Aircraft interval fine decision deep reinforcement learning network for realizing aircraft interval fine management
Step 4.1: After model training is complete, the system establishes a network connection with the control automation system and interacts with it through message middleware. First, the current control sector's structure information, standard arrival and departure routes, runway operating mode, airspace restriction information and so on are extracted; the corresponding trained model is selected according to this information and its trained parameters are reloaded.
Step 4.2: The longitude, latitude, altitude, speed, heading, aircraft type and other information of the flights in the current sector are received from the control automation system, normalized and concatenated into a feature vector that serves as the input of the flight selection agent. The flight selection agent selects the flight to be adjusted according to the current situation; this selection is concatenated with the flight information feature vector as the input of the action selection agent, and the action selection agent selects the maneuver the current flight should execute and generates the corresponding control instruction.
Step 4.3: The suggested control instruction is synthesized into control voice, pushed into the control automation system through message middleware, and displayed on the control interface. The pilot's execution is monitored in real time through the situation returned by the control automation system; the instruction is resent when it is executed inconsistently or not executed, and command of the flight is cancelled once it has been guided to the handover point. Data are obtained from the control automation system every four seconds; each time the data are received, the flight that currently needs to be commanded and the specific action instruction to be issued are calculated, and the state after executing the instruction is evaluated. When potentially conflicting flights may exist in the sector, they are marked on the situation display and set as the current flights to command, so that the potential conflict is eliminated in time.
Examples
The overall process of the present invention is shown in FIG. 1. The invention provides an aircraft interval management decision method based on deep reinforcement learning, which comprises the following steps:
1. defining action and state space of a deep reinforcement learning environment for aircraft flight command;
the invention uses two deep reinforcement learning agents, which respectively select agents for flights and agents for actions. The state space of the flight selection agent is position information (longitude, latitude and height), course information and machine type information of all controllable flights in the current sector, and the action space is that less than or equal to two flights are selected from all controllable flights in the current sector for maneuvering; the state space of the action selection agent selects the standby flight selected by the agent for the flight and the position information (longitude, latitude and altitude), the course information, the model information and the distance from the flight for the three flights closest to the flight, and the action space is the maneuvering action at the next moment of the current flight, such as descending and ascending of the altitude, acceleration and deceleration of the speed and the like.
2. Constructing an aircraft interval fine decision deep reinforcement learning network
The aircraft flight command deep reinforcement learning agent comprises a flight selection agent and an action selection agent. The flight selection agent consists of a Target Q network and an Eval Q network; the Eval Q network receives the environment state at the current moment and outputs the action values Q of all selectable actions at the current moment, and the action with the highest value is selected for execution, while the Target Q network is used to train the Eval Q network and to evaluate its output. The action selection agent consists of a Target Q network, an Eval Q network and two LSTM networks; its Eval Q network receives the information of the candidate flight chosen by the flight selection agent at the current moment and the positions, speeds and altitudes of the three flights nearest to it, and outputs the action values Q of all selectable actions of that flight at the current moment, the action with the highest value being selected for execution; the Target Q network is used to train the Eval Q network and to evaluate its output, and the LSTM networks sit at the final stage of both the Target Q and Eval Q networks to process time-series data and judge the future movement trend of the target flight. The algorithm structure is shown in Fig. 2.
3. Deep reinforcement learning network for training interval fine decision of aircraft
The training strategy flows of the flight selection agent and the action selection agent are basically the same, so that the flight selection agent is taken as an example to explain the training flow of aircraft flight command and interval deployment deep reinforcement learning.
Step 3.1: Initializing the parameters of the deep reinforcement learning algorithm, including the total number of training rounds E, the feature dimension n_s of the state space, the dimension n_a of the action space, the step size α of each parameter update, the attenuation factor γ of the action value function, the action exploration rate ε and the weight parameters of the Eval Q network; initializing the weight parameters of the Target Q network to be the same as the Eval Q network; initializing the soft update step size τ of the Target Q network, the number m of batch training samples, the size d of the experience replay pool, the replay data volume d_start at which training starts, and the Target Q network update round number c. The simulation environment state is initialized.
Step 3.2: Acquiring longitude, latitude, altitude, speed, heading, aircraft type and other information from the ADS-B data. The longitude, latitude, altitude, speed and heading information are normalized according to their possible maximum and minimum values, scaling the data to the interval [0,1]; the aircraft type information is one-hot encoded, and the normalized features are concatenated with the one-hot feature vector to form the feature vector s_t^fca of the environment at the current moment.
Step 3.3: Inputting the feature vector of the environment at the current moment into the Eval Q network of the flight selection agent and calculating the action value of each currently selectable flight by forward propagation, while generating a random number n_random in the interval [0,1]. If n_random < ε, an action is selected at random from the action space, i.e. one or two flights are randomly chosen as the target flights; otherwise the flight in the sector with the largest action value is selected as the target flight and stored as a_t^fca. The environment feature vector s_t^fca is concatenated with a_t^fca to form a new feature vector s_t^faa and input into the Eval Q network of the action selection agent, where the value of each maneuver is calculated by forward propagation, while generating a random number n_random in [0,1]. If n_random < ε, a maneuver is selected at random from the action space; otherwise the maneuver with the largest action value is selected as the current maneuver and stored as a_t^faa.
Step 3.4: Applying a_t^faa in the simulation environment; after the flight has executed the action, the new longitude, latitude, altitude, heading, speed and other data are acquired from the ADS-B data to form the feature vector s_{t+1}^fca, and the feature vector s_{t+1}^faa is regenerated according to step 3.2.
step 3.5: prize rtThe calculation method of (2) uses a reward shaping method, namely, a smaller reward value is returned for each non-key action to solve the problem of sparse reward distribution so as to accelerate the training speed.Reward function r for selecting agents on flightsfcaAnd action selection agent's inbound flight altitude descent reward function rfaaFor example, the calculation of the reward is as follows:
rfca=1000/disflights+10000/disairport (1)
Figure BDA0003384363000000121
step 3.6: judging whether the current training needs to be finished or not, and using x for flight ii,yi,hiIndicating the current longitude, latitude, altitude,
Figure BDA0003384363000000122
representing the longitude, latitude, altitude of the destination point, if all flights arrive at the destination point, then:
(1) All flights arrive at the target point, namely, the incoming flights successfully land at the airport, and the outgoing flights arrive at the corridor intersection. For flight i, use xi,yi,hiIndicating the current longitude, latitude, altitude,
Figure BDA0003384363000000123
representing the longitude, latitude, and altitude of the target point, the condition can be expressed as:
Figure BDA0003384363000000124
(2) A flight that exceeds a sector boundary (latitude and longitude boundary or altitude boundary) may be represented as:
Figure BDA0003384363000000125
(3) The sector handover condition is not satisfied, which can be expressed as:
Figure BDA0003384363000000126
(4) The conflict occurs between two flights, and the situation can be expressed as follows:
Figure BDA0003384363000000127
step 3.7: will be provided with
Figure BDA0003384363000000128
And
Figure BDA0003384363000000129
storing the data into respective experience playback pools, when the data amount in the experience playback pools is less than dstartAnd (5) repeating the step 3.2 to the step 3.8.
Step 3.7.1: sampling m samples from an empirical playback set
Figure BDA00033843630000001210
Step 3.7.2: calculating a target Q value y for each samplejIf the simulation is finished, the Target Q value is the reward returned by the simulation environment at the end, otherwise, the Target Q value is the reward of the simulation environment plus the estimation of the attenuated Target Q network on the action of the EvalQ network:
Figure BDA00033843630000001211
step 3.7.3: calculating the mean square error loss (loss) of all samples, and updating the weight parameters of the Eval Q network through inverse gradient propagation, wherein the calculation formula of the loss is as follows:
Figure BDA0003384363000000131
step 3.8: and if the number of the current training rounds is integral multiple of c, updating the weight parameter of the Target Q network by adopting a soft updating strategy, representing the weight parameter of the Target Q network by using w', and representing the weight parameter of the Eval Q network by using w.
The formula of the soft update strategy is as follows
w′=τw+(1-τ)w′ (9)
4. Aircraft interval fine management is realized through an aircraft interval fine decision deep reinforcement learning network
Step 4.1: After model training is complete, the system establishes a network connection with the control automation system and interacts with it through message middleware. First, the current control sector's structure information, standard arrival and departure routes, runway operating mode, airspace restriction information and so on are extracted, and the corresponding trained model is selected according to this information; for example, when controlling the Nanjing approach AP01 sector, the model trained in the simulation environment of east-flow operation of that sector is selected and its trained parameters are reloaded.
Step 4.2: The longitude, latitude, altitude, speed, heading, aircraft type and other information of the flights in the current sector are received from the control automation system, normalized and concatenated into a feature vector that serves as the input of the flight selection agent. The flight selection agent selects the flight to be adjusted according to the current situation; this selection is concatenated with the flight information feature vector as the input of the action selection agent, and the action selection agent selects the maneuver the current flight should execute, such as an altitude climb or descent instruction or a speed increase or decrease instruction, and generates the corresponding control instruction, e.g. "China Eastern 5254, descend to 2700 and maintain".
Step 4.3: The suggested control instruction is synthesized into control voice, pushed into the control automation system through message middleware, and displayed on the control interface. The pilot's execution is monitored in real time through the situation returned by the control automation system; the instruction is resent when it is executed inconsistently or not executed, and command of the flight is cancelled once it has been guided to the handover point. Data are obtained from the control automation system every four seconds; each time the data are received, the flight that currently needs to be commanded and the specific action instruction to be issued are calculated, and the state after executing the instruction is evaluated; when potentially conflicting flights may exist in the sector, they are marked on the situation display and set as the current flights to command, so that the potential conflict is eliminated in time. The method was verified in the Nanjing approach AP01 sector by generating historical flight flows in a simulation system, realizing arrival and departure command and control of multiple flights; the structure of the Nanjing approach AP01 sector is shown in Fig. 3, and the command altitude profile is shown in Fig. 4.
The present invention provides an aircraft interval management decision method based on deep reinforcement learning; there are many specific ways to implement the technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented with existing technology.

Claims (6)

1. An aircraft interval management decision method based on deep reinforcement learning is characterized by comprising the following steps:
step 1: defining an action space and a state space of a deep reinforcement learning environment for aircraft flight command;
step 2: constructing an interval fine decision deep reinforcement learning network of the aircraft;
step 3: training an interval fine decision deep reinforcement learning network of the aircraft;
step 4: realizing the fine management of the interval of the aircraft through the interval fine decision deep reinforcement learning network of the aircraft;
the step 1 comprises the following steps:
the method comprises using two deep reinforcement learning agents, namely a flight selection agent and an action selection agent, wherein the state space of the flight selection agent is the position information, heading information and aircraft type information of all controllable flights in the current sector, and its action space is selecting at most two flights from all controllable flights in the current sector to maneuver; the state space of the action selection agent is the candidate flight selected by the flight selection agent, the position information, heading information and aircraft type information of the three flights closest to the candidate flight and their distances from the candidate flight, and its action space is the maneuvering action of the current flight at the next moment;
the step 2 comprises the following steps:
the aircraft interval fine decision deep reinforcement learning network comprises a flight selection agent and an action selection agent, wherein the flight selection agent comprises a target value calculation network Target Q and an action value calculation network Eval Q; the Target Q network is used for training the Eval Q network and evaluating the output of the Eval Q network;
the action selection agent comprises a target value calculation network Target Q, an action value calculation network Eval Q and two long short-term memory neural networks LSTM, wherein the Eval Q network is used for receiving the positions, speeds and altitudes of the candidate flight selected by the flight selection agent at the current moment and of the three flights nearest to the candidate flight, outputting the action values Q of all selectable actions of the candidate flight at the current moment, and selecting the action with the highest action value for execution; the Target Q network is used for training the Eval Q network and evaluating the output of the Eval Q network, and the LSTM networks are used at the final stage of the Target Q network and the Eval Q network to process the time-series data generated after the flight enters the sector and judge the future movement trend of the target flight;
the step 3 comprises the following steps:
step 3.1: initializing the parameters of the deep reinforcement learning algorithm, including the total number of training rounds E, the feature dimension n_s of the state space, the dimension n_a of the action space, the step size α of each parameter update, the attenuation factor γ of the action value function, the action exploration rate ε, and the weight parameters of the Eval Q networks of the flight selection agent and the action selection agent; initializing the weight parameters of each Target Q network to be the same as its Eval Q network; initializing the soft update step size τ of the Target Q network, the number m of batch training samples, the size d of the experience replay pool, the replay data volume d_start at which training starts, and the Target Q network update round number c; initializing the simulation environment state;
step 3.2: receiving real-time ADS-B (automatic dependent surveillance-broadcast) data, screening all controllable flights in the current time period, and acquiring longitude, latitude, altitude, speed, heading and aircraft type information from the ADS-B data; normalizing the longitude, latitude, altitude, speed and heading information by scaling the data to the interval [0,1] to obtain normalized features, encoding the aircraft type information as a one-hot feature vector, and concatenating the normalized features with the one-hot feature vector to form the feature vector s_t^fca of the environment at the current moment;
Step 3.3: feature vector of current time of using environment
Figure FDA0003834062470000022
Obtaining flight selection agent actions as input of Eval Q network of flight selection agent
Figure FDA0003834062470000023
Feature vector of environment at current moment
Figure FDA0003834062470000024
And
Figure FDA0003834062470000025
stitching to form a new eigenvector
Figure FDA0003834062470000026
And inputting the motion selection intelligent agent into the Eval Q network to obtain the motion of the motion selection intelligent agent
Figure FDA0003834062470000027
step 3.4: executing a_t^faa in the simulation environment, obtaining the new longitude, latitude, altitude, heading and speed information from the ADS-B data after waiting for the flight to execute the action, and forming the feature vector s_{t+1} of the next moment;
step 3.5: calculating the reward function r_fca of the flight selection agent and the inbound-flight altitude descent reward function r_faa of the action selection agent, judging whether the current training needs to end to obtain the end identifier is_end_t, and storing the experience tuples (s_t^fca, a_t^fca, r_fca, s_{t+1}^fca, is_end_t) and (s_t^faa, a_t^faa, r_faa, s_{t+1}^faa, is_end_t) into the respective experience replay pools;
step 3.6: if the current simulation is not finished, repeating the steps from 3.2 to 3.8, and if the current simulation is finished, starting the next round of simulation;
step 3.7: when the amount of data in the experience replay pool is greater than or equal to d_start, starting the training process;
step 3.8: if the current number of training rounds is an integer multiple of c, updating the weight parameters of the Target Q network with the soft update strategy.
2. The method according to claim 1, characterized in that step 3.2 comprises: using s_t^fca as the input of the Eval Q network of the flight selection agent and outputting the action value set Q_t corresponding to all actions.
3. The method according to claim 2, characterized in that step 3.3 comprises: inputting the feature vector s_t^fca of the environment at the current moment into the Eval Q network of the flight selection agent and calculating the action value set Q_t of the currently selectable flights by forward propagation, while generating a random number n_random in the interval [0,1]; ε is the action exploration rate with a value between 0 and 1, ε is multiplied by a decay factor at the end of each training round, and when ε < 0.1, ε is set to 0; if n_random < ε, an action is selected at random from the action space, i.e. one flight or two flights are randomly selected as target flights; otherwise the flight in the sector with the largest action value is selected as the target flight and stored as a_t^fca; the feature vector s_t^fca of the environment at the current moment is concatenated with a_t^fca to form a new feature vector s_t^faa and input into the Eval Q network of the action selection agent, in which the value of each maneuver is calculated by forward propagation and a random number n_random in [0,1] is generated; if n_random < ε, a maneuver is selected at random from the action space; otherwise the maneuver with the largest action value is selected as the current maneuver and stored as a_t^faa.
4. A method according to claim 3, characterised in that step 3.5 comprises: calculating the reward function r_fca of the flight selection agent and the inbound-flight altitude descent reward function r_faa of the action selection agent with the following formulas:

r_fca = 1000 / dis_flights + 10000 / dis_airport        (1)

[Formula (2), which defines r_faa, is given in the original only as an image.]
5. The method of claim 4, wherein step 3.5 further comprises: judging whether the current training needs to end; for flight i, x_i, y_i, h_i respectively represent the flight's current longitude, latitude and altitude, and x_i^target, y_i^target, h_i^target respectively represent the longitude, latitude and altitude of the target point; training ends in any of the following cases:
(1) All flights have arrived at their destination points, that is, all inbound flights have successfully descended to the airport and all outbound flights have reached the corridor exit (formula (3)).
(2) A flight has exceeded the sector boundary, where sector represents the sector longitude/latitude boundary and h_min, h_max respectively represent the lower and upper bounds of the sector altitude (formula (4)).
(3) The sector handover condition is not satisfied (formula (5)).
(4) Two flights have collided, where δ represents the distance safety threshold (formula (6)).
If one of the above four cases occurs, the end identifier is_end_t variable is stored as the true value True; otherwise it is stored as the false value False.
6. The method of claim 5, wherein step 4 comprises:
Step 4.1: establishing a network connection with the control automation system, interacting with it through message-middleware communication, extracting the current control sector structure information, selecting the corresponding trained model according to that structure, and loading the trained model parameters;
Step 4.2: receiving the longitude/latitude, altitude, speed, heading and aircraft-type information of the flights in the current sector from the control automation system, normalizing it and concatenating it into a flight feature vector used as input to the flight selection agent; the flight selection agent selects the flight to be regulated according to the current situation, the selection is concatenated with the flight information feature vector as input to the action selection agent, the action selection agent selects the maneuver to be executed by the current flight, and the corresponding control instruction is generated;
Step 4.3: synthesizing the control instruction into control voice, pushing the voice to the control automation system through the message middleware, and displaying it on the control interface; monitoring the pilot's execution in real time through the situation returned by the control automation system, resending the instruction when it is executed inconsistently or not executed, and cancelling command of the flight once it has been directed to the transit point.
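To make the deployment steps of claim 6 easier to follow, here is a schematic decision cycle. The maneuver vocabulary, the instruction wording and the callables fca_model, asa_model and push_voice stand in for the trained networks, the speech synthesis and the middleware interface; none of them are taken from the patent.

```python
import numpy as np

def build_instruction(callsign: str, maneuver: int) -> str:
    """Hypothetical mapping from a maneuver index to a spoken control instruction."""
    phrases = {0: "maintain present speed", 1: "reduce speed by 20 km/h",
               2: "descend 300 meters", 3: "climb 300 meters"}
    return f"{callsign}, {phrases.get(maneuver, 'standby')}"

def decision_cycle(callsigns, flight_features, fca_model, asa_model, push_voice):
    """One online cycle: pick the flight to regulate, pick its maneuver,
    then synthesize and push the instruction. Greedy selection (no exploration) at deployment."""
    state = np.concatenate(flight_features)                    # normalized lon/lat, altitude, speed, heading, type
    flight_idx = int(np.argmax(fca_model(state)))              # flight selection agent
    asa_input = np.concatenate([state, flight_features[flight_idx]])
    maneuver = int(np.argmax(asa_model(asa_input)))            # action selection agent
    instruction = build_instruction(callsigns[flight_idx], maneuver)
    push_voice(instruction)                                    # TTS + message middleware to the control system
    return instruction
```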
CN202111443511.7A 2021-11-30 2021-11-30 Aircraft interval management decision method based on deep reinforcement learning Active CN114141062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111443511.7A CN114141062B (en) 2021-11-30 2021-11-30 Aircraft interval management decision method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114141062A CN114141062A (en) 2022-03-04
CN114141062B true CN114141062B (en) 2022-11-01

Family

ID=80389977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111443511.7A Active CN114141062B (en) 2021-11-30 2021-11-30 Aircraft interval management decision method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114141062B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114664120B (en) * 2022-03-15 2023-03-24 南京航空航天大学 ADS-B-based aircraft autonomous interval control method
CN114819760B (en) * 2022-06-27 2022-09-30 中国电子科技集团公司第二十八研究所 Airport flight area surface risk intelligent decision-making system based on reinforcement learning
CN115240475B (en) * 2022-09-23 2022-12-13 四川大学 Aircraft approach planning method and device fusing flight data and radar image
CN115660446B (en) * 2022-12-13 2023-07-18 中国民用航空飞行学院 Intelligent generation method, device and system for air traffic control plan

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408975A (en) * 2014-10-28 2015-03-11 北京航空航天大学 Aircraft conflict extrication method and apparatus
CN106373435A (en) * 2016-10-14 2017-02-01 中国民用航空飞行学院 Non-centralized safety interval autonomous keeping system for pilot
CN109064019A (en) * 2018-08-01 2018-12-21 中国民航大学 A kind of system and method tested and assessed automatically for controller's simulated training effect
CN109118111A (en) * 2018-08-29 2019-01-01 南京航空航天大学 Trail interval limitation and the time slot allocation comprehensive strategic management decision support system that takes off
CN110084414A (en) * 2019-04-18 2019-08-02 成都蓉奥科技有限公司 A kind of blank pipe anti-collision method based on the study of K secondary control deeply
CN111047917A (en) * 2019-12-18 2020-04-21 四川大学 Flight landing scheduling method based on improved DQN algorithm
CN111882047A (en) * 2020-09-28 2020-11-03 四川大学 Rapid empty pipe anti-collision method based on reinforcement learning and linear programming
CN112396871A (en) * 2020-10-21 2021-02-23 南京莱斯信息技术股份有限公司 Approach delay allocation and absorption method based on track prediction
CN112818599A (en) * 2021-01-29 2021-05-18 四川大学 Air control method based on reinforcement learning and four-dimensional track

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8606491B2 (en) * 2011-02-22 2013-12-10 General Electric Company Methods and systems for managing air traffic
US9536435B1 (en) * 2015-07-13 2017-01-03 Double Black Aviation Technology L.L.C. System and method for optimizing an aircraft trajectory
US10810892B2 (en) * 2017-02-01 2020-10-20 Honeywell International Inc. Air traffic control flight management
EP3422130B8 (en) * 2017-06-29 2023-03-22 The Boeing Company Method and system for autonomously operating an aircraft
GB2569789A (en) * 2017-12-21 2019-07-03 Av8Or Ip Ltd Autonomous unmanned aerial vehicle and method of control thereof
US11238744B2 (en) * 2019-06-27 2022-02-01 Ge Aviation Systems Llc Method and system for controlling interval management of an aircraft
CN113611158A (en) * 2021-06-30 2021-11-05 四川大学 Aircraft trajectory prediction and altitude deployment method based on airspace situation
CN113593308A (en) * 2021-06-30 2021-11-02 四川大学 Intelligent approach method for civil aircraft

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Partheno-genetic algorithm for multi-runway flight landing scheduling; Wen Youmei; Software Guide; 2010-10-30 (No. 10); full text *
Air combat situation feature extraction based on deep networks; Li Gaolei et al.; Journal of System Simulation; 2017-12-08; full text *

Also Published As

Publication number Publication date
CN114141062A (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN114141062B (en) Aircraft interval management decision method based on deep reinforcement learning
CN111786713B (en) Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN106157700B (en) Air traffic control method based on the operation of 4D flight paths
Brittain et al. Autonomous aircraft sequencing and separation with hierarchical deep reinforcement learning
CN104504938B (en) The method of control of air traffic control system
US8798813B2 (en) Providing a description of aircraft intent
EP2801963A1 (en) Providing a description of aircraft intent
US8977411B2 (en) Providing a description of aircraft intent
EP2667275B1 (en) Method for providing a description of aircraft intent using a decomposition of flight intent into flight segments with optimal parameters
CN110059863B (en) Aircraft four-dimensional track optimization method based on required arrival time
JP2013173522A (en) Method for flying aircraft along flight path
Robinson, III et al. A fuzzy reasoning-based sequencing of arrival aircraft in the terminal area
CN113867354B (en) Regional traffic flow guiding method for intelligent cooperation of automatic driving multiple vehicles
EP3598261B1 (en) Method and system for determining a descent profile
JP2020077387A (en) Optimization of vertical flight path
CN114373337B (en) Flight conflict autonomous releasing method under flight path uncertainty condition
Dhief et al. Speed control strategies for e-aman using holding detection-delay prediction model
Deniz et al. A Multi-Agent Reinforcement Learning Approach to Traffic Control at Merging Point of Urban Air Mobility
JP2851271B2 (en) Landing scheduling device
CN115660446A (en) Intelligent generation method, device and system for air traffic control plan
CN116415480B (en) IPSO-based off-road planning method for aircraft offshore platform
Brittain et al. Towards autonomous air traffic control for sequencing and separation - a deep reinforcement learning approach
Bianco et al. Coordination of Traffic Flows in the TMA
Spirkovska Vertiport Dynamic Density
Soares et al. Departure management with a reinforcement learning approach: Respecting CFMU slots

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant