CN114548893A - Multi-airport collaborative release method based on deep reinforcement learning - Google Patents

Multi-airport collaborative release method based on deep reinforcement learning

Info

Publication number
CN114548893A
Authority
CN
China
Prior art keywords
flight
airport
flights
departure
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111623998.7A
Other languages
Chinese (zh)
Inventor
蔡开泉
杨杨
李梓琦
李悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111623998.7A priority Critical patent/CN114548893A/en
Publication of CN114548893A publication Critical patent/CN114548893A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a multi-airport collaborative release method based on deep reinforcement learning, belonging to the field of intelligent air traffic management. The method comprises the following steps: first, for m airports to be collaboratively released, each airline involved ranks all of its own flights at the m airports from high to low priority, and an initial departure flight queue satisfying the minimum total delay time is generated based on airport fairness; next, a multi-airport collaborative release model aiming to minimize the total delay cost is established for the initial departure flight queue based on the priorities of the different flights, and the model is converted into a corresponding Markov decision process; finally, the deep reinforcement learning algorithm A3C is selected to solve the Markov decision process and obtain the final multi-airport collaborative release departure queue, thereby reducing the total delay cost. The invention designs a novel multi-airport collaborative release method to reduce flight delays and improve the operating benefit of a multi-airport system.

Description

Multi-airport collaborative release method based on deep reinforcement learning
Technical Field
The invention belongs to the field of intelligent air traffic management, and particularly relates to a multi-airport collaborative release method based on deep reinforcement learning.
Background
With the rapid development of the air transportation industry and the continued growth of regional economies, multi-airport collaborative development has become a new trend in civil aviation. The formation of a multi-airport system greatly improves a region's development prospects, but the system also bears an ever-heavier transportation burden: delays caused by densely packed airports, limited airspace resources, and busy traffic flows bring huge economic losses and urgently need to be addressed. Optimally adjusting the departure flight queues of a multi-airport system while taking flight priorities into account is an effective way to reduce delay cost and improve social benefit.
Collaborative release is important to the coordinated development of multiple airports, because it allows the limited airspace resources in a region to be fully utilized; however, current research on collaborative release has two problems:
(1) Multi-airport collaborative release adopts the first-come-first-served principle, which minimizes the total delay time of the airports. However, because flights differ in aircraft type, passenger capacity, load factor, whether VIPs are on board, whether an urgent special task is assigned, and other factors, their priorities differ; the total delay cost of a takeoff queue generated by the first-come-first-served principle is therefore not necessarily minimal, and the fairness of delay allocation among airports needs to be improved.
(2) When the departure flight queue in multi-airport collaborative release needs to be adjusted, the adjusted queue depends only on the state of the current departure flight queue and not on earlier queue states; although this conforms to the Markov property, existing research has not converted collaborative release into a corresponding Markov decision process.
Meanwhile, in terms of solution efficiency, the existing methods for the multi-airport collaborative release problem are mainly integer programming and heuristic algorithms, which are computationally expensive and must be recomputed whenever the constraints are slightly adjusted.
Disclosure of Invention
In order to solve the above problems, the invention provides a multi-airport collaborative release method based on deep reinforcement learning, which can quickly generate a departure flight queue with lower total delay cost and thereby provide auxiliary support for air traffic management departments, airlines, airports, and other units.
The multi-airport collaborative release method based on deep reinforcement learning comprises the following specific steps:
Step one, for m airports to be collaboratively released, each airline involved at the airports ranks all of its own flights at the m airports from high to low priority;
Priorities range from 10 (highest) down to 1 (lowest) and are determined by aircraft type, passenger capacity, load factor, whether VIPs are on board, and whether the flight carries an urgent special task; each airline formulates a priority-setting standard according to its own actual situation and assigns a priority to each of its flights;
the priority setting criteria are: the fitted curve of the number of flights of 10 different priorities should satisfy the probability density function of the power law distribution:
f(x)=cx-α-1,x→∞ (1)
in the formula: c and alpha are constants, when the total number of flights of the airline companies is different, the corresponding c and alpha are different, but the fitted curves of the flight priority distribution of the airline company are in a long tail function state, the proportion of flights with different priorities of each airline company is basically the same, and the fairness of the airline companies is reflected.
Step two, generating an initial departure flight queue satisfying the minimum total delay time based on airport fairness;
The initial departure flight queue is formed as follows: delay time is allocated to each airport in proportion to its number of flights, each flight's scheduled takeoff time plus its allocated delay is used to queue the flights in time order, and an initial departure flight queue is formed according to the first-come-first-served principle.
Firstly, generating an objective function which is based on airport fairness and meets the minimum total delay time;
the objective function is:
min Σ_{v∈V} ( (Σ_{f∈F_v} d_f) / |F_v| - (Σ_{f∈F} d_f) / n )^2    (2)
in the formula: V = {v_1, v_2, ..., v_m} is the set of all airports; F_v is the set of all flights of airport v; n is the number of flights, and F is the set of all flights in the m airports; T_{f_n} is the set of available departure time slots for flight f_n; d_{f_n} is the delay time of flight f_n; x_{f_n,i} is the decision variable indicating whether flight f_n is assigned to departure time slot i. The first term inside the parentheses is the average delay time of the flights of a single airport v; the second term is the average delay time of all flights in the m airports.
Minimizing the variance between the average delay time of the flights of a single airport and the average delay time of all flights across the multiple airports embodies airport fairness;
then an integer programming algorithm is used to solve the objective function, yielding an initial departure flight queue with baseline delays that satisfies airport fairness.
Step three, establishing a multi-airport collaborative release model aiming to minimize the total delay cost for the initial departure flight queue, based on the priorities of the different flights;
the formula of the multi-airport collaborative release model is as follows:
Figure BDA0003439161690000031
in the formula: c. Cf hA decision variable for whether to suspend a flight, EhAllocating a maximum value of delay for flights in the hotspot h of the airport; the airport hotspot h is the hotspot formed by the situation that when a flight takes off, an airspace near a departure airport is blocked, and a departure transfer point of the airport becomes a hotspot; b isf hThe time for flight f to enter airport hotspot h; sf hPlanning the time for the flight f to enter the airport hotspot h; dfThe waiting cost for flight f on the ground; p is a radical off hDecision variables for whether to protect flights, Bf h-Sf hA baseline delay for flight f; vfValue created for airline to protect flight f; k is a radical off hA decision variable for whether to hold a flight; m is a group offFine for delayed flight f; o ishThe OI value of the airport hotspot h; OI is the operation index of the airport handover point; OI 100 × D/C; d is the number of flights of the transition point when congestion occurs; c is the capacity of the handover point; chThe transition factors of the time-space domain resources of different hot zones at the transition point are shown, and the hot zones are time intervals in which hot spots exist.
Figure BDA0003439161690000032
Number of flights that can be protected at airport hotspot h;
Figure BDA0003439161690000033
the OC value is the operability index of each flight;
the objective function is the sum of four items, namely the ground waiting cost generated by suspending the flight, the ground waiting cost for protecting the flight reduction, the additional reward for protecting the flight, and the delay fine of suspending and keeping the flight.
The constraint conditions are as follows in sequence: c1 shows that in the baseline delay, the OC value of each flight is 100, the flight with low priority is paused to release the OC value, the OC value is reduced from 100 to 0, the flight with high priority is protected to promote the OC value of the flight to the OI value of the hotspot, the OC value of the baseline delay flight is kept unchanged and is still 100, and the sum of the OC values of all flights at the post-adjustment transit point cannot be larger than the sum of the OC values of all flights at the pre-adjustment transit point;
c2 shows that multiple hot spots may occur in one day at the same handoff point, when the previous hot spot occurs, the space domain resources saved by suspending a flight with low priority are switched to the current hot spot for continuous use, but the OC value saved by suspending a flight in the previous hot spot is influenced by time and is less than 100, therefore, the OC value saved by suspending a flight in the previous hot spot needs to be multiplied by C which is less than 1h
C3 indicates that the number of actual protection flights in the hotspot should be less than or equal to the number of flights in the hotspot within the protection threshold;
c4 indicates that a flight is in a unique state within the hotspot, i.e., protected, suspended, or held baseline delayed;
c5 indicates that the decision variables set are decision variables of the 0-1 integer type, all subject to binary constraints.
Step four, converting the multi-airport collaborative release model into a corresponding Markov decision process;
the problems to be solved by the Markov decision process are as follows: the multi-airport system provides N departure time slots for N departure flights according to a first-come-first-serve principle, and the N departure flights in the N departure time slots are taken out and reinserted into the departure time slots to generate a departure queue with lower total delay cost.
The specific elements of the Markov decision process comprise: state (state), action (action), reward value (reward), and policy (policy); respectively set up as follows:
(1) state st: and setting N observations for the N departure time slots, and observing whether a flight is inserted into each time slot, wherein the N observations are used for observing the current state of the environment, and the initial state of the environment is that no flight is inserted into the N departure time slots.
The number of possible states when N flights are inserted into N departure time slots is denoted S_N; these S_N states constitute the state space S of the multi-airport collaborative release Markov decision process.
(2) Action a_t: each flight can select one of N different actions, corresponding to insertion into one of the N departure time slots. If the newly selected departure time slot is earlier than the original one, the flight is protected; if it is later, the flight is suspended; if it is the same, the flight keeps its baseline delay. N actions are set, one for insertion into each of the N departure time slots.
These N actions constitute the action space A of the multi-airport collaborative release Markov decision process.
(3) Reward value r_t: the reward value is set to the negative of the objective function of the multi-airport collaborative release model based on total delay cost, i.e., each action is rewarded with the negative of the delay cost it causes. The sum of the reward values after all actions have been selected is called the return R; after the N flights have been inserted into N different departure time slots, the negative of the return of the multi-airport collaborative release Markov decision process equals the total delay cost of the multi-airport system.
(4) Policy π: the policy selects the action with the maximum reward value; the agent's action for entering the next state is determined by its policy and the current state:
a_t = π(s_{t-1} | ε)    (7)
where ε is the exploration rate;
and step five, selecting a deep reinforcement learning algorithm A3C to solve the Markov decision process to obtain a final multi-airport collaborative release departure queue, so that the total delay cost is reduced.
The advantages and positive effects of the invention are:
(1) The multi-airport collaborative release method based on deep reinforcement learning gives the users of the limited airspace resources, the airlines, a degree of flexibility, and analyzes flight priority from the airline's perspective.
(2) Driven by airport fairness, the method takes the total delay cost as the objective function of the multi-airport release model and finds a departure flight queue with lower total delay cost without destroying the minimum total delay time obtained under the current first-come-first-served principle, thereby improving the operating benefit of the multi-airport system.
(3) In the multi-airport collaborative release problem, each adjustment of the departure flight queue affects only a few flights, and the queue after each adjustment depends only on the state of the current departure flight queue and not on earlier queue states, so the Markov property is satisfied. Considering the ability of deep reinforcement learning algorithms to solve such problems effectively and quickly, a multi-airport collaborative release solution method is proposed that rapidly obtains the optimized departure flight queue while effectively relieving the regulation pressure on the air traffic control department.
Drawings
FIG. 1 is a flow chart of the multi-airport collaborative release method based on deep reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of an embodiment of the present invention for optimizing and adjusting the departure flight queue.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings in the embodiments.
Aiming at the application requirement of optimally configuring limited airspace resources, the invention provides a multi-airport collaborative release method based on deep reinforcement learning. First, the key factors influencing flight priority are analyzed from the perspective of the airspace users (the airlines), and flight priorities are set on the premise that each airline's fitted flight-priority distribution curve conforms to a power-law distribution, thereby reflecting fairness among airlines. Then, an initial departure flight queue satisfying the minimum total delay time is generated based on airport fairness. Next, because flights of different priorities incur different delay costs, a multi-airport collaborative release model with minimum total delay cost as its objective function is established; on this basis, the environment of the multi-airport collaborative release Markov decision process is built, a deep reinforcement learning algorithm that fully exploits computing resources and solves the problem efficiently is designed, and the model is trained in the multi-airport collaborative release environment. Finally, the trained model is used to quickly obtain an optimized departure flight queue with the minimum total delay time and a lower total delay cost. The invention provides support for air traffic management departments, airlines, airports, and other related units, and designs a novel multi-airport collaborative release method to reduce flight delays and improve the operating benefit of a multi-airport system.
The multi-airport collaborative release method based on deep reinforcement learning is shown in FIG. 1 and comprises the following specific steps:
Step one, for m airports to be collaboratively released, each airline involved at the airports ranks all of its own flights at the m airports from high to low priority;
The airlines are the main users of the airspace; analyzing flight priorities from the perspective of the airspace users, i.e., the airlines, and optimizing the departure queue in a more cost-effective way can effectively reduce delay cost and improve social benefit.
From the airline's perspective, priorities range from 10 (highest) down to 1 (lowest) and are determined by aircraft type, passenger capacity, load factor, whether VIPs are on board, and whether the flight carries an urgent special task; each airline formulates a priority-setting standard according to its own actual situation and assigns a priority to each of its flights;
For example, if 8 airlines are involved in 3 airports, each airline prioritizes all flights under its own name at the 3 airports.
In the actual scenario of multi-airport collaborative release, high-priority special flights, such as flights carrying VIPs or flights assigned urgent medical tasks, are few in number, while most flights are ordinary flights with lower priority. Therefore, when analyzing flight priority, an evenly distributed scheme cannot be adopted; a power-law distribution scheme that matches the actual situation should be used, and the fitted curve of the numbers of flights at the 10 different priorities should satisfy the probability density function of a power-law distribution:
f(x) = c·x^(-α-1),  x → ∞    (1)
where c and α are constants. When airlines have different total numbers of flights, the corresponding c and α differ, but each airline's fitted flight-priority distribution curve retains the same long-tail shape, so the proportions of flights at the different priorities are essentially the same across airlines, which reflects fairness among the airlines.
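The following sketch illustrates, under assumed numbers, how an airline could check that its flight counts per priority level follow the long-tail shape of equation (1); the counts, and the use of a log-log linear fit to estimate c and α, are illustrative assumptions rather than part of the invention.

```python
import numpy as np

# Hypothetical counts of one airline's flights at priority levels 1 (lowest,
# ordinary flights) through 10 (highest, special flights), a long-tail shape.
priority_levels = np.arange(1, 11)
flight_counts = np.array([420, 260, 170, 110, 70, 45, 28, 18, 11, 7])

# Fit f(x) = c * x^(-alpha-1) by linear regression in log-log space:
# log f = log c - (alpha + 1) * log x
slope, intercept = np.polyfit(np.log(priority_levels), np.log(flight_counts), 1)
alpha = -slope - 1.0
c = np.exp(intercept)
print(f"fitted c = {c:.1f}, alpha = {alpha:.2f}")

# The per-priority proportions are what should look alike across airlines of
# different sizes, which is how the method expresses airline fairness.
print(np.round(flight_counts / flight_counts.sum(), 3))
```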
Step two, generating an initial departure flight queue satisfying the minimum total delay time based on airport fairness;
The multiple airports in a region share a terminal control center and are strongly coupled through shared air routes and transfer points, so they compete for the limited airspace resources. These limited regional airspace resources must be allocated reasonably, and collaborative operation must be fully considered to achieve an optimal configuration. Multi-airport collaborative release requires accurate information sharing by each member airport, with the air traffic management department planning the airspace resources comprehensively to optimize the decision objective.
Collaborative release requires the active participation of every party, and participating inevitably affects each airport's original operating efficiency; therefore, to safeguard the airports' interests and ensure that collaborative release proceeds smoothly, fairness among the airports must be considered and guaranteed.
The initial departure flight queue is formed as follows: the delay time generated by the multi-airport system (i.e., when an airport transfer point becomes a hotspot) is allocated to each airport in proportion to its number of flights; each flight adds its allocated delay to its own planned takeoff time, the flights are queued in time order, and an initial departure flight queue is formed according to the first-come-first-served principle.
To guarantee airport fairness, the following must be satisfied:
min Σ_{v∈V} ( (Σ_{f∈F_v} d_f) / |F_v| - (Σ_{f∈F} d_f) / n )^2    (2)
in the formula: V = {v_1, v_2, ..., v_m} is the set of all airports, where m is the number of airports; F_v is the set of all flights of airport v; n is the number of flights, and F is the set of all flights in the m airports; T_{f_n} is the set of available departure time slots for flight f_n; d_{f_n} is the delay time of flight f_n; x_{f_n,i} is the decision variable indicating whether flight f_n is assigned to departure time slot i. The first term inside the parentheses is the average delay time of the flights of a single airport v; the second term is the average delay time of all flights in the m airports.
Minimizing the variance between the average delay time of the flights of a single airport and the average delay time of all flights across the multiple airports embodies airport fairness.
This objective function is a classical quadratic integer programming problem; solving it with an integer programming algorithm yields an initial departure flight queue with baseline delays that satisfies airport fairness.
Step three, establishing a multi-airport collaborative release model aiming to minimize the total delay cost for the initial departure flight queue, based on the priorities of the different flights;
when the demand and capacity of the transfer points are unbalanced, hot areas occur at the transfer points, and the transfer points in the hot areas are generally called hot points. Due to the hot spots, the control difficulty of the air traffic control department is improved, the delay rate of each airport is increased, and more delay time and delay cost are brought to the users in the airspace. At this time, the order of flights of different priorities passing through the hot spot also has a great influence on the delay cost, and an unreasonable release order causes a larger delay cost.
The established multi-airport collaborative release model is used for carrying out optimization adjustment on an initial departure flight queue formed according to a first-come first-serve principle, and the flight after optimization adjustment is divided into three states of protection, suspension and holding; suspending flights with relatively low priority saves airspace resources and protecting flights with relatively high priority, and the method is called selective protection.
In the pre-tactical phase, the tactical phase and the real-time phase, the selective protection has different adjustment rules, and the specific adjustment method is shown in fig. 2. The blocks with numbers represent departure flights in a multi-airport system, the depth of each block represents an airline company to which the flight belongs, the shape of each block represents an airport from which the flight takes off, small dots with different depths on each block represent terminal area transition points through which the flight passes after taking off, and the total number of the terminal area transition points is 12 flights, 3 different airlines, 3 different airports and 4 different terminal area transition points. When the handoff point becomes a hot spot, the multi-airport system initially queues the departure flights creating a baseline delay.
Pre-tactical phase adjustment: to safeguard the business needs of the airspace users in hotspots, in the pre-tactical phase airspace users are allowed to apply selective protection without being restricted by the flight's departure airport or by the terminal-area transfer point the flight passes through. The initial departure flight queue Q_0 with baseline delays is shown in the first row of the figure; flight 10 has a higher priority and flight 3 a lower priority, so the airline chooses to suspend flight 3 and protect flight 10: flight 3 is suspended to the last departure time slot in the current hot zone, and flight 10 is then protected into the least-delayed departure time slot.
After this adjustment, the delays of flights 4, 5, 6, 11 and 12 are slightly reduced, flights 1, 2, 7, 8 and 9 keep their original delays, flight 3 has a larger delay because it is suspended, and flight 10 has a greatly reduced delay because it is protected. The total delay time of all flights does not change, but because flight 10 has the higher priority, the total delay cost of all flights in the departure flight queue Q_1 after the optimization is greatly reduced.
Similarly, the departure flight queue Q_1 is optimized further: flight 11 has a higher priority and flight 4 a lower priority, so suspending flight 4 and protecting flight 11 yields the departure queue Q_2 at time T_2. Because the pre-tactical period is long, each airline has sufficient time for selective protection, until a departure queue Q_{r-1} is available when the tactical period begins.
Tactical phase adjustment: compared with the pre-tactical phase, the tactical phase is closer to the actual takeoff time of the flights, so selective protection is more restricted; in the tactical phase airspace users may selectively protect only flights that depart from the same airport and pass through the same terminal-area transfer point. In the tactical phase, the airline chooses to suspend flight 6 in the departure queue Q_{r-1} and protect flight 8, obtaining the departure queue Q_r at time T_r.
Real-time phase adjustment: after the pre-tactical and tactical phases, flights depart in real time, and airspace users are not allowed to make any adjustment. However, to ensure that every departing flight takes off safely and efficiently under actual, changing conditions, the air traffic control department may fine-tune the departure flight queue formed in the tactical phase according to the actual situation; such adjustments may only occur between flights that depart from the same airport and pass through the same terminal-area transfer point.
Assuming that the departure slots of flight 1 and flight 2 are adjusted in real time, the actual departure queue of the flights changes from the queue Q_r at time T_r to the queue Q_{r+1} at time T_{r+1}.
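As a toy illustration of one selective-protection step in the spirit of the FIG. 2 example, the sketch below moves a suspended flight to the back of the hot-zone queue and a protected flight to the earliest slot; the exact slot a protected flight lands in depends on the slot times and hot-zone boundaries, which are not modelled here, so the printed queues are hypothetical rather than a reproduction of the figure.

```python
def suspend_then_protect(queue, suspend_id, protect_id):
    """One simplified selective-protection step: send the suspended flight to the
    last slot of the hot zone, then pull the protected flight to the front.
    Real slot times, airports and transfer-point restrictions are ignored."""
    q = [f for f in queue if f != suspend_id]
    q.append(suspend_id)          # suspend: last departure slot in the hot zone
    q.remove(protect_id)
    q.insert(0, protect_id)       # protect: earliest available departure slot
    return q

q0 = list(range(1, 13))                                      # hypothetical baseline queue of 12 flights
q1 = suspend_then_protect(q0, suspend_id=3, protect_id=10)   # e.g. suspend flight 3, protect flight 10
q2 = suspend_then_protect(q1, suspend_id=4, protect_id=11)   # e.g. suspend flight 4, protect flight 11
print(q1)
print(q2)
```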
Under the adjusted departure queue scheme, the number of flights an airline can protect depends on the operation index OI value of the transfer point, defined as:
OI = 100 × D / C    (3)
where 100 is the operation index normally set for the transfer point, D is the number of flights at the transfer point when congestion occurs, and C is the capacity of the transfer point. When the operation index exceeds 100, the transfer point becomes congested, a hot zone is generated, and the transfer point becomes a hotspot; the larger the OI value, the more serious the congestion at the hotspot.
In the selective protection method, each flight has its own operability index OC value, usually set to 100. Suspending a flight to the last departure time slot in the hot zone releases the flight's entire OC value, which can then be used to protect other, higher-priority flights. The number of flights that can be protected in the hotspot is determined from the released OC values and the OI value of the hotspot; when this number is not an integer, only the integer part counts, i.e., some of the released OC value cannot be used.
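A minimal sketch of this OC/OI bookkeeping is given below; the specific relation used for the protectable-flight count (released OC divided by the hotspot OI, rounded down) is an assumed reading of the formula, which the text only describes qualitatively.

```python
import math

def operation_index(demand: int, capacity: int) -> float:
    # OI = 100 * D / C, per equation (3); OI > 100 means the transfer point is a hotspot.
    return 100.0 * demand / capacity

def protectable_flights(released_oc_total: float, oi: float) -> int:
    """Assumed relation: the OC released by suspended flights buys protected
    slots 'worth' OI each, and only whole flights can be protected."""
    return math.floor(released_oc_total / oi)

oi = operation_index(demand=18, capacity=15)          # hypothetical hotspot figures -> 120.0
released = 3 * 100.0                                  # three suspended flights each release OC = 100
print(oi, protectable_flights(released, oi))          # 120.0, 2 flights protectable
```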
Based on the adjustment method and the definitions above, the objective function of the multi-airport collaborative release model minimizes the total delay cost, with the following notation: c_f^h is the decision variable for whether flight f is suspended; E_h is the maximum delay allocated to flights in airport hotspot h, where an airport hotspot h forms when, at the time a flight takes off, the airspace near the departure airport is congested and the airport's departure transfer point becomes a hotspot; B_f^h is the time at which flight f enters airport hotspot h; S_f^h is the time at which flight f is scheduled to enter airport hotspot h; D_f is the ground waiting cost of flight f; p_f^h is the decision variable for whether flight f is protected, and B_f^h - S_f^h is the baseline delay of flight f; V_f is the value created for the airline by protecting flight f; k_f^h is the decision variable for whether flight f is held; M_f is the fine for delaying flight f; O_h is the OI value of airport hotspot h; and C_h is the conversion factor of the time-space-domain resources between different hot zones at the transfer point.
For a suspended flight, the delay increases from the baseline delay to the maximum allocated delay; for a protected flight, the delay decreases from the baseline delay toward the scheduled time of entering the hotspot. The objective function is the sum of four terms: the additional ground waiting cost incurred by suspending flights, the ground waiting cost saved by protecting flights, the extra reward for protecting flights, and the delay fines of suspended and held flights.
The constraints are, in order: C1 states that under the baseline delay each flight has an OC value of 100; suspending a low-priority flight releases its OC value, reducing it from 100 to 0; protecting a high-priority flight raises its OC value to the OI value of the hotspot; a flight keeping its baseline delay retains an OC value of 100; and the sum of the OC values of all flights at the transfer point after adjustment cannot exceed the sum of the OC values of all flights at the transfer point before adjustment;
C2 states that several hotspots may occur at the same transfer point in one day; when an earlier hotspot has occurred, the airspace resources saved by suspending a low-priority flight can be carried over to the current hotspot, but the OC value saved by suspending a flight in the earlier hotspot decays with time and is less than 100, so it must be multiplied by the conversion factor C_h, which is less than 1;
C3 states that the number of flights actually protected in a hotspot must be less than or equal to the number of flights that can be protected in the hotspot (the protection threshold);
C4 states that each flight is in exactly one state within the hotspot: protected, suspended, or holding its baseline delay;
C5 states that the decision variables are 0-1 integer variables, all subject to binary constraints.
Step four, converting the multi-airport collaborative release model into a corresponding Markov decision process;
Different problems correspond to different Markov decision processes with different elements. To adjust the departure queue with a deep reinforcement learning algorithm, the multi-airport collaborative release model must be converted into a corresponding Markov decision process, whose problem can be described as follows: the multi-airport system provides N departure time slots for N departure flights according to the first-come-first-served principle; the N departure flights are taken out of the N departure time slots and reinserted into departure time slots so as to generate a departure queue with lower total delay cost.
The elements of the Markov decision process are the state, the action, the reward value, and the policy, set up as follows:
(1) State s_t: N observations are set for the N departure time slots, one per slot, observing whether a flight has been inserted into that slot; the N observations describe the current state of the environment, and the initial state of the environment is that no flight has been inserted into any of the N departure time slots.
The number of possible states when N flights are inserted into N departure time slots is denoted S_N; these S_N states constitute the state space S of the multi-airport collaborative release Markov decision process.
(2) Action a_t: each flight can select one of N different actions, corresponding to insertion into one of the N departure time slots. If the newly selected departure time slot is earlier than the original one, the flight is protected; if it is later, the flight is suspended; if it is the same, the flight keeps its baseline delay. N actions are set, one for insertion into each of the N departure time slots.
These N actions constitute the action space A of the multi-airport collaborative release Markov decision process.
(3) Reward value r_t: in a machine learning environment the goal is to obtain a higher reward value, and in the multi-airport collaborative release environment the goal is to obtain a departure flight queue with lower total delay cost; the reward value is therefore set to the negative of the objective function of the multi-airport collaborative release model based on total delay cost, giving the reward-value function of formula (7). The sum of the reward values after all actions have been selected is called the return R, which reflects how well the agent has learned the current environment. After the N flights have been inserted into N different departure time slots, the return of the multi-airport collaborative release Markov decision process is the negative of the total delay cost of the multi-airport system.
(4) Policy π: the policy selects the action with the maximum reward value; however, to introduce randomness, the invention sets the policy to select the optimal action with 90% probability and a random action with 10% probability. The agent's action for entering the next state is determined by its policy and the current state:
a_t = π(s_{t-1} | ε)    (9)
where ε is the exploration rate; with ε set to 0.1, the optimal action is selected at each step with 90% probability and a random action is selected with 10% probability.
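An epsilon-greedy action selection of this kind can be sketched as follows; the value estimates would come from the Critic described in step five, and the function name and array layout are illustrative assumptions.

```python
import numpy as np

def select_action(action_values: np.ndarray, epsilon: float = 0.1,
                  rng: np.random.Generator = np.random.default_rng()) -> int:
    """With probability 1 - epsilon pick the action (departure slot) with the
    highest estimated value; with probability epsilon explore a random slot."""
    if rng.random() < epsilon:
        return int(rng.integers(len(action_values)))
    return int(np.argmax(action_values))

# Example: estimated values for inserting the current flight into each of 5 slots.
print(select_action(np.array([0.2, 1.5, -0.3, 0.9, 0.1])))
```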
Step five, selecting the deep reinforcement learning algorithm A3C to solve the Markov decision process and obtain the final multi-airport collaborative release departure queue, thereby reducing the total delay cost.
After the Markov decision process environment corresponding to the multi-airport collaborative release problem has been designed, the computer must discover the regularities of this Markov decision process through repeated trials in order to find a multi-airport collaborative release scheme with lower total delay cost; training consumes a large amount of computing resources.
The invention selects the Asynchronous Advantage Actor-Critic (A3C) algorithm to train in this environment. The greatest characteristic and advantage of the algorithm is its asynchrony: exploiting the multiple cores of a computer, the Actor that selects actions and the Critic that evaluates the reward value of each action can be placed in several threads for asynchronous training. For example, on a four-core processor supporting eight threads, a central brain Global_Net, eight copies (Workers), and eight synchronizers (Sync) can be established. The central brain and the copies are all in Actor-Critic form; the network structure of the central brain is mainly used to store parameters, while the network structures in the copies undertake the more complex tasks.
Specifically, the Critic in a copy evaluates the value of the action selected by that copy's Actor and generates an update magnitude, from which the TD errors of the copy's Actor and Critic are computed; these two TD errors are the experience that the copy passes to the central brain to guide it in updating the Actor and Critic parameters. The synchronizer provides two operations, pull and push: when pull is selected, the copy fetches the parameters to be updated from the central brain; when push is selected, the copy pushes its updated experience into the central brain.
At the start of each round, the initialization function reset() in the copy creates an empty list state and appends N zeros, making it a list of length N whose integer elements store the number of flights in the corresponding time slot; this list is the state of the multi-airport collaborative release environment. At the same time, the sequence number order of the flight to be inserted into a departure time slot is set to 0.
The step function step() then executes state[action] = state[action] + 1 according to the action selected by the Actor; this inserts a flight into a departure time slot. At the same time, step() computes from action and order the reward value that the Critic needs: when action is less than order, the flight is protected, and the reward value corresponds to the case a_t < t in formula (7); when action equals order, the flight is held, and the reward value corresponds to the case a_t = t; when action is greater than order, the flight is suspended, and the reward value corresponds to the case a_t > t.
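A minimal sketch of such an environment is shown below; the class name and the delay_cost callable are placeholders, since the actual reward terms come from the release-model objective and the flight data, which are not reproduced here.

```python
class MultiAirportReleaseEnv:
    """Toy environment: N departure slots, flights inserted one by one in
    first-come-first-served order. `delay_cost(order, action, case)` is an
    assumed callable standing in for the delay-cost term of formula (7)."""

    def __init__(self, n_slots, delay_cost):
        self.n = n_slots
        self.delay_cost = delay_cost

    def reset(self):
        self.state = [0] * self.n   # number of flights inserted in each slot
        self.order = 0              # index of the next flight to insert
        return list(self.state)

    def step(self, action):
        self.state[action] += 1     # insert the current flight into slot `action`
        if action < self.order:
            case = "protected"      # new slot earlier than the baseline slot
        elif action == self.order:
            case = "held"           # baseline delay kept
        else:
            case = "suspended"      # new slot later than the baseline slot
        reward = -self.delay_cost(self.order, action, case)
        self.order += 1
        done = self.order >= self.n
        return list(self.state), reward, done
```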
According to the policy, when the Actor in a copy selects actions it follows the value estimates provided by the Critic, choosing the action that yields the best reward value 90% of the time and a random action 10% of the time; the step() function is used N times to reinsert the N flights into the N departure time slots, the training round then ends, and the N reward values are summed to obtain the return of the round, i.e., the negative of the total delay cost of the multi-airport system.
Each thread is regarded as one copy. A copy learns the multi-airport collaborative release environment and gains experience; after a round ends, it passes the learned experience to the central brain through the synchronizer. The central brain collects the experience of all the copies, integrates it, updates the parameters in real time, and then distributes the new parameters to all copies through the synchronizer. The copies start the next round of learning with the updated parameters, and parameters updated in real time are passed round after round until the set number of training rounds is reached and training of the multi-airport collaborative release model is complete. Plotting the return value of each round against the round number gives the return curve of the whole training; when the return curve converges, the model has been trained well.
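The worker and central-brain interaction described above can be sketched in PyTorch as follows; this is a compressed illustration of a generic A3C worker (the network sizes, episode-return advantage, and gradient hand-off are standard A3C choices assumed here, not details taken from the patent), and it drives the toy environment sketched earlier.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared trunk with a policy head (Actor) and a value head (Critic)."""
    def __init__(self, n_slots):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_slots, 128), nn.ReLU())
        self.policy = nn.Linear(128, n_slots)   # one action per departure slot
        self.value = nn.Linear(128, 1)

    def forward(self, state):
        h = self.trunk(state)
        return torch.softmax(self.policy(h), dim=-1), self.value(h)

def worker(global_net, optimizer, env, n_rounds=100, gamma=0.99):
    """One A3C copy: pull the latest global parameters, play one round in its own
    environment, build actor/critic losses from the advantages, and push the
    resulting gradients into the shared optimizer of the central brain."""
    local_net = ActorCritic(global_net.policy.out_features)
    for _ in range(n_rounds):
        local_net.load_state_dict(global_net.state_dict())        # "pull"
        state, done = env.reset(), False
        log_probs, values, rewards = [], [], []
        while not done:
            s = torch.tensor(state, dtype=torch.float32)
            probs, value = local_net(s)
            action = int(torch.multinomial(probs, 1).item())
            state, reward, done = env.step(action)
            log_probs.append(torch.log(probs[action]))
            values.append(value)
            rewards.append(reward)
        # Discounted returns and advantages for the finished round.
        R, returns = 0.0, []
        for r in reversed(rewards):
            R = r + gamma * R
            returns.insert(0, R)
        returns = torch.tensor(returns, dtype=torch.float32)
        values = torch.cat(values)
        advantage = returns - values
        loss = (-torch.stack(log_probs) * advantage.detach()).sum() + advantage.pow(2).sum()
        optimizer.zero_grad()
        loss.backward()
        for lp, gp in zip(local_net.parameters(), global_net.parameters()):
            gp._grad = lp.grad                                      # "push" gradients to the brain
        optimizer.step()
```

In a full run, a main process would create a shared ActorCritic (the global net) with share_memory(), a shared optimizer over its parameters, and, for example, eight torch.multiprocessing processes each running worker with its own environment instance.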
The model at this point is packaged to obtain the trained multi-airport collaborative release model. Using the trained model, the multi-airport collaborative release departure queue adjusted by the deep reinforcement learning algorithm can be obtained quickly, and this adjusted departure queue achieves a lower total delay cost without increasing the total delay time of the multi-airport system. This is the multi-airport collaborative release method based on deep reinforcement learning.
Finally, it should be noted that: the above description is only a technical solution of the present invention, but the scope of the present invention is not limited thereto, and those skilled in the art can still make modifications to the technical solution described in the foregoing embodiments or make equivalent substitutions for some technical features within the technical scope of the present invention; and such modifications and substitutions are intended to be included within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A multi-airport collaborative release method based on deep reinforcement learning is characterized by comprising the following specific steps:
first, for m airports to be collaboratively released, each airline involved at the airports ranks all of its own flights at the m airports from high to low priority, ensuring that each airline's fitted flight-priority distribution curve conforms to a power-law distribution, which reflects fairness among the airlines;
then, an initial departure flight queue satisfying the minimum total delay time is generated based on airport fairness; because flights of different priorities incur different delay costs, a multi-airport collaborative release model aiming to minimize the total delay cost is established for the initial departure flight queue and converted into a corresponding Markov decision process;
finally, the deep reinforcement learning algorithm A3C is selected to solve the Markov decision process, obtaining a final multi-airport collaborative release departure queue with the minimum total delay time and a lower total delay cost.
2. The multi-airport collaborative release method based on deep reinforcement learning according to claim 1, wherein the priorities range from priority 10 (highest) down to priority 1 (lowest) and are determined by aircraft type, passenger capacity, load factor, whether VIPs are on board, and whether the flight carries an urgent special task; each airline formulates a priority-setting standard according to its own actual situation and assigns a priority to each of its flights;
the priority-setting standard is: the fitted curve of the numbers of flights at the 10 different priorities satisfies the probability density function of a power-law distribution:
f(x) = c·x^(-α-1),  x → ∞    (1)
where c and α are constants; when airlines have different total numbers of flights, the corresponding c and α differ, but each airline's fitted flight-priority distribution curve retains the same long-tail shape, and the proportions of flights at the different priorities are essentially the same across airlines.
3. The multi-airport collaborative release method based on deep reinforcement learning according to claim 1, wherein the initial departure flight queue is formed as follows: delay time is allocated to each airport in proportion to its number of flights, each flight adds its allocated delay to its own planned takeoff time, the flights are queued in time order, and an initial departure flight queue is formed according to the first-come-first-served principle;
the specific process is as follows:
firstly, generating an objective function which is based on airport fairness and meets the minimum total delay time;
the objective function is:
min Σ_{v∈V} ( (Σ_{f∈F_v} d_f) / |F_v| - (Σ_{f∈F} d_f) / n )^2
in the formula: V = {v_1, v_2, ..., v_m} is the set of all airports; F_v is the set of all flights of airport v; n is the number of flights, and F is the set of all flights in the m airports; T_{f_n} is the set of available departure time slots for flight f_n; d_{f_n} is the delay time of flight f_n; x_{f_n,i} is the decision variable indicating whether flight f_n is assigned to departure time slot i;
the first term inside the parentheses is the average delay time of the flights of a single airport v, and the second term is the average delay time of all flights in the m airports;
minimizing the variance between the average delay time of the flights of a single airport and the average delay time of all flights across the multiple airports embodies airport fairness;
then an integer programming algorithm is used to solve the objective function, yielding an initial departure flight queue with baseline delays that satisfies airport fairness.
4. The multi-airport collaborative release method based on deep reinforcement learning according to claim 1, wherein the multi-airport collaborative release model minimizes the total delay cost subject to constraints C1-C5, with the following notation: c_f^h is the decision variable for whether flight f is suspended; E_h is the maximum delay allocated to flights in airport hotspot h, where an airport hotspot h forms when, at the time a flight takes off, the airspace near the departure airport is congested and the airport's departure transfer point becomes a hotspot; B_f^h is the time at which flight f enters airport hotspot h; S_f^h is the time at which flight f is scheduled to enter airport hotspot h; D_f is the ground waiting cost of flight f; p_f^h is the decision variable for whether flight f is protected, and B_f^h - S_f^h is the baseline delay of flight f; V_f is the value created for the airline by protecting flight f; k_f^h is the decision variable for whether flight f is held; M_f is the fine for delaying flight f; O_h is the OI value of airport hotspot h, where OI is the operation index of an airport transfer point, OI = 100 × D / C, D is the number of flights at the transfer point when congestion occurs, and C is the capacity of the transfer point; C_h is the conversion factor of the time-space-domain resources between different hot zones at the transfer point, a hot zone being a time interval in which a hotspot exists; the number of flights that can be protected at airport hotspot h is also used; and OC is the operability index of each flight;
the objective function is the sum of four terms: the additional ground waiting cost incurred by suspending flights, the ground waiting cost saved by protecting flights, the extra reward for protecting flights, and the delay fines originally incurred by suspended and held flights;
the constraints are, in order: C1 states that under the baseline delay each flight has an OC value of 100; suspending a low-priority flight releases its OC value, reducing it from 100 to 0; protecting a high-priority flight raises its OC value to the OI value of the hotspot; a flight keeping its baseline delay retains an OC value of 100; and the sum of the OC values of all flights at the transfer point after adjustment cannot exceed the sum of the OC values of all flights at the transfer point before adjustment;
C2 states that several hotspots may occur at the same transfer point in one day; when an earlier hotspot has occurred, the airspace resources saved by suspending a low-priority flight can be carried over to the current hotspot, but the OC value saved by suspending a flight in the earlier hotspot decays with time and is less than 100, so it must be multiplied by the conversion factor C_h, which is less than 1;
C3 states that the number of flights actually protected in a hotspot must be less than or equal to the number of flights that can be protected in the hotspot (the protection threshold);
C4 states that each flight is in exactly one state within the hotspot: protected, suspended, or holding its baseline delay;
C5 states that the decision variables are 0-1 integer variables, all subject to binary constraints.
5. The multi-airport collaborative release method based on deep reinforcement learning according to claim 1, wherein the problem to be solved by the Markov decision process is as follows: the multi-airport system provides N departure time slots for N departure flights according to the first-come-first-served principle, and the N departure flights are taken out of the N departure time slots and reinserted into departure time slots to generate a departure queue with lower total delay cost;
the elements of the Markov decision process are the state, the action, the reward value, and the policy, set up as follows:
(1) State s_t: N observations are set for the N departure time slots, one per slot, observing whether a flight has been inserted into that slot; the N observations describe the current state of the environment, and the initial state of the environment is that no flight has been inserted into any of the N departure time slots;
the number of possible states that can occur when the N flights are inserted into the N departure time slots is S_N, given by a formula shown as an image in the original; these S_N states form the state space S of the multi-airport collaborative release Markov decision process;
(2) Action a_t: each flight can select one of N different actions, the n-th action inserting it into the n-th departure time slot; if the newly selected departure time slot is earlier than the original departure time slot the flight is protected, if it is later the flight is suspended, and if it is the same the flight keeps its baseline delay; N actions are therefore available for inserting a flight into the N different departure time slots (a minimal illustrative sketch of this encoding follows item (4));
the N actions form an action space A of a multi-airport collaborative release Markov decision process;
(3) Reward value r_t: the reward value is set to the negative of the objective function of the multi-airport collaborative release model based on total delay cost, i.e., the reward value function is given by the formula shown as an image in the original;
the sum of the reward values obtained after the successive actions are selected is called the return R; after the N flights have been inserted into the N different departure time slots, the negative of the return of the multi-airport collaborative release Markov decision process equals the total delay cost of the multi-airport system, given by the formula shown as an image in the original;
(4) Policy π: the policy selects an action according to the maximum reward value; the action with which the agent enters the next state is determined by its policy and the current state:
a_t = π(s_{t-1} | ε)    (7)
wherein ε denotes the exploration rate.
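Putting the four elements together, a minimal sketch of the slot-insertion environment and an ε-greedy policy is given below; the claim fixes only that the state observes slot occupancy, that an action is a departure-slot index, and that the reward is the negative delay cost, so the names, data layout, and toy numbers here are assumptions made for illustration.

```python
import random

def classify(original_slot, new_slot):
    """Protected / suspended / held, following the action definition in (2)."""
    if new_slot < original_slot:
        return "protected"
    if new_slot > original_slot:
        return "suspended"
    return "held"

def step(state, flight_idx, action, original_slots):
    """Insert flight flight_idx into departure slot `action` and return the new state."""
    next_state = list(state)
    next_state[action] = flight_idx
    return next_state, classify(original_slots[flight_idx], action)

def epsilon_greedy(q_values, epsilon):
    """Policy pi of (4): explore with probability epsilon, otherwise take the
    action with the largest estimated reward value (a_t = pi(s_{t-1} | eps))."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# toy usage with assumed numbers: 4 slots, flight 2 originally held slot 3
N = 4
state = [None] * N                    # initial state of (1): no flight inserted yet
state, status = step(state, flight_idx=2, action=1, original_slots=[0, 1, 3, 2])
print(status)                         # "protected": the new slot is earlier than the original
```

The return R of a full episode, i.e., after all N flights have been reinserted, is then the negative of the total delay cost sketched after the objective function.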
CN202111623998.7A 2021-12-28 2021-12-28 Multi-airport collaborative release method based on deep reinforcement learning Pending CN114548893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111623998.7A CN114548893A (en) 2021-12-28 2021-12-28 Multi-airport collaborative release method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111623998.7A CN114548893A (en) 2021-12-28 2021-12-28 Multi-airport collaborative release method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114548893A true CN114548893A (en) 2022-05-27

Family

ID=81670490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111623998.7A Pending CN114548893A (en) 2021-12-28 2021-12-28 Multi-airport collaborative release method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114548893A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996501A (en) * 2010-09-10 2011-03-30 四川大学 Novel solution method facing to flight delay
CN109190700A (en) * 2018-08-27 2019-01-11 北京航空航天大学 A kind of quantitative analysis method that aviation delay is propagated
CN109445456A (en) * 2018-10-15 2019-03-08 清华大学 A kind of multiple no-manned plane cluster air navigation aid
CN109544998A (en) * 2018-12-27 2019-03-29 中国电子科技集团公司第二十八研究所 A kind of flight time slot distribution Multipurpose Optimal Method based on Estimation of Distribution Algorithm
US20210073912A1 (en) * 2019-09-05 2021-03-11 Royal Bank Of Canada System and method for uncertainty-based advice for deep reinforcement learning agents

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222159A (en) * 2022-09-14 2022-10-21 中国电子科技集团公司第二十八研究所 Hot area identification method based on spatial domain relevancy
CN116523141A (en) * 2023-05-22 2023-08-01 北京航空航天大学 MADDPG-based multi-machine-field collaborative release optimization scheduling method
CN116523141B (en) * 2023-05-22 2023-10-31 北京航空航天大学 MADDPG-based multi-machine-field collaborative release optimization scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination