CN116502776B - Flight recovery modeling method, electronic equipment and storage medium - Google Patents
Flight recovery modeling method, electronic equipment and storage medium
- Publication number
- CN116502776B (application CN202310763116.XA)
- Authority
- CN
- China
- Prior art keywords
- flight
- scheduling
- state
- delay
- delayed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
Abstract
The invention provides a flight recovery modeling method, electronic equipment and a storage medium. The method comprises the following steps: acquiring flight delay information of a target airport, and obtaining an initial scheduling state based on the acquired flight delay information; constructing an initial flight scheduling prediction model; acquiring a state transition sequence set TS based on the initial flight scheduling state and the initial flight scheduling prediction model; randomly selecting h state transition sequences from TS as target state transition sequences, with the scheduling states corresponding to the target state transition sequences serving as training samples; and training the initial flight scheduling prediction model with the training samples to obtain a target flight scheduling prediction model. The invention can improve the efficiency and accuracy of flight scheduling.
Description
Technical Field
The invention relates to the field of civil aviation flight recovery and deep learning research, in particular to a flight recovery modeling method, electronic equipment and a storage medium.
Background
The delayed-flight recovery problem is a real-time optimization problem with numerous and highly complex constraint conditions, and belongs to the class of NP-hard problems. The solution complexity of such dynamic optimization problems grows exponentially with the number of decision and state variables, a computational challenge known as the curse of dimensionality. In recent years, research on flight delay recovery scheduling algorithms has mainly focused on integer programming with column generation, meta-heuristic optimization algorithms, reinforcement learning, and the like.
Integer programming and column generation algorithms build an integer programming model from the constraint conditions to generate a recovery scheduling scheme, but often do not minimize the comprehensive delay loss. Meta-heuristic optimization algorithms establish an objective function and iteratively optimize it to approach the optimal solution, but frequently become trapped in local optima. Deep reinforcement learning instead formulates a Markov decision process and exploits the strong fitting capability of deep neural networks, iteratively learning an optimal policy rather than a single optimal solution; after convergence, the trained neural network tends to solve the same class of problem with faster speed and higher accuracy.
Disclosure of Invention
Aiming at the technical problems, the invention adopts the following technical scheme:
the embodiment of the invention provides a flight recovery modeling method, which comprises the following steps:
s100, acquiring flight delay information of a target airport, and acquiring an initial scheduling state based on the acquired flight delay information, wherein the initial scheduling state comprises the flight delay information and a flight take-off sequence;
S200, an initial flight scheduling prediction model is built; its input is a scheduling state, and its output is the probability prediction value of each decision action executed on the input scheduling state, where a decision action is the operation of exchanging the take-off order of any two flights in the input scheduling state; the decision action executed on the input scheduling state must satisfy the set constraint conditions;
S300, acquiring a state transition sequence set TS = {TS_1, TS_2, …, TS_j, …, TS_m} based on the initial flight scheduling state and the initial flight scheduling prediction model; wherein the j-th state transition sequence TS_j = (S_j, a_j, r_j, φ_j): S_j is the scheduling state corresponding to the j-th state transition sequence, a_j is the target decision action corresponding to S_j, r_j is the return value corresponding to a_j, and φ_j is the termination flag corresponding to the j-th state transition sequence, where φ_1 to φ_(m-1) take a first set value and φ_m takes a second set value. For any two adjacent state transition sequences in TS, the latter scheduling state is obtained by executing the corresponding target decision action on the former scheduling state; r_j = DL_(j+1) − DL_j, where DL_(j+1) is the delay loss corresponding to the next scheduling state S_(j+1) obtained after executing a_j on S_j, and DL_j is the delay loss corresponding to S_j; j ranges from 1 to m, and m is the number of state transition sequences;
S400, randomly acquiring h state transition sequences from TS as target state transition sequences, and taking the scheduling states corresponding to the target state transition sequences as training samples;
s500, inputting training samples of a current batch into a current flight scheduling prediction model for training to obtain a maximum probability prediction value corresponding to each sample;
s600, acquiring a target decision action corresponding to each sample based on a probability prediction value corresponding to each sample, acquiring a next scheduling state corresponding to each sample based on the acquired target decision action, and inputting the acquired next scheduling state into a current flight scheduling prediction model for training to acquire a maximum probability prediction value corresponding to the next scheduling state corresponding to each sample;
s700, acquiring a current loss function value based on a maximum probability prediction value and a return value of a training sample of a current batch and a maximum probability prediction value corresponding to a next scheduling state corresponding to the training sample of the current batch, judging whether the current loss function value accords with a preset model training ending condition, if so, taking the current flight scheduling prediction model as a target flight scheduling prediction model, and if not, adjusting parameters of the current flight scheduling prediction model, and taking the training sample of the next batch as the training sample of the current batch, and executing S500.
Embodiments of the present invention provide a non-transitory computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the foregoing method.
An embodiment of the present invention provides an electronic device including a processor and the aforementioned non-transitory computer-readable storage medium.
The invention has at least the following beneficial effects:
according to the flight recovery modeling method provided by the embodiment of the invention, for an initial flight delay state, the flight scheduling is performed by using reinforcement learning, the flight scheduling problem is regarded as a sequence decision process, the condition of the flight scheduling is regarded as a state, and the take-off sequence of exchanging certain two flights or the allocation of aircrafts is regarded as a decision action. For a given scheduling state, a decision action is selected randomly with probability through an epsilon-schedule strategy, or an intelligent agent selects two flights to exchange with the aircraft allocated by the flights according to delay loss of the current scheduling, so that the current scheduling state is changed, and meanwhile, the return value is given to judge whether the action is good or bad. Training the intelligent agent according to the return value, wherein the converged intelligent agent can be used as a target intelligent agent, when similar conditions occur again, the trained target intelligent agent can be directly used for scheduling prediction, an optimal scheduling scheme is obtained, and scheduling efficiency and accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a flight recovery modeling method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
When large-area delays occur at an airport, a large number of flight vehicles remain at the airport, resulting in economic losses. In order to reduce economic loss caused by flight delay and improve the efficiency of delayed recovery and flight scheduling, the embodiment of the invention provides a deep learning-based flight recovery modeling method, which is used for scheduling delayed flights and detained aircrafts based on a deep reinforcement learning algorithm so as to achieve the purpose of delayed recovery of flights.
The flight recovery modeling method provided by the embodiment of the invention, as shown in fig. 1, can include the following steps:
s100, acquiring flight delay information of a target airport, and acquiring an initial scheduling state based on the acquired flight delay information, wherein the initial scheduling state comprises the flight delay information and a flight take-off sequence.
In the embodiment of the invention, the flight delay information can be acquired through the information release platform of the target airport. In an exemplary embodiment, the flight delay information may include at least: the ID of each delayed flight, its delay time, its flight time, its average riding cost (average fare), its number of users (i.e., number of passengers), the maximum user load capacity of the corresponding aircraft (i.e., maximum passenger number), the aircraft unit delay loss corresponding to the delayed flight, and the user unit delay loss corresponding to the delayed flight.
In the embodiment of the invention, the aircraft unit delay loss corresponding to each delayed flight is the loss incurred by the corresponding aircraft for each unit of time the flight is delayed; the unit of time may be minutes. The user unit delay loss corresponding to each delayed flight is the loss brought to users for each unit of time the flight is delayed; for some important users, for example, a delay produces a corresponding loss. In one exemplary embodiment, aircraft unit delay loss and user unit delay loss may be obtained from the existing literature, for example the method published in "Mixed particle swarm algorithm for flight delay recovery scheduling", Journal of Transportation Engineering, Vol. 8, No. 2, April 2008. In another exemplary embodiment, they may be derived from historical data: for an aircraft, the benefit gained in service divided by the total length of service gives the corresponding aircraft unit delay loss; for a given flight, the ratio of the total user delay loss caused by delays since it entered operation to the total number of delayed users gives the user unit delay loss corresponding to that flight.
In the embodiment of the invention, the initial flight scheduling state can be randomly generated, with the delay information and take-off order of each flight forming one feature vector of the state. For example, the initial state may be represented as S_0 = (F_1, F_2, …, F_s, …, F_n), where F_s is the delay information of the s-th delayed flight in the target airport, F_s = (Dt_s, t_s, v_s, vm_s, w_s, Ca_s, Cp_s, d_s): Dt_s is the delay time of delayed flight s, t_s its flight time, v_s its number of delayed users, vm_s its maximum user load capacity, w_s its average riding cost (i.e., average fare), Ca_s the aircraft unit delay loss corresponding to delayed flight s, Cp_s the user unit delay loss corresponding to delayed flight s, and d_s the take-off order of delayed flight s; s ranges from 1 to n, and n is the number of delayed flights in the target airport.
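As an illustration, the state vector S_0 described above can be assembled as follows. This is a minimal sketch, not the patent's implementation; the dictionary field names (`delay_min`, `passengers`, etc.) are hypothetical labels for the eight features F_s = (Dt_s, t_s, v_s, vm_s, w_s, Ca_s, Cp_s, d_s):

```python
import random

def make_initial_state(delayed_flights, seed=None):
    """Build S_0 = (F_1, ..., F_n): one 8-feature tuple per delayed flight,
    with a randomly generated initial take-off order d_s."""
    rng = random.Random(seed)
    order = list(range(1, len(delayed_flights) + 1))
    rng.shuffle(order)  # random initial take-off sequence
    state = []
    for f, d in zip(delayed_flights, order):
        # (Dt_s, t_s, v_s, vm_s, w_s, Ca_s, Cp_s, d_s)
        state.append((f["delay_min"], f["flight_min"], f["passengers"],
                      f["max_capacity"], f["avg_fare"],
                      f["aircraft_unit_loss"], f["passenger_unit_loss"], d))
    return state
```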
S200, an initial flight scheduling prediction model is built, the input of the initial flight scheduling prediction model is a scheduling state, the probability prediction value of the decision action executed on the input scheduling state is output, the probability prediction value represents the preference degree of the corresponding decision action, and the larger the probability prediction value is, the larger the preference degree of the corresponding decision action is represented. The decision action is the operation of exchanging the take-off sequence of any two flights in the input scheduling state; wherein, the decision action executed on the input shift state meets the set constraint condition.
In an embodiment of the present invention, the framework of the initial flight scheduling prediction model may be a neural network. In one exemplary embodiment, the network structure may be an ANN, RNN, LSTM, Transformer, or the like.
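A minimal sketch of such a prediction model is shown below, using a one-hidden-layer feed-forward network in NumPy rather than any of the recurrent architectures named above (an assumption for compactness). The input is the flattened state (8 features per flight) and the output is one probability prediction value per pairwise swap action:

```python
import numpy as np

def init_model(n_flights, hidden=32, seed=0):
    rng = np.random.default_rng(seed)
    n_actions = n_flights * (n_flights - 1) // 2  # one action per flight pair
    d_in = n_flights * 8                          # 8 features per flight
    return {"W1": rng.normal(0.0, 0.1, (d_in, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.1, (hidden, n_actions)), "b2": np.zeros(n_actions)}

def predict(model, state):
    """Map a scheduling state to a probability prediction value per decision action."""
    x = np.asarray(state, dtype=float).ravel()
    h = np.tanh(x @ model["W1"] + model["b1"])
    z = h @ model["W2"] + model["b2"]
    e = np.exp(z - z.max())       # softmax over swap actions
    return e / e.sum()
```

The softmax output matches the description's reading of the value as a preference degree: larger values indicate preferred decision actions.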
In the embodiment of the invention, when a decision action is executed on the input scheduling state, the corresponding feature vectors are updated accordingly: changing the take-off order changes the corresponding delay times, and updating keeps every feature vector in every scheduling state accurate.
In an embodiment of the present invention, the set constraint condition may at least include the following conditions:
condition 1: the departure time of the flight cannot be earlier than the planned departure time;
condition 2: the number of users delayed by a flight cannot be larger than the maximum user bearing capacity of the aircraft corresponding to the flight;
condition 3: the aircraft can only execute one flight task at the same time;
condition 4: each flight can only be executed once.
In the embodiment of the invention, requiring that decision actions executed on the input scheduling state satisfy the set constraint conditions removes actions that cannot be executed, and can be implemented through a screening mechanism function f(x). If a decision action on the input scheduling state satisfies the set constraint conditions, its probability prediction value is the value computed by the model, unchanged; otherwise its probability prediction value is set to 0, i.e., the action is not selected.
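The screening function f(x) can be sketched as a mask that zeroes the probability prediction value of any action violating conditions 1-4. The `valid` flags are assumed to be computed elsewhere from those constraints; this is an illustrative sketch, not the patent's f(x):

```python
def mask_invalid_actions(probs, valid):
    """f(x): keep the model's value for feasible actions, zero it otherwise."""
    return [p if ok else 0.0 for p, ok in zip(probs, valid)]

def best_valid_action(probs, valid):
    """Index of the feasible action with the maximum probability prediction value."""
    masked = mask_invalid_actions(probs, valid)
    return max(range(len(masked)), key=masked.__getitem__)
```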
S300, acquiring a state transition sequence set TS = {TS_1, TS_2, …, TS_j, …, TS_m} based on the initial flight scheduling state and the initial flight scheduling prediction model; wherein the j-th state transition sequence TS_j = (S_j, a_j, r_j, φ_j): S_j is the scheduling state corresponding to the j-th state transition sequence; a_j is the target decision action corresponding to S_j, i.e., among the decision actions executable on S_j, the one with the maximum probability prediction value; r_j is the return value corresponding to a_j; φ_j is the termination flag corresponding to the j-th state transition sequence, where φ_1 to φ_(m-1) take a first set value, e.g., 0, and φ_m takes a second set value, e.g., 1. For any two adjacent state transition sequences in TS, the latter scheduling state is obtained by executing the corresponding target decision action on the former; r_j = DL_(j+1) − DL_j, where DL_(j+1) is the delay loss of the next scheduling state S_(j+1) obtained after executing a_j on S_j and DL_j is the delay loss of S_j; j ranges from 1 to m, and m is the number of state transition sequences.
Further, S300 specifically includes:
S301, setting j = 1 and c = 0, where c is a counter recording the number of iterations;
S302, inputting the current scheduling state S_j into the initial flight scheduling prediction model to obtain the corresponding output X_j = (X_j1, X_j2, …, X_jh, …, X_jn), where X_jh is the probability prediction value obtained by the model for executing the h-th decision action on S_j; obviously S_1 is the initial scheduling state, i.e., S_1 = S_0.
S303, taking the decision action corresponding to the maximum probability prediction value in X_j as the target decision action for S_j, obtaining the next scheduling state S_(j+1), and obtaining r_j.
S304, setting c = c + 1; if S_j is the optimal scheduling state or c = C_0, set φ_j = 1, add TS_j to the current TS, and exit the current procedure; otherwise set φ_j = 0, add TS_j to the current TS, and execute S305. The initial value of TS is the empty set, and C_0 is the set iteration-count threshold, which may be an empirical value.
In the embodiment of the invention, S_j is the optimal scheduling state when the following condition is satisfied: the return value obtained for every decision action executed on S_j is negative.
S305, setting j = j + 1, and executing S302.
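Steps S301-S305 amount to a greedy rollout. The sketch below passes the model, the environment step, and the delay-loss computation in as functions (all hypothetical names), and checks only the c = C_0 termination condition; the optimal-state test of S304 is omitted for brevity:

```python
def collect_transitions(predict_fn, step_fn, delay_loss_fn, s0, c0):
    """Roll out from S_1 = S_0, taking the argmax decision action each step,
    and record TS_j = (S_j, a_j, r_j, phi_j) until c reaches C_0."""
    ts, s, c = [], s0, 0
    while True:
        probs = predict_fn(s)
        a = max(range(len(probs)), key=probs.__getitem__)
        s_next = step_fn(s, a)
        # r_j = DL_{j+1} - DL_j, per the sign convention stated in S300
        r = delay_loss_fn(s_next) - delay_loss_fn(s)
        c += 1
        phi = 1 if c >= c0 else 0
        ts.append((s, a, r, phi))
        if phi:
            return ts
        s = s_next
```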
In the embodiment of the invention, the return value measures the future quality of a decision action and is mainly related to flight delay loss: if an action increases the delay loss, the return is negative; otherwise it is positive. The magnitude of the return relates not only to the state reached after the action is executed, but also to the best state reachable from that state.
Further, in the embodiment of the present invention, the delay loss corresponding to each scheduling state satisfies: DL_j = Σ_{i=1}^{n} (1 + γ_ji)(P_ji + k_ji1×C_jif + k_ji2×C_jia + k_ji3×C_jip), where γ_ji is the importance coefficient of the delayed flight ST_ji corresponding to the ID of the i-th delayed flight in S_j; P_ji is the invisible loss corresponding to ST_ji; C_jif is the profit loss (of the operating airline) corresponding to ST_ji; C_jia is the aircraft delay loss corresponding to ST_ji; C_jip is the user delay loss corresponding to ST_ji; k_ji1, k_ji2 and k_ji3 are the weights of C_jif, C_jia and C_jip respectively; and n is the number of delayed flights. In the embodiment of the invention, γ_ji, k_ji1, k_ji2 and k_ji3 can be determined according to the actual situation. For example, the more important a flight, the larger its importance coefficient. As another example, if a flight's profit loss matters more than its user delay loss, the weight of the profit loss is set larger than that of the user delay loss.
In the embodiment of the invention, the invisible loss is mainly determined by the probability that passengers will no longer choose civil aviation travel because of the flight delay; specifically, P_ji = v_ji × w_ji × β_ji, where v_ji is the number of delayed users of ST_ji, w_ji is the average riding cost of ST_ji, and β_ji is the user disappointment rate function corresponding to ST_ji. In one exemplary embodiment, β_ji = [(ΔLF_ji/60)^2]^(1/3)/29, with 0 ≤ β_ji ≤ 1, where ΔLF_ji is the aircraft unit delay loss corresponding to ST_ji.
Further, C_jif = v_ji × Pf_ji × w_ji × Dt_ji / t_ji, where Pf_ji is a profit coefficient of ST_ji that can be obtained from the corresponding profit record table, Dt_ji is the delay time of ST_ji, and t_ji is the flight time of ST_ji.
Further, C_jia = Dt_ji × ΔLF_ji and C_jip = Dt_ji × ΔLP_ji, where Dt_ji is the delay time of ST_ji and ΔLP_ji is the user unit delay loss corresponding to ST_ji.
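Putting the formulas above together, the delay loss of a scheduling state can be computed as in the following sketch. The dictionary field names and the way the coefficients γ and k_1-k_3 are passed in are assumptions made for illustration:

```python
def disappointment_rate(dlf):
    """beta = [(dLF/60)^2]^(1/3) / 29, clipped to the interval [0, 1]."""
    return min(1.0, ((dlf / 60.0) ** 2) ** (1.0 / 3.0) / 29.0)

def delay_loss(flights, k1, k2, k3):
    """DL = sum_i (1 + gamma_i) * (P_i + k1*C_f + k2*C_a + k3*C_p)."""
    total = 0.0
    for f in flights:
        beta = disappointment_rate(f["dlf"])
        p_invisible = f["v"] * f["w"] * beta                 # P = v * w * beta
        c_f = f["v"] * f["pf"] * f["w"] * f["dt"] / f["t"]   # airline profit loss
        c_a = f["dt"] * f["dlf"]                             # aircraft delay loss
        c_p = f["dt"] * f["dlp"]                             # user delay loss
        total += (1.0 + f["gamma"]) * (p_invisible + k1 * c_f + k2 * c_a + k3 * c_p)
    return total
```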
S400, randomly acquiring h state transition sequences from the TS to serve as target state transition sequences, and taking a scheduling state corresponding to the target state transition sequences as a training sample.
S500, inputting the training samples of the current batch into the current flight scheduling prediction model for training, and obtaining the maximum probability prediction value corresponding to each sample.
S600, based on the probability prediction value corresponding to each sample, acquiring a target decision action corresponding to each sample, acquiring a next scheduling state corresponding to each sample based on the acquired target decision action, and inputting the acquired next scheduling state into a current flight scheduling prediction model for training to obtain a maximum probability prediction value corresponding to the next scheduling state corresponding to each sample.
S700, acquiring a current loss function value based on a maximum probability prediction value and a return value of a training sample of a current batch and a maximum probability prediction value corresponding to a next scheduling state corresponding to the training sample of the current batch, judging whether the current loss function value accords with a preset model training ending condition, if so, taking the current flight scheduling prediction model as a target flight scheduling prediction model, and if not, adjusting parameters of the current flight scheduling prediction model, and taking the training sample of the next batch as the training sample of the current batch, and executing S500.
In the embodiment of the present invention, the current loss function value satisfies the following condition:
L = X − (R − λ×X_n), where L is the current loss function value, R is the sum of the return values of the training samples of the current batch, X is the sum of the maximum probability prediction values of all training samples of the current batch, X_n is the sum of the maximum probability prediction values corresponding to the next scheduling states of the training samples of the current batch, and λ is a discount factor with a value between 0 and 1.
In the embodiment of the invention, the current loss function value takes the long-term return into account, which avoids sacrificing long-term return by blindly selecting the action with the largest immediate return. Furthermore, introducing the discount factor balances the bias and variance of the estimate caused by over-simplifying future rewards.
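As a literal transcription of the formula above, the batch loss of step S700 can be sketched as follows (the function name and the default λ are illustrative assumptions):

```python
def batch_loss(max_probs, returns, next_max_probs, lam=0.9):
    """L = X - (R - lambda * X_n), with X, R, X_n summed over the current batch."""
    X = sum(max_probs)          # max probability prediction values of the batch
    R = sum(returns)            # return values of the batch
    Xn = sum(next_max_probs)    # max values for the next scheduling states
    return X - (R - lam * Xn)
```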
In the embodiment of the present invention, the preset model training ending condition may be that L is smaller than the set loss threshold, or the training iteration number is larger than the set iteration number.
In the embodiment of the present invention, the set loss threshold and the set iteration count may be empirical values. As those skilled in the art know, if the training count reaches the set iteration number but the loss function has not yet converged, the training parameters were set unreasonably and the iteration count needs to be increased; the specific implementation may follow the prior art.
Further, the method provided by the embodiment of the invention can further comprise the following steps:
S800, inputting a received scheduling state to be processed into the target flight scheduling prediction model, and obtaining and displaying the corresponding target decision action.
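At inference time, the argmax output of the target model must be decoded back into the pair of flights whose take-off order is exchanged. One possible flat indexing of the n(n−1)/2 pair actions is sketched below; the patent does not specify this encoding, so it is an illustrative assumption:

```python
def action_to_pair(a, n):
    """Map a flat action index to the flight pair (i, j), i < j, to swap."""
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            if k == a:
                return (i, j)
            k += 1
    raise IndexError(f"action {a} out of range for {n} flights")

def apply_swap(order, a):
    """Execute the decision action: exchange the take-off order of two flights."""
    i, j = action_to_pair(a, len(order))
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order
```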
According to the flight recovery modeling method provided by the embodiment of the invention, starting from an initial flight delay state, flight scheduling is performed with reinforcement learning: the scheduling problem is treated as a sequential decision process, the current scheduling situation is treated as a state, and exchanging the take-off order of two flights (or the aircraft assigned to them) is treated as a decision action. For a given scheduling state, a decision action is either selected randomly with some probability via an ε-greedy exploration strategy, or the agent selects two flights whose take-off order and assigned aircraft are exchanged according to the delay loss of the current schedule, thereby changing the current scheduling state; at the same time, a return value is given to judge whether the action was good or bad. The agent is trained on these return values, and the converged agent serves as the target agent: when a similar situation occurs again, the trained target agent can be used directly for scheduling prediction to obtain an optimal scheduling scheme, improving scheduling efficiency and accuracy.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by a processor to implement the method provided by the embodiments described above.
Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention as described in the specification, when said program product is run on the electronic device.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the present disclosure is defined by the appended claims.
Claims (3)
1. A method of modeling flight recovery, the method comprising the steps of:
s100, acquiring flight delay information of a target airport, and acquiring an initial scheduling state based on the acquired flight delay information, wherein the initial scheduling state comprises the flight delay information and a flight take-off sequence;
s200, constructing an initial flight scheduling prediction model, the input of which is a scheduling state and the output of which is the probability prediction value of a decision action executed on the input scheduling state, the decision action being an operation of exchanging the take-off order of any two flights in the input scheduling state; wherein the decision action executed on the input scheduling state meets the set constraint conditions;
s300, based on the initial scheduling state and the initial flight scheduling prediction model, acquiring a state transition sequence set TS = {TS_1, TS_2, …, TS_j, …, TS_m}; wherein the j-th state transition sequence TS_j = (S_j, a_j, r_j, φ_j), S_j is the scheduling state corresponding to the j-th state transition sequence, a_j is the target decision action corresponding to S_j, r_j is the return value corresponding to a_j, and φ_j is the termination flag corresponding to the j-th state transition sequence, where φ_1 to φ_{m−1} take a first set value and φ_m takes a second set value; wherein, for any two adjacent state transition sequences in TS, the latter scheduling state is obtained by executing the corresponding target decision action on the former scheduling state, r_j = DL_{j+1} − DL_j, where DL_{j+1} is the delay loss corresponding to the next scheduling state S_{j+1} obtained after executing a_j on S_j, and DL_j is the delay loss corresponding to S_j; j ranges from 1 to m, and m is the number of state transition sequences;
s400, randomly acquiring h state transition sequences from TS as target state transition sequences, and taking the scheduling states corresponding to the target state transition sequences as training samples;
s500, inputting training samples of a current batch into a current flight scheduling prediction model for training to obtain a maximum probability prediction value corresponding to each sample;
s600, acquiring a target decision action corresponding to each sample based on a probability prediction value corresponding to each sample, acquiring a next scheduling state corresponding to each sample based on the acquired target decision action, and inputting the acquired next scheduling state into a current flight scheduling prediction model for training to acquire a maximum probability prediction value corresponding to the next scheduling state corresponding to each sample;
s700, acquiring a current loss function value based on a maximum probability prediction value and a return value of a training sample of a current batch and a maximum probability prediction value corresponding to a next scheduling state corresponding to the training sample of the current batch, judging whether the current loss function value accords with a preset model training ending condition, if so, taking a current flight scheduling prediction model as a target flight scheduling prediction model, if not, adjusting parameters of the current flight scheduling prediction model, and taking the training sample of the next batch as the training sample of the current batch, and executing S500;
the flight delay information at least comprises an ID of the delayed flight, delay time of the delayed flight, flight time of the delayed flight, average riding cost of the delayed flight, number of users of the delayed flight, maximum user bearing capacity of an aircraft corresponding to the delayed flight, aircraft unit delay loss corresponding to the delayed flight and user unit delay loss corresponding to the delayed flight;
wherein DL_j = ∑_{i=1}^{n} (1 + γ_{ji})(P_{ji} + k_{ji1}×C_{jif} + k_{ji2}×C_{jia} + k_{ji3}×C_{jip}), where γ_{ji} is the importance coefficient of the delayed flight ST_{ji} corresponding to the ID of the i-th delayed flight in S_j, P_{ji} is the invisible loss corresponding to ST_{ji}, C_{jif} is the profit loss corresponding to ST_{ji}, C_{jia} is the aircraft delay loss corresponding to ST_{ji}, C_{jip} is the user delay loss corresponding to ST_{ji}, k_{ji1}, k_{ji2} and k_{ji3} are the weights corresponding to C_{jif}, C_{jia} and C_{jip} respectively, and n is the number of delayed flights;
P_{ji} = v_{ji} × w_{ji} × β_{ji}, where v_{ji} is the number of delayed users of ST_{ji}, w_{ji} is the average riding cost of ST_{ji}, and β_{ji} is the user disappointment rate function corresponding to ST_{ji};
C_{jif} = v_{ji} × Pf_{ji} × w_{ji} × Dt_{ji} / t_{ji}, where v_{ji} is the number of delayed users of ST_{ji}, w_{ji} is the average riding cost of ST_{ji}, Pf_{ji} is the average profit rate of ST_{ji}, Dt_{ji} is the delay time of ST_{ji}, and t_{ji} is the flight time of ST_{ji};
C_{jia} = Dt_{ji} × ΔLF_{ji}, C_{jip} = Dt_{ji} × ΔLP_{ji}, where Dt_{ji} is the delay time of ST_{ji}, ΔLF_{ji} is the aircraft unit delay loss corresponding to ST_{ji}, and ΔLP_{ji} is the user unit delay loss corresponding to ST_{ji};
the current loss function value satisfies the following condition:
L = X − (R − λ×X_n), where L is the current loss function value, R is the sum of the return values of the training samples of the current batch, X is the sum of the maximum probability prediction values of all the training samples of the current batch, X_n is the sum of the maximum probability prediction values corresponding to the next scheduling states of the training samples of the current batch, and λ is a discount factor with a value between 0 and 1;
the set constraint conditions at least comprise the following conditions:
condition 1: the departure time of the flight cannot be earlier than the planned departure time;
condition 2: the number of users delayed by a flight cannot be larger than the maximum user bearing capacity of the aircraft corresponding to the flight;
condition 3: the aircraft can only execute one flight task at the same time;
condition 4: each flight can only be executed once.
2. A non-transitory computer readable storage medium having at least one instruction or at least one program stored therein, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the method of claim 1.
3. An electronic device comprising a processor and the non-transitory computer readable storage medium of claim 2.
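The batch loss condition in claim 1 can be sketched as follows. This is a minimal sketch of the stated condition L = X − (R − λ×X_n) only; the prediction model that produces the maximum probability prediction values is not reproduced, and the function and parameter names (`batch_loss`, `q_max`, `q_next_max`, `lam`) are assumptions.

```python
def batch_loss(q_max, returns, q_next_max, lam=0.9):
    """Current loss function value per the claim: L = X - (R - lam * X_n),
    where X sums the maximum probability predictions of the current batch,
    R sums the return values, and X_n sums the maximum predictions for the
    corresponding next scheduling states; lam is the discount factor in (0, 1).
    """
    X = sum(q_max)
    R = sum(returns)
    Xn = sum(q_next_max)
    return X - (R - lam * Xn)
```

For example, with batch predictions summing to 3, returns summing to 1, next-state predictions summing to 2, and λ = 0.5, the loss is 3 − (1 − 1) = 3.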
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310763116.XA CN116502776B (en) | 2023-06-27 | 2023-06-27 | Flight recovery modeling method, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116502776A CN116502776A (en) | 2023-07-28 |
CN116502776B true CN116502776B (en) | 2023-08-25 |
Family
ID=87320584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310763116.XA Active CN116502776B (en) | 2023-06-27 | 2023-06-27 | Flight recovery modeling method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116502776B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663262A (en) * | 2012-04-27 | 2012-09-12 | 中国南方航空股份有限公司 | Flight wave property cost accounting method based on immune algorithm |
CN108875128A (en) * | 2018-05-03 | 2018-11-23 | 西安理工大学 | A kind of flight recovery modeling method with decision factor |
CN109872074A (en) * | 2019-03-04 | 2019-06-11 | 中国民航大学 | Air net delay propagation model and method for building up based on SIS |
CN115310732A (en) * | 2022-10-12 | 2022-11-08 | 珠海翔翼航空技术有限公司 | Flight delay prediction method and system |
Non-Patent Citations (1)
Title |
---|
Itinerary recovery method for delayed passengers based on air-rail intermodal transport; Lu Xi; Science Technology and Engineering; full text *
Also Published As
Publication number | Publication date |
---|---|
CN116502776A (en) | 2023-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||