CN114781704A - Flight delay prediction method based on station-passing flight guarantee process


Info

Publication number
CN114781704A
Authority
CN
China
Legal status
Granted
Application number
CN202210368680.7A
Other languages
Chinese (zh)
Other versions
CN114781704B (en)
Inventor
羊钊
陈怡欣
宋溢露
曾维理
包杰
丛玮
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210368680.7A
Publication of CN114781704A
Application granted
Publication of CN114781704B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a flight delay prediction method based on the ground-support (guarantee) process of station-passing (transit) flights, comprising the following steps: collecting and cleaning the time data generated at each process node while flights transit the airport during a specific period; calculating, for all flights in the original data set, the set of differences between actual take-off time and planned departure time, and deriving a standard time period for each node-to-node time difference; modelling the flow nodes of each flight as a non-Euclidean graph network structure that encodes their logical relationships; computing the distance between the time differences of delayed flights and the standard time periods, loading these distances into the graph network structure as node features, and constructing and packaging a graph data set; building graph convolutional neural networks for message passing and feature updating; and obtaining an optimal flight delay time prediction model. By constructing a graph network structure with logical relationships, the method corrects erroneous time node data, predicts flight delay time, accounts for the correlation between the different stages at which delays arise, and improves prediction accuracy.

Description

Flight delay prediction method based on station-passing flight guarantee process
Technical Field
The invention belongs to the technical field of air traffic management, and particularly relates to a flight delay prediction method based on a station-passing flight guarantee process.
Background
With the rapid development of the civil aviation industry, air travel places increasing emphasis on the efficiency and quality of flight service. However, flight delays caused by rapidly growing traffic volume and limited airspace resources have become increasingly serious and now degrade flight service quality.
At present, civil aviation operates continuously at high traffic levels. When a flight transits a station, the airport, air traffic control and airlines interact in many interleaved ways to keep the system running at full capacity, which makes on-time performance difficult to improve. Although flight delays have been studied extensively, most existing delay prediction methods treat flight execution as a single whole and do not analyze in depth the stages at which delays arise or the correlations between upstream and downstream steps of flight execution, so their prediction accuracy is limited. Based on operational process node data, the present method accounts for node times and their sequential correlations during aircraft transit when studying ground delay. This approaches flight delay prediction from a new perspective, helps improve prediction accuracy, and provides a scientific basis for improving airport operating efficiency, speeding up airlines' transit ground support, and alleviating frequent flight delays.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a flight delay prediction method based on the station-passing flight guarantee process, addressing two problems of the prior art: it ignores the correlations between successive stages at which delays arise during airport transit, and its delay prediction accuracy is limited. Starting from the time data generated by the various operations in a flight's ground-support process, the method accounts for the correlations between successive flow nodes during transit and converts the time data on the flow nodes into graph-structured features in non-Euclidean space in order to predict flight delay.
To achieve the above purpose, the invention adopts the following technical solution:
The flight delay prediction method based on the station-passing flight guarantee process of the invention comprises the following steps:
(1) Collect the time data generated at each flow node as flights transit the airport during a specific period, clean the raw time data of each flight's processes according to the logical relationships of the flight operation process, and take the result as the original data set T_p;
(2) Compute the set D_diff of differences between the actual take-off time and the planned departure time of all flights in the original data set T_p. Using the normal departure range [0, t_r], where t_r is the maximum normal departure delay, divide the flights into a delayed flight time set T_delay and a non-delayed flight time set T_non-delay. For T_delay and T_non-delay separately, compute the time differences on each successor-predecessor pair of flow nodes, denoted Diff_delay and Diff_non-delay. From the statistics of Diff_non-delay, take the segment between the lower quartile QL and the upper quartile QU of each successor-predecessor time difference as the standard time period set D_std;
(3) Based on the sequential connections between a flight's flow nodes, model the flow nodes of each flight as a non-Euclidean graph network structure G with logical relationships;
(4) Using the standard time period set D_std of each successor-predecessor time difference, compute the set of distances T_std between D_std and the time differences Diff_delay of delayed flights on the successor-predecessor flow nodes. Use T_std as the feature input set X of the flow nodes and load it into the graph network structure G; then package the edge set E', the index set I, the node feature set X and the set D_diff-delay of differences between the actual take-off and planned departure times of delayed flights together into a graph data set;
(5) Using the graph network structure G, build four graph convolutional neural network models: a two-layer GCN, a two-layer GAT, a two-layer GraphSAGE, and a combination of a single GCN layer with a single GraphSAGE layer;
(6) Select three machine learning models and divide the data into training, validation and test sets; train the three machine learning models and the four graph convolutional neural network models built in step (5), and obtain the optimal version of each model by tuning its parameters;
(7) Compare and evaluate the results of the optimal versions of the three machine learning models and the four graph convolutional neural network models.
Further, the flow nodes of a transit flight in step (1) include: planned arrival, actual landing, actual chocks-on, cabin door open, cabin door close, boarding gate open, boarding gate close, actual chocks-off, actual taxi start, actual take-off and planned departure.
Further, the flight time data in step (1) include: planned arrival time, actual landing time, actual chocks-on time, cabin door opening time, cabin door closing time, boarding gate opening time, boarding gate closing time, actual chocks-off time, actual taxi start time, actual take-off time and planned departure time.
Further, step (1) specifically comprises:
(11) Check the chronological consistency of each collected flight time set and remove abnormal records (for example, a time point that falls on a different day from the flight's other time points, or a later process time recorded earlier than the preceding one);
(12) Based on the interval between the cabin door opening and closing times recorded for each flight, classify as the overnight flight set T_p-over the records whose interval exceeds 400, with the door opened in the evening and closed the next morning; classify the remaining records, whose interval does not exceed 400 or which do not show the evening-open, next-morning-close pattern, as the non-overnight flight set T_p-nonover;
(13) Take each flight's actual landing time t_al, planned arrival time t_ea, actual take-off time t_ad and planned departure time t_ed as its reference execution times, and clean each sample flight time set against them. Abnormal and missing times adjacent to the actual landing and actual take-off times are processed first. The times to be processed fall into two classes: first-class times, which are associated with the flight's actual landing time t_al or actual take-off time t_ad, and second-class times, which are associated with two or more earlier and later times of the flight. Both classes of abnormal or missing values are cleaned separately within the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, using their logical relationships with complete time data on other flow nodes;
(14) Take all cleaned time data as the original data set T_p, where T_p comprises the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, and p denotes the number of flights.
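The consistency check of step (11) and the overnight split of step (12) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the field names are hypothetical, only actual-time nodes whose order is unambiguous are checked, and reading the threshold 400 as minutes, together with the 18:00 "evening" and 12:00 "morning" cut-offs, are assumptions the text does not state.

```python
from datetime import datetime, timedelta

# Hypothetical field names for actual-time nodes, in the chronological
# order they must satisfy (planned times are not checked here).
ACTUAL_ORDER = ["landing", "chocks_on", "door_open", "door_close",
                "chocks_off", "taxi_start", "takeoff"]

OVERNIGHT_GAP = timedelta(minutes=400)  # assumption: "400" read as minutes
EVENING_HOUR = 18                       # assumption: "evening" starts 18:00
MORNING_HOUR = 12                       # assumption: "morning" ends 12:00


def is_chronological(record: dict[str, datetime]) -> bool:
    """Step (11): reject a record whose later node precedes an earlier one."""
    times = [record[name] for name in ACTUAL_ORDER]
    return all(earlier <= later for earlier, later in zip(times, times[1:]))


def is_overnight(record: dict[str, datetime]) -> bool:
    """Step (12): door opened in the evening, closed next morning, gap > 400."""
    opened, closed = record["door_open"], record["door_close"]
    return (closed - opened > OVERNIGHT_GAP
            and opened.hour >= EVENING_HOUR
            and closed.date() == opened.date() + timedelta(days=1)
            and closed.hour < MORNING_HOUR)


def split_records(records):
    """Drop inconsistent records, then split into (overnight, non-overnight)."""
    valid = [r for r in records if is_chronological(r)]
    over = [r for r in valid if is_overnight(r)]
    nonover = [r for r in valid if not is_overnight(r)]
    return over, nonover
```

The actual repair of abnormal values in step (13) depends on formulas (1) and (2), which are not reproduced here, so the sketch stops at detection and classification.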
Further, step (13) specifically comprises:
(131) For first-class abnormal values: if the actual taxi start time t_i-at of the i-th flight is missing or abnormal, it is recalculated as follows:
[Formula (1) not reproduced in this text]
where t_i-at(cal) is the recalculated actual taxi start time of the i-th flight, p-name is the number of flights in the data set, t_i-ad is the actual take-off time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively;
(132) For second-class abnormal values: if the actual door closing time t_i-cd of the i-th flight is missing or abnormal, it is recalculated as follows:
[Formula (2) not reproduced in this text]
where t_i-cd(cal) is the recalculated actual door closing time of the i-th flight, p-name is the number of flights in the data set, t_i-boff is the actual chocks-off time of the i-th flight, t_i-od is the actual door opening time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively.
Further, step (2) specifically comprises:
(21) Compute the set D_diff of differences between the actual take-off time and the planned departure time of each flight in the original data set T_p:
$$d_i = t_{i\text{-}ad} - t_{i\text{-}ed}, \qquad D_{\mathrm{diff}} = \{d_1, d_2, \ldots, d_p\} \qquad (3)$$
where d_i is the time difference of the i-th flight, t_i-ad is the actual take-off time of the i-th flight, and t_i-ed is the planned departure time of the i-th flight;
(22) Within the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, divide the flights into the delayed flight time set T_delay and the non-delayed flight time set T_non-delay:
[Formula (4) not reproduced in this text]
where m is the number of flow nodes in one flight, p-name is the number of flights in the data set, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively;
(23) Within T_p-over and T_p-nonover, compute for T_delay and T_non-delay respectively the time difference sets Diff_delay and Diff_non-delay on each successor-predecessor flow node pair. The time difference on a successor-predecessor pair is the successor time minus the predecessor time on two adjacent flow nodes, namely: actual landing − planned arrival; actual chocks-on − actual landing; cabin door open − actual chocks-on; boarding gate close − cabin door open; actual chocks-off − cabin door close; actual chocks-off − boarding gate close; actual taxi start − actual chocks-off; actual take-off − actual taxi start; and actual take-off − planned departure;
(24) For the non-delayed flight time set T_non-delay within T_p-over and T_p-nonover respectively, compute the mean, upper quartile, lower quartile, maximum and minimum of the time difference set Diff_non-delay on each successor-predecessor flow node pair and plot them as box plots; take the segment between the lower quartile QL and the upper quartile QU of each time difference as the standard time period set D_std.
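Steps (21) to (24) reduce to a few lines of Python. The sketch below is illustrative only: times are assumed to be minutes on a common clock, the node names and the subset of successor-predecessor pairs are hypothetical, t_r = 15 minutes is an assumed value, and flights departing early (d_i < 0) are assumed to count as non-delayed. In the method it would be run separately on T_p-over and T_p-nonover.

```python
from statistics import quantiles

T_R = 15.0  # assumption: maximum normal departure delay t_r, in minutes

# Hypothetical (successor, predecessor) node pairs, a subset of step (23).
PAIRS = [("chocks_on", "landing"), ("door_open", "chocks_on"),
         ("chocks_off", "door_close"), ("taxi_start", "chocks_off"),
         ("takeoff", "taxi_start")]


def departure_delay(flight):
    """Formula (3): d_i = actual take-off time - planned departure time."""
    return flight["takeoff"] - flight["planned_departure"]


def split_by_delay(flights, t_r=T_R):
    """Step (22): delayed if d_i > t_r, otherwise treated as non-delayed."""
    delayed = [f for f in flights if departure_delay(f) > t_r]
    non_delayed = [f for f in flights if departure_delay(f) <= t_r]
    return delayed, non_delayed


def pair_differences(flights):
    """Step (23): successor time minus predecessor time for each node pair."""
    return {pair: [f[pair[0]] - f[pair[1]] for f in flights] for pair in PAIRS}


def standard_periods(non_delayed):
    """Step (24): (QL, QU) of each pair's differences over non-delayed flights."""
    periods = {}
    for pair, values in pair_differences(non_delayed).items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        periods[pair] = (q1, q3)
    return periods
```

`statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, which is one common quartile convention; the text does not say which convention its box plots use.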
Further, the specific process of step (3) is as follows:
(31) Create the graph network structure G(V, E) from the flight flow node data, where G is the created graph network structure; V = {v_1, v_2, v_3, ..., v_n} is the non-empty set of a finite number of flow nodes, with v_a ∈ V and v_1 the first flow node; E = {e_1, e_2, e_3, ..., e_(p-delay·(m-1))} is the finite set of edges, with (v_a, v_b) ∈ E and e_1 the first edge; and p-delay is the number of delayed flights;
(32) For the graph network structure G, construct the adjacency matrix and the degree matrix of the graph:
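$$A_{ab} = \begin{cases} 1, & (v_a, v_b) \in E \\ 0, & \text{otherwise} \end{cases}, \qquad A \in \mathbb{R}^{N \times N} \qquad (5)$$
$$D_{ab} = \begin{cases} \sum_{c=1}^{N} A_{ac}, & a = b \\ 0, & a \neq b \end{cases} \qquad (6)$$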
where A_ab is the adjacency matrix of the constructed graph network structure, a and b are flow node numbers, v_a and v_b are the a-th and b-th flow nodes, R is the field of real numbers, N = p-delay · m is the total number of flow nodes of all delayed flights, p-delay is the number of delayed flights, m is the number of flow nodes in one flight, and D_ab is the degree matrix of the graph network structure.
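As an illustration of formulas (5) and (6), the sketch below builds the block-diagonal adjacency matrix and the degree matrix for p-delay flights that each contribute m flow nodes. It assumes that every flight shares the same within-flight edge list and that edges are undirected; neither point is stated in the text, and `build_adjacency` is a hypothetical helper name.

```python
import numpy as np


def build_adjacency(edges_per_flight, m, n_flights):
    """Adjacency (formula (5)) and degree matrix (formula (6)).

    edges_per_flight: (a, b) local node indices within one flight, reused
    for every flight (assumption). Nodes of flight f occupy rows
    f*m .. f*m + m - 1, so flights never connect to each other.
    """
    n = n_flights * m                       # N = p-delay * m
    A = np.zeros((n, n))
    for f in range(n_flights):
        offset = f * m
        for a, b in edges_per_flight:
            A[offset + a, offset + b] = 1.0
            A[offset + b, offset + a] = 1.0  # assumption: undirected edges
    D = np.diag(A.sum(axis=1))              # D_aa = sum over row a of A
    return A, D
```

Dense matrices are used only for readability; with many flights a sparse or edge-list representation would be the practical choice.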
Further, the specific process of step (4) is as follows:
(41) Take the edge sets E of all delayed flight samples, convert them using the flow node numbers, arrange them in order of edge sequence and flow node number, and store the arranged edge set E' as data set 1;
(42) Index the flow nodes belonging to different graphs in sequence, with all flow nodes of the same graph sharing the same index value, to obtain the index set I, and store I as data set 2:
[Formula (7) not reproduced in this text]
where m is the number of flow nodes in one flight and p-delay is the number of delayed flights;
(43) Using formula (8), compare the standard time period set D_std of non-delayed flights obtained in step (24) with the time differences Diff_delay on the successor-predecessor flow nodes of the delayed flight time set T_delay, separately for the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, to obtain the set of distances T_std between them. Take T_std as the feature input set X of the flow nodes, arrange it as node attributes in flight order, and store the result in two-dimensional format as data set 3:
[Formula (8) not reproduced in this text]
where T_std is the set of distances between the successor-predecessor time differences Diff_delay of the delayed flight time set T_delay and the standard time periods, T_is is the distance of the s-th time difference of the i-th flight from its standard time period, QL_s and QU_s are the lower and upper quartiles of the standard time period of the s-th time difference, Diff_delay-is is the value of the s-th time difference of the i-th delayed flight, and p-delay is the number of delayed flights;
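Formula (8) is not reproduced in this text. From the definitions of T_is, QL_s, QU_s and Diff_delay-is, one plausible reading is the distance from each time difference to its standard interval [QL_s, QU_s], which is zero when the value lies inside it. The sketch below implements that assumed, unsigned form; the original formula may differ, for example by keeping a sign.

```python
def distance_to_period(value, ql, qu):
    """Assumed form of formula (8): 0 inside [QL, QU], else gap to nearest bound."""
    if value < ql:
        return ql - value
    if value > qu:
        return value - qu
    return 0.0


def feature_matrix(delayed_diffs, periods):
    """T_std: one row per delayed flight i, one column per time difference s.

    delayed_diffs[i][s] plays the role of Diff_delay-is and periods[s] is
    the (QL_s, QU_s) pair from step (24).
    """
    return [[distance_to_period(v, *periods[s]) for s, v in enumerate(row)]
            for row in delayed_diffs]
```

For example, with a standard period of (6.0, 8.0) a difference of 20.0 yields a feature of 12.0, while any value between 6.0 and 8.0 yields 0.0.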
(44) Take the differences D_diff-delay between the actual take-off and planned departure times of all delayed flights in the sample as graph attributes, arrange them in flight order in two-dimensional format, and store them as labels in data set 4;
(45) Package data sets 1, 2, 3 and 4 together into a graph data set, so that each graph network structure drawn from it is a subgraph of the original graph network structure.
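The four data sets of step (4) map naturally onto the array layout used by common graph libraries. The hedged sketch below assembles them with NumPy, storing the edge set E' as a 2 × |E'| index array (the layout PyTorch Geometric uses for `edge_index`) and the index set I as a per-node graph id. The function name `package_graphs` and the assumption that all flights share one local edge list are illustrative.

```python
import numpy as np


def package_graphs(local_edges, m, node_features, labels):
    """Assemble data sets 1-4 of step (4) as NumPy arrays.

    local_edges: (a, b) pairs within one flight's process graph (assumed shared).
    node_features: shape (p_delay * m, F), data set 3 (X, i.e. T_std).
    labels: shape (p_delay,), data set 4 (D_diff-delay).
    """
    p_delay = len(labels)
    x = np.asarray(node_features, dtype=float)
    if x.shape[0] != p_delay * m:
        raise ValueError("expected one feature row per flow node")
    # Data set 1: global edge list E', shifted per flight and sorted.
    edges = sorted((f * m + a, f * m + b)
                   for f in range(p_delay) for a, b in local_edges)
    edge_index = np.array(edges, dtype=np.int64).reshape(-1, 2).T
    # Data set 2: index set I, the graph (flight) id of every flow node.
    batch = np.repeat(np.arange(p_delay), m)
    y = np.asarray(labels, dtype=float)
    return {"edge_index": edge_index, "batch": batch, "x": x, "y": y}
```

Keeping the per-node `batch` vector is what lets a graph-level model pool node outputs back into one prediction per flight.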
Further, step (5) specifically comprises:
(51) For the created graph network structure G, propagate the features on the graph to the next layer using the propagation rule f:
$$H^{(l+1)} = f\left(H^{(l)}, A\right), \qquad H^{(0)} = X^{(0)}, \qquad H^{(L)} = Z \qquad (9)$$
where H^(l) denotes the features of layer l, A describes the graph structure of the flow nodes and is given by the adjacency matrix A_ab, Z denotes the output of the last layer L, and X^(0) denotes the feature input set fed to the model;
(52) Mapping the layer-wise propagation onto the concrete data gives:
$$X^{(l+1)} = f\left(X^{(l)}, A\right) \qquad (10)$$
where f is the propagation rule and X^(l) is the feature input set of the flow nodes at layer l;
(53) In the constructed GCN layer, the propagation rule f takes the form:
$$X^{(l+1)} = \sigma\left(D^{-\frac{1}{2}}\left(A + I_n\right)D^{-\frac{1}{2}}\,X^{(l)}\,W^{(l)}\right) \qquad (11)$$
where X^(l+1) is the feature output set of the flow nodes at layer l, D is the degree matrix of the self-looped graph A + I_n input to layer l, I_n adds a self-loop to every flow node of the graph network structure, W^(l) is the convolution kernel of layer l, i.e. a learnable weight, and σ is a nonlinear transformation;
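A dense NumPy version of formula (11) makes the symmetric normalization explicit. Using ReLU as σ is an assumption, and a real model would use a library layer with a trainable W^(l); this sketch only evaluates one forward pass.

```python
import numpy as np


def gcn_layer(A, X, W, activation=lambda z: np.maximum(z, 0.0)):
    """One GCN layer per formula (11): sigma(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops (I_n)
    deg = A_hat.sum(axis=1)                 # degree of each node in A + I_n
    d_inv_sqrt = 1.0 / np.sqrt(deg)         # safe: every degree is >= 1
    # Elementwise scaling equals D^-1/2 @ A_hat @ D^-1/2 for diagonal D.
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_norm @ X @ W)
```

Two calls in sequence, `gcn_layer(A, gcn_layer(A, X, W0), W1)`, give the two-layer GCN of step (5).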
(54) Construct the GAT layer, which assigns different weights to different edges through attention coefficients;
(55) Construct the GraphSAGE layer, which uses neighbor sampling to turn full-graph training on the flow node features into mini-batch training centered on individual flow nodes, and aggregates the information of neighboring flow nodes with an aggregation function;
(56) For these graph convolution layers, taking the times of the transit flight process as the feature input and the prediction target as a graph-level task, construct two-layer graph convolutional neural networks: two GCN layers, two GAT layers, two GraphSAGE layers, and a single GCN layer connected to a single GraphSAGE layer. Each two-layer graph neural network is connected to a two-layer fully connected neural network and a final pooling layer, which converts the node-level and edge-level tasks into a global graph-level task, yielding four different graph convolutional neural network models.
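Step (56) turns node-level outputs into one prediction per flight. The sketch below assumes mean pooling for the unspecified "final pooling layer" and ReLU between the two fully connected layers, and applies them in the order the text lists (fully connected layers, then pooling); `batch` is the index set I from step (42), and all function names are hypothetical.

```python
import numpy as np


def mlp_head(h, W1, b1, W2, b2):
    """Two fully connected layers (ReLU in between) applied to every node."""
    return np.maximum(h @ W1 + b1, 0.0) @ W2 + b2


def global_mean_pool(node_out, batch, num_graphs):
    """Average node outputs per graph id, giving one row per flight."""
    sums = np.zeros((num_graphs, node_out.shape[1]))
    np.add.at(sums, batch, node_out)        # unbuffered scatter-add by graph id
    counts = np.bincount(batch, minlength=num_graphs)[:, None]
    return sums / counts


def predict_graphs(node_h, batch, num_graphs, W1, b1, W2, b2):
    """Graph-level readout: FC layers on nodes, then pooling per flight."""
    return global_mean_pool(mlp_head(node_h, W1, b1, W2, b2), batch, num_graphs)
```

Here `node_h` would be the output of one of the two-layer graph networks, for example two stacked GCN layers.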
Further, step (54) specifically comprises:
(541) During GAT propagation, because each flow node is connected to different neighbor flow nodes, compute the correlation coefficient e_ab from flow node v_a to flow node v_b, and from it the attention coefficient of each edge:
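$$\alpha_{ab} = \frac{\exp\left(\mathrm{LeakyReLU}\left(e_{ab}\right)\right)}{\sum_{k \in \mathcal{N}_a} \exp\left(\mathrm{LeakyReLU}\left(e_{ak}\right)\right)} \qquad (12)$$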
where LeakyReLU is the activation function, α_ab is the attention coefficient from flow node v_a to flow node v_b, k is a neighbor flow node of flow node a, and N_a is the set of neighbor flow nodes of flow node a;
(542) Weight and sum the features using the computed attention coefficients:
$$x'_a = \sum_{b \in \mathcal{N}_a} \alpha_{ab}\, W x_b \qquad (13)$$
where x'_a is the new feature of each flow node, fusing its neighborhood information, W is a learnable weight, and x_b is the feature of flow node v_b;
(543) The new features x'_a of all flow nodes form the new flow node feature set X', and the GAT layer propagates features in the same message-passing manner as the GCN layer.
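A single-head GAT layer following formulas (12) and (13) can be sketched densely in NumPy. The text does not say how e_ab is computed; this sketch assumes the scoring of the original GAT formulation, e_ab = q · [W x_a ‖ W x_b], where q is a hypothetical learnable attention vector, and additionally assumes a LeakyReLU slope of 0.2 and that each node's self-loop belongs to N_a.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def gat_layer(A, X, W, q):
    """Single-head GAT layer per formulas (12) and (13).

    q: hypothetical attention vector of length 2 * F_out, so that
    e_ab = q[:F_out] . (W x_a) + q[F_out:] . (W x_b).
    Returns (new features X', attention matrix alpha).
    """
    H = X @ W                                   # W x for every node
    f_out = H.shape[1]
    e = (H @ q[:f_out])[:, None] + (H @ q[f_out:])[None, :]
    scores = leaky_relu(e)
    mask = (A + np.eye(A.shape[0])) > 0         # neighbours plus self-loop
    scores = np.where(mask, scores, -np.inf)    # non-neighbours get weight 0
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over N_a
    return alpha @ H, alpha                     # x'_a = sum_b alpha_ab W x_b
```

Masking with `-inf` before the softmax is what restricts each node's attention to its own neighborhood, so the per-flight graphs stay independent.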
Further, step (55) specifically comprises:
(551) The GraphSAGE layer first aggregates the features of the neighboring flow nodes and then combines the result with the flow node's own features:
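$$h_{\gamma(v_a)}^{k} = \mathrm{AGGREGATE}_k\left(\left\{h_b^{k-1} : v_b \in \gamma(v_a)\right\}\right), \qquad h_a^{k} = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h_a^{k-1}, h_{\gamma(v_a)}^{k}\right)\right) \qquad (14)$$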
where k is the aggregation iteration, W^(k) is the weight to be learned at the k-th aggregation, σ is a nonlinear transformation, h_a^(k-1) is the feature of flow node a after the (k-1)-th aggregation, AGGREGATE_k is the k-th aggregation function, γ(v_a) is the set of neighbor flow nodes of the a-th flow node, h_γ(v_a)^k is the aggregated feature of those neighbors, and CONCAT concatenates the features of flow node a with those of its neighbors;
(552) Apply L2 normalization to the aggregated features of each flow node:
$$h_a^{k} \leftarrow \frac{h_a^{k}}{\left\lVert h_a^{k} \right\rVert_2}, \qquad \forall\, v_a \in V \qquad (15)$$
where V is the set of flow nodes in the graph network structure and h_a^k is the feature vector of the a-th flow node after k aggregations.
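One GraphSAGE iteration, formulas (14) and (15), is sketched below with a mean aggregator, one of the aggregators proposed for GraphSAGE. The choice of aggregator, ReLU as σ, and the omission of neighbor sampling (every neighbor is used) are assumptions made for brevity.

```python
import numpy as np


def sage_layer(A, H, W, activation=lambda z: np.maximum(z, 0.0)):
    """One GraphSAGE iteration per formulas (14)-(15), mean aggregator assumed.

    A: adjacency (N x N); H: features h^(k-1) (N x F); W: weight (2F x F_out).
    """
    deg = A.sum(axis=1, keepdims=True)
    # AGGREGATE_k: mean of neighbour features (zero vector if no neighbours).
    h_neigh = np.divide(A @ H, deg, out=np.zeros_like(H), where=deg > 0)
    # CONCAT(h_a^(k-1), h_neigh), then W^(k) and sigma, as in formula (14).
    h = activation(np.concatenate([H, h_neigh], axis=1) @ W)
    # Formula (15): L2-normalise each node's vector (zero rows left as zero).
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return np.divide(h, norms, out=np.zeros_like(h), where=norms > 0)
```

Because of the final normalization, every non-zero output row has unit length regardless of the node's degree.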
Further, step (6) specifically comprises:
(61) The three selected machine learning models are a decision tree, a random forest and XGBoost;
(62) Divide the data into a training set X_train, a validation set X_val and a test set X_test;
(63) Normalize the data and feed it into each model; use K-fold cross-validation, averaging the results over the folds, and train and tune the parameters to obtain the optimal versions of the three machine learning models;
(64) Feed the training data into each of the four graph convolutional neural network models (two-layer GCN, two-layer GAT, two-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE), use the mean absolute error (MAE) as the back-propagated loss to update the weights, and repeat training and parameter tuning to obtain the optimal versions of the four models.
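The K-fold cross-validation of step (63) can be sketched without any library. Folds here are contiguous and unshuffled, and `fit_and_score` is a hypothetical callback that trains a model on the training indices and returns its validation error; both are illustrative choices rather than details from the text.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for K-fold cross-validation, unshuffled."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_validate(fit_and_score, n_samples, k=5):
    """Average fit_and_score(train_idx, val_idx) over the K folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Every sample appears in exactly one validation fold, so the averaged score uses all the data once for validation.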
Further, the specific process of step (7) is as follows:
(71) Use three metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), to measure the gap between each model's predicted and true values;
(72) Feed the test set into the optimal versions of the three machine learning models and the four graph convolutional neural network models, and compare their performance using the metrics of step (71).
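The three metrics of step (71) are short enough to write out directly. MAPE divides by the true value, so it is undefined for a zero delay; since every delayed flight has d_i > t_r ≥ 0, the labels D_diff-delay are strictly positive and MAPE is well defined for them.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero true values")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For true delays [10, 20, 40] and predictions [12, 18, 40], these give an MAE of 4/3, an RMSE of √(8/3) and a MAPE of 10 %.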
The beneficial effects of the invention are as follows:
The method focuses on the multi-step ground-support process of transit flights. Starting from the correlations between processes and their differing operation times, it builds a graph network structure with logical relationships, extracts the standard operation times of non-delayed flights from the time sets of all process nodes, and processes the operation times of delayed flights into node features on the graph network. This links the flight operation process with flight delay prediction, enriches the data set available for delay prediction, and improves the accuracy of delay time prediction.
The method takes into account the operational process data during ground support and the logical relationships among them, and has practical value for understanding how ground departure delays arise and for predicting their duration.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2a is a box plot of the time differences of non-overnight flights among the non-delayed flights, according to an embodiment of the invention.
FIG. 2b is a box plot of the time differences of overnight flights among the non-delayed flights, according to an embodiment of the invention.
FIG. 3 is a passenger-related process flow chart according to an embodiment of the invention.
FIG. 4 is a schematic diagram of the neural network according to an embodiment of the invention.
FIG. 5 is a schematic diagram of K-fold cross-validation according to an embodiment of the invention.
Detailed Description
To help those skilled in the art understand the invention, it is further described below with reference to examples and drawings, which are not intended to limit the invention.
Referring to FIG. 1, the flight delay prediction method based on the station-passing flight guarantee process of the invention comprises the following steps:
(1) Collect the time data generated at each process node as flights transit the airport during a specific period, clean the raw time data of each flight's processes according to the logical relationships of the flight operation process, and take the result as the original data set T_p;
Wherein, the flow node of the stop-passing flight in the step (1) comprises: the system comprises a planned entry node, an actual landing node, an actual gear shift node, an open cabin door node, a close cabin door node, a boarding gate open node, a boarding gate close node, an actual gear shift removal node, an actual sliding start node, an actual take-off node and a planned departure node.
The flight time data in the step (1) comprises: planned arrival time, actual landing time, actual gear shift time, cabin door opening time, cabin door closing time, boarding gate opening time, boarding gate closing time, actual gear shift removing time, actual sliding starting time, actual take-off time and planned departure time.
In an example, time data of all the over-station flight guarantee process nodes of the Pudong international airport 2019 from 6/1/12/31 are collected, and the collected data comprises a time set including 11 process nodes such as planned take-off time, planned arrival time, actual take-off time, actual landing time and actual gear shift time of a front station;
due to the fact that the acquired time set has a front-back logic relationship, data needs to be analyzed and preprocessed reasonably; the principle of treatment is as follows:
(11) checking the time context correlation of each collected flight time set, and eliminating abnormal values recorded in the time data (for example, a certain time point of the flight is not on the same day as the rest time points, the next time point of the flight is earlier than the previous time point, the previous time point of the flight is later than the next time point, and the like);
(12) because the example data has the condition of overnight takeoff, according to the difference length of the opening time and the closing time recorded in each flight time set, the data of which the difference length exceeds 400, the opening time is in the evening, and the closing time is in the next morning is classified as the overnight flight set Tp-overClassifying the data of which the difference length does not exceed 400 or does not satisfy the door opening time in the evening and the door closing time in the next morning into a non-overnight flight set Tp-nonover
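As an illustration of the classification rule in step (12), the sketch below labels a single flight as overnight or not. It is a minimal sketch under stated assumptions: the source gives the threshold 400 without a unit (read here as minutes), and the hours used for "evening" (from 18:00) and "next morning" (before 12:00) are hypothetical choices, not values from the patent.

```python
from datetime import datetime

# Hypothetical thresholds: the source gives "400" without a unit (read here as
# minutes) and does not define "evening" or "next morning".
GAP_THRESHOLD_MIN = 400
EVENING_FROM_HOUR = 18   # door opened at or after 18:00 counts as "evening"
MORNING_UNTIL_HOUR = 12  # door closed before 12:00 the next day counts as "next morning"


def is_overnight(door_open: datetime, door_close: datetime) -> bool:
    """Return True if the flight belongs to T_p-over under the assumptions above."""
    gap_min = (door_close - door_open).total_seconds() / 60
    opened_in_evening = door_open.hour >= EVENING_FROM_HOUR
    closed_next_morning = (
        (door_close.date() - door_open.date()).days == 1
        and door_close.hour < MORNING_UNTIL_HOUR
    )
    return gap_min > GAP_THRESHOLD_MIN and opened_in_evening and closed_next_morning


# Opened 22:10, closed 07:30 the next day: 560 min, so overnight.
print(is_overnight(datetime(2019, 6, 1, 22, 10), datetime(2019, 6, 2, 7, 30)))  # True
print(is_overnight(datetime(2019, 6, 1, 10, 0), datetime(2019, 6, 1, 10, 45)))  # False
```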
(13) To collect the actual landing time t to each flightalPlanned arrival time teaActual takeoff time tadAnd planned departure time tedFor reference of flightLine time, cleaning the flight time set data of each sample; the abnormal time and the missing time close to the actual landing time and the actual takeoff time are processed preferentially, the time data needing to be processed are divided into two types, one type of time is the actual landing time t of the flightalOr actual takeoff time tadAssociating another type of time with two or more times before and after the flight; aiming at two types of abnormal values or missing values, respectively collecting T in the overnight flight setp-overAnd non-overnight flight set Tp-nonoverCleaning by using a logical relation with complete time data on different process nodes; the following is described with two types of problem node examples respectively;
(131) The first type is illustrated with the actual taxi start time. If the actual taxi start time t_i-at of the i-th flight is missing or abnormal, it is recalculated as follows:
[Formula (1): image not reproduced in the source text]
where t_i-at(cal) is the recalculated actual taxi start time of the i-th flight, p-name is the number of flights in the data set, t_i-ad is the actual takeoff time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively;
(132) The second type is illustrated with the actual cabin door closing time. If the actual door closing time t_i-cd of the i-th flight is missing or abnormal, it is recalculated as follows:
[Formula (2): image not reproduced in the source text]
where t_i-cd(cal) is the recalculated actual door closing time of the i-th flight, p-name is the number of flights in the data set, t_i-boff is the actual chocks-off time of the i-th flight, t_i-od is the actual door opening time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively.
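Formulas (1) and (2) are not reproduced in the source text, so the sketch below only illustrates the general idea of the first type of repair: estimating a missing actual taxi start time from the same flight's actual takeoff time. The rule used here (subtract the mean takeoff-minus-taxi-start interval of the complete records in the same set) is an assumption, not the patent's formula; abnormal values are assumed to have been replaced by NaN beforehand.

```python
import numpy as np


def impute_from_takeoff(t_ad, t_at):
    """Fill missing (NaN) actual taxi start times t_at from actual takeoff times t_ad.

    ASSUMPTION: formula (1) is not available, so this uses the mean
    takeoff-minus-taxi-start interval of the complete records in the same set.
    Call it separately on T_p-over and on T_p-nonover. Times are minutes from a
    common reference (a hypothetical encoding).
    """
    t_ad = np.asarray(t_ad, dtype=float)
    t_at = np.asarray(t_at, dtype=float).copy()
    missing = np.isnan(t_at)
    mean_gap = np.mean(t_ad[~missing] - t_at[~missing])  # average taxi duration
    t_at[missing] = t_ad[missing] - mean_gap
    return t_at


print(impute_from_takeoff([100, 200, 300], [90, 186, np.nan]).tolist())  # [90.0, 186.0, 288.0]
```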
(14) Take all cleaned time data as the original data set T_p, where T_p comprises the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, and p is the number of flights.
(2) Compute the set D_diff of differences between the actual takeoff time and the planned departure time of all flights in the original data set T_p. Using the normal departure time range [0, t_r], where t_r is the maximum deviation for a departure still considered normal, divide the flights into the delayed flight time set T_delay and the non-delayed flight time set T_non-delay. Compute the sets of time differences on each subsequent-preceding process node pair for the flights in T_delay and T_non-delay, denoted Diff_delay and Diff_non-delay. Compute the mean, upper quartile, lower quartile, maximum and minimum of Diff_non-delay for each subsequent-preceding node pair, and select a data segment of each pair's time differences as its standard time period; these segments form the standard time period set D_std.
The specific process of the step (2) is as follows:
(21) Compute the set D_diff of differences between the actual takeoff time and the planned departure time of each flight in the original data set T_p:
$$ d_i = t_{i\text{-}ad} - t_{i\text{-}ed}, \qquad D_{diff} = \{ d_i \mid i = 1, \dots, p \} \quad (3) $$
where d_i is the computed difference for the i-th flight, t_i-ad is the actual departure time of the i-th flight, and t_i-ed is the planned departure time of the i-th flight;
(22) Within the overnight set T_p-over and the non-overnight set T_p-nonover, divide the flights into the delayed flight time set T_delay and the non-delayed flight time set T_non-delay as follows:
[Formula (4): image not reproduced in the source text]
where m is the number of process nodes of one flight, p-name is the number of flights in the data set, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively;
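Steps (21) and (22) can be sketched as follows: compute d_i with formula (3) and flag a flight as delayed when d_i exceeds t_r. Because formula (4) is not reproduced, the treatment of early departures (d_i < 0, counted here as non-delayed) and the example value t_r = 15 are assumptions.

```python
import numpy as np


def split_by_delay(t_ad, t_ed, t_r):
    """Compute d_i = t_i-ad - t_i-ed (formula (3)) and flag delayed flights.

    ASSUMPTION: a flight is delayed when d_i > t_r; every other flight (including
    early departures with d_i < 0) is treated as non-delayed.
    """
    d = np.asarray(t_ad, dtype=float) - np.asarray(t_ed, dtype=float)
    delayed = d > t_r
    return d, delayed


d, delayed = split_by_delay(t_ad=[610, 640, 595], t_ed=[600, 600, 600], t_r=15)
print(d.tolist(), delayed.tolist())  # [10.0, 40.0, -5.0] [False, True, False]
```

Applied separately to T_p-over and T_p-nonover, the boolean mask selects the rows that go into T_delay and T_non-delay.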
(23) Within the overnight set T_p-over and the non-overnight set T_p-nonover, compute for the flights in T_delay and T_non-delay the sets Diff_delay and Diff_non-delay of time differences on each subsequent-preceding process node pair. The time difference on a subsequent-preceding node pair is the subsequent time minus the preceding time on two adjacent process nodes; the pairs are: actual landing time - planned arrival time, actual chocks-on time - actual landing time, cabin door opening time - actual chocks-on time, boarding gate closing time - boarding gate opening time, cabin door closing time - cabin door opening time, actual chocks-off time - cabin door closing time, actual chocks-off time - boarding gate closing time, actual taxi start time - actual chocks-off time, actual takeoff time - actual taxi start time, and actual takeoff time - planned departure time;
(24) For the non-delayed flights T_non-delay in the overnight set T_p-over and in the non-overnight set T_p-nonover, compute the mean, upper quartile, lower quartile, maximum and minimum of the time differences Diff_non-delay on each subsequent-preceding node pair, and draw box plots of the distribution of each time difference, as shown in FIGs. 2a-2b. The segment between the lower quartile QL and the upper quartile QU of each time difference is selected as that pair's standard time period, and these segments form the standard time period set D_std. The selected node differences are shown in Table 1:
TABLE 1
[Table image not reproduced in the source text]
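A sketch of steps (23) and (24): one function computes the subsequent-minus-preceding differences for the ten node pairs listed in step (23), the other returns the lower and upper quartiles that bound each standard time period. The column order of the 11 nodes and the names in NODES are hypothetical; the quartiles use NumPy's default linear interpolation, which may differ from the tool used by the authors.

```python
import numpy as np

# Hypothetical column order of the 11 process nodes (one row per flight).
NODES = ["planned_arrival", "actual_landing", "chocks_on", "door_open",
         "door_close", "gate_open", "gate_close", "chocks_off",
         "taxi_start", "actual_takeoff", "planned_departure"]
# (subsequent, preceding) node pairs, in the order listed in step (23).
PAIRS = [(1, 0), (2, 1), (3, 2), (6, 5), (4, 3), (7, 4), (7, 6), (8, 7), (9, 8), (9, 10)]


def node_differences(times):
    """Diff per flight and pair: subsequent time minus preceding time."""
    times = np.asarray(times, dtype=float)
    return np.stack([times[:, s] - times[:, p] for s, p in PAIRS], axis=1)


def standard_periods(diff_non_delay):
    """Return (QL, QU) per pair; [QL_s, QU_s] is the standard period of pair s."""
    ql = np.percentile(diff_non_delay, 25, axis=0)
    qu = np.percentile(diff_non_delay, 75, axis=0)
    return ql, qu


print(node_differences([list(range(11))]).tolist())
# [[1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, -1.0]]
```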
(3) Based on the sequential dependencies between the process nodes of each flight, construct the process nodes of the flight into a non-Euclidean graph network structure G that encodes their logical relations, as shown in FIG. 3.
the specific process of the step (3) is as follows:
(31) Create a graph network structure G(V, E) from the flight process node data, where G is the created graph network structure; v_a ∈ V, with V = {v_1, v_2, v_3, ..., v_n} a non-empty set of finitely many process nodes and v_1 the 1st process node; (v_a, v_b) ∈ E, with E = {e_1, e_2, e_3, ..., e_{p-delay*(m-1)}} a finite set of edges and e_1 the 1st edge; and p-delay is the number of delayed flights;
(32) Construct the adjacency matrix and the degree matrix of the graph network structure G:
$$ A_{ab} = \begin{cases} 1, & (v_a, v_b) \in E \\ 0, & \text{otherwise} \end{cases}, \qquad A \in \mathbb{R}^{N \times N} \quad (5) $$
$$ D_{ab} = \begin{cases} \sum_{c=1}^{N} A_{ac}, & a = b \\ 0, & a \neq b \end{cases} \quad (6) $$
where A_ab is the adjacency matrix of the constructed graph network structure; a and b are process node numbers; v_a and v_b are process nodes a and b; R is the field of real numbers; N = p-delay × m is the number of process nodes of all delayed flights, with p-delay the number of delayed flights and m the number of process nodes of one flight; and D_ab is the degree matrix of the graph network structure.
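A sketch of formulas (5) and (6) for p-delay flights. It assumes that each flight's graph uses the ten node pairs of step (23) as undirected edges, which agrees with the p-delay × (m − 1) edge count in step (31) for m = 11. The node numbering within a flight (planned arrival = 0, actual landing = 1, ..., actual takeoff = 9, planned departure = 10) is hypothetical.

```python
import numpy as np

# (subsequent, preceding) pairs from step (23); hypothetical node numbering
# planned_arrival=0, actual_landing=1, chocks_on=2, door_open=3, door_close=4,
# gate_open=5, gate_close=6, chocks_off=7, taxi_start=8, actual_takeoff=9,
# planned_departure=10.
PAIRS = [(1, 0), (2, 1), (3, 2), (6, 5), (4, 3), (7, 4), (7, 6), (8, 7), (9, 8), (9, 10)]


def build_adjacency_and_degree(p_delay, m=11, pairs=PAIRS):
    """Block-diagonal A (N x N, N = p_delay * m) and its degree matrix D.

    ASSUMPTION: edges are undirected, so A is symmetric.
    """
    n = p_delay * m
    A = np.zeros((n, n))
    for i in range(p_delay):
        offset = i * m  # nodes of flight i occupy rows offset .. offset + m - 1
        for s, p in pairs:
            A[offset + s, offset + p] = A[offset + p, offset + s] = 1.0
    D = np.diag(A.sum(axis=1))  # D_aa = sum_c A_ac, zero off the diagonal
    return A, D


A, D = build_adjacency_and_degree(p_delay=2)
print(A.shape, int(A.sum()) // 2, D[7, 7])  # (22, 22) 20 3.0
```

The chocks-off node (7) has degree 3 because it closes both the door branch and the boarding-gate branch before taxiing.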
(4) Using the standard time period set D_std of each subsequent-preceding node pair, compute the set T_std of distances between D_std and the time differences Diff_delay of the delayed flights; take T_std as the process-node feature input set X and load it into the graph network structure G; then package the arranged edge set E', the index set I, the node feature set X, and the differences D_diff-delay between the actual takeoff and planned departure times of the delayed flights into a graph data set;
the specific process of the step (4) is as follows:
(41) Take the edge sets E of all delayed flight samples, convert them using the process node numbers, arrange them in order of edge sequence and process node number, and store the arranged edge set E' as data set 1;
(42) Arrange the process nodes belonging to different graphs in sequence by index, giving all process nodes of the same graph the same index value, to obtain the index set I, which is stored as data set 2:
$$ I = \big(\underbrace{1, \dots, 1}_{m}, \underbrace{2, \dots, 2}_{m}, \dots, \underbrace{p\text{-delay}, \dots, p\text{-delay}}_{m}\big) \quad (7) $$
where m is the number of process nodes of one flight and p-delay is the number of delayed flights;
(43) Compare the standard time period set D_std of the non-delayed flights obtained in step (24) with the time differences Diff_delay of the delayed flight sets T_delay in T_p-over and T_p-nonover using formula (8), to obtain the set T_std of distances between Diff_delay and the standard time periods. Take T_std as the node attributes, i.e., the process-node feature input set X, arrange it in flight order, and store it as data set 3 in a two-dimensional format;
[Formula (8): image not reproduced in the source text]
where T_std is the set of distances between the time differences Diff_delay of T_delay and the standard time periods; T_is is the distance of the s-th time difference of the i-th flight from its standard time period; QL_s and QU_s are the lower and upper quartiles that bound the standard time period of the s-th time difference; Diff_delay-is is the value of the s-th time difference of the i-th delayed flight; and p-delay is the number of delayed flights;
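Formula (8) is not reproduced in the source, so the following is only a plausible reading of step (43): the distance is 0 when a time difference lies inside [QL_s, QU_s] and otherwise measures how far it falls outside. The signed convention (negative below QL_s, positive above QU_s) is an assumption; an unsigned variant would take the absolute value.

```python
import numpy as np


def distance_to_standard(diff_delay, ql, qu):
    """T_is for every delayed flight i and time difference s.

    ASSUMPTION: signed distance, 0 inside [QL_s, QU_s], Diff - QL_s (negative)
    below it and Diff - QU_s (positive) above it.
    """
    diff = np.asarray(diff_delay, dtype=float)  # shape (p_delay, n_pairs)
    ql = np.asarray(ql, dtype=float)            # shape (n_pairs,)
    qu = np.asarray(qu, dtype=float)
    below = np.minimum(diff - ql, 0.0)          # non-zero only when Diff < QL_s
    above = np.maximum(diff - qu, 0.0)          # non-zero only when Diff > QU_s
    return below + above


print(distance_to_standard([[5, 20, 12]], ql=[10, 10, 10], qu=[15, 15, 15]).tolist())
# [[-5.0, 5.0, 0.0]]
```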
(44) Take the differences D_diff-delay between the actual takeoff and planned departure times of all delayed flights in the sample as graph attributes, arrange them in flight order in a two-dimensional format, and store D_diff-delay as labels in data set 4;
(45) and packaging the data set 1, the data set 2, the data set 3 and the data set 4 into a graph data set together, so that the graph network structure taken out each time is a subset of the original graph network structure.
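A sketch of how data sets 1-4 from steps (41)-(45) could be bundled. The layout (an edge index of shape 2 × number of edges plus a per-node graph index vector) resembles what PyTorch Geometric uses, but this is plain NumPy with 0-based indices, and the function name and argument names are hypothetical. How the T_std distances are assigned to individual nodes is not specified here, so the node feature matrix is taken as an input.

```python
import numpy as np

PAIRS = [(1, 0), (2, 1), (3, 2), (6, 5), (4, 3), (7, 4), (7, 6), (8, 7), (9, 8), (9, 10)]


def pack_graph_dataset(x, d_diff_delay, m=11, pairs=PAIRS):
    """Bundle data sets 1-4 for p_delay flight graphs of m nodes each.

    x: node feature matrix of shape (p_delay * m, F), rows in flight order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(d_diff_delay, dtype=float).reshape(-1, 1)   # data set 4: labels
    p_delay = y.shape[0]
    if x.shape[0] != p_delay * m:
        raise ValueError("x must have p_delay * m rows")
    local = np.array(sorted(pairs), dtype=np.int64).T           # (2, m - 1), sorted by node number
    edge_index = np.concatenate([local + i * m for i in range(p_delay)], axis=1)  # data set 1: E'
    batch = np.repeat(np.arange(p_delay), m)                    # data set 2: index set I (0-based)
    return {"edge_index": edge_index, "batch": batch, "x": x, "y": y}  # x is data set 3


ds = pack_graph_dataset(np.zeros((22, 1)), d_diff_delay=[30, 45])
print(ds["edge_index"].shape, ds["batch"][:3].tolist(), ds["batch"][-3:].tolist())
# (2, 20) [0, 0, 0] [1, 1, 1]
```

Offsetting each flight's local edges by i × m keeps the graphs disjoint, so any subset of flights taken from the package is itself a valid subgraph.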
(5) Using the created graph network structure G, construct four graph convolutional neural network models: double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE;
the specific process of the step (5) is as follows:
(51) For the created graph network structure G, features are propagated to the next layer by a propagation rule f:
$$ H^{(l+1)} = f\left(H^{(l)}, A\right), \qquad H^{(0)} = X^{(0)}, \qquad H^{(L)} = Z \quad (9) $$
where H^(l) is the feature of the l-th layer; A describes the graph network structure of the process nodes, here the adjacency matrix A_ab; L is the number of layers; Z is the output; and X^(0) is the feature input set fed to the model;
(52) Mapping the layer-to-layer propagation onto the concrete data gives:
$$ X^{(l+1)} = f\left(X^{(l)}, A\right) \quad (10) $$
where f is the propagation rule and X^(l) is the feature input set of the process nodes at layer l;
(53) In the constructed GCN layer, the propagation rule f takes the form:
$$ X^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{(l)} W^{(l)}\right), \qquad \tilde{A} = A + I_n, \qquad \tilde{D}_{aa} = \sum_{b} \tilde{A}_{ab} \quad (11) $$
where X^(l+1) is the feature output set of the process nodes at layer l; $\tilde{D}$ is the degree matrix of the input layer-l graph network structure with self-loops; I_n adds a self-loop to each node of the graph network structure; W^(l) is the convolution kernel of layer l, i.e., the learnable weight; and σ is a nonlinear transformation;
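Formula (11) can be sketched as one dense NumPy layer. The choice of ReLU for σ is an assumption (the source only says "nonlinear transformation"), and a practical implementation would use sparse matrices rather than dense N × N arrays.

```python
import numpy as np


def gcn_layer(A, X, W):
    """One GCN step, sigma(D~^-1/2 (A + I) D~^-1/2 X W), with ReLU as sigma (assumed)."""
    A_tilde = A + np.eye(A.shape[0])               # add self-loops (I_n)
    d = A_tilde.sum(axis=1)                        # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt      # symmetric normalization
    return np.maximum(A_hat @ X @ W, 0.0)          # ReLU


A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0], [3.0]])
W = np.array([[1.0]])
print(np.round(gcn_layer(A, X, W), 6).tolist())  # [[2.0], [2.0]]
```

With two connected nodes, each normalized weight is 1/2, so both outputs are the average (1 + 3) / 2 = 2.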
(54) Construct a GAT layer, which assigns different weights to different edges through attention coefficients;
(55) Construct a GraphSAGE layer, which uses neighbor sampling to convert full-graph training on the process-node features into mini-batch training centered on individual process nodes, and aggregates the information of neighboring process nodes with an aggregation function;
(56) For the different graph convolution layers, taking the station-passing flight process times as the feature input and treating the target as a graph-level prediction task, build two-layer graph neural networks: two GCN layers, two GAT layers, two GraphSAGE layers, and one GCN layer connected to one GraphSAGE layer. Each two-layer graph neural network is followed by a two-layer fully connected network and a final pooling layer, which convert the node-level and edge-level tasks into a global graph-level task, yielding four different graph convolutional neural network models, as shown in FIG. 4.
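The source does not say which pooling is used to turn node outputs into one value per flight graph. The sketch below assumes global mean pooling driven by the index set I of step (42) (0-based here), which is one common readout for graph-level regression.

```python
import numpy as np


def global_mean_pool(node_emb, batch, num_graphs):
    """Average node embeddings per graph; `batch` plays the role of the index set I."""
    node_emb = np.asarray(node_emb, dtype=float)
    sums = np.zeros((num_graphs, node_emb.shape[1]))
    np.add.at(sums, batch, node_emb)                  # unbuffered scatter-add by graph id
    counts = np.bincount(batch, minlength=num_graphs)
    return sums / counts[:, None]


print(global_mean_pool([[1.0], [3.0], [10.0], [20.0]], batch=[0, 0, 1, 1], num_graphs=2).tolist())
# [[2.0], [15.0]]
```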
Further, the step (54) specifically includes:
(541) During GAT propagation, since each process node is connected to different neighbor process nodes, compute the correlation coefficient e_ab from process node v_a to process node v_b, and from it the attention coefficient of each edge:
$$ e_{ab} = \mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^{\top}\left[W x_a \,\Vert\, W x_b\right]\right), \qquad \alpha_{ab} = \frac{\exp(e_{ab})}{\sum_{k \in \mathcal{N}_a} \exp(e_{ak})} \quad (12) $$
where LeakyReLU is the activation function; α_ab is the attention coefficient from process node v_a to process node v_b; $\vec{\mathbf{a}}$ is the learnable attention vector and ‖ denotes concatenation; W is the learnable weight; k is a neighbor process node of process node a; and $\mathcal{N}_a$ is the set of neighbor process nodes of process node a;
(542) Using the computed attention coefficients, take the weighted sum of the features:
$$ x'_a = \sum_{b \in \mathcal{N}_a} \alpha_{ab} W x_b \quad (13) $$
where x'_a is the new feature of each process node fused with neighborhood information, W is the learnable weight, and x_b is the feature of process node v_b;
(543) The new features x'_a of all process nodes form the new process-node feature set X', so the GAT layer propagates features in the same message-passing manner as the GCN layer.
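Formulas (12) and (13) can be sketched as a single-head attention layer in NumPy. The name `att` for the attention vector is hypothetical, and adding self-loops so each node also attends to itself follows the original GAT formulation; the source does not state whether it does this.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)


def gat_layer(A, X, W, att):
    """Single-head attention layer: returns x'_a for all nodes and the alpha matrix.

    att: learnable attention vector of length 2 * F' (hypothetical name).
    Self-loops are added so each node also attends to itself (assumption).
    """
    H = X @ W                                        # W x for every node, shape (N, F')
    f_out = H.shape[1]
    # e_ab = LeakyReLU(att^T [W x_a || W x_b]), split into source and target terms
    e = leaky_relu((H @ att[:f_out])[:, None] + (H @ att[f_out:])[None, :])
    mask = (A + np.eye(A.shape[0])) > 0
    e = np.where(mask, e, -np.inf)                   # non-neighbours get zero weight
    e = e - e.max(axis=1, keepdims=True)             # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ H, alpha                          # x'_a = sum_b alpha_ab W x_b


A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
X_new, alpha = gat_layer(A, np.eye(3), np.eye(3), att=np.ones(6))
print(np.round(alpha[0], 3).tolist())  # [0.5, 0.5, 0.0]
```

With a uniform attention vector every edge scores the same, so node 0 splits its attention evenly between itself and its single neighbor.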
Further, the step (55) specifically includes:
(551) After aggregating the features of the neighbor process nodes, the GraphSAGE layer combines them with the process node's own features as follows:
$$ h^{k}_{\gamma(v_a)} = \mathrm{AGGREGATE}_k\left(\{ h^{k-1}_b : v_b \in \gamma(v_a) \}\right), \qquad h^{k}_a = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h^{k-1}_a, h^{k}_{\gamma(v_a)}\right)\right) \quad (14) $$
where k is the aggregation iteration index; W^(k) is the weight to be learned at the k-th aggregation; σ is a nonlinear transformation; h^{k-1}_a is the feature of the a-th process node after k-1 aggregations; AGGREGATE_k is the aggregation function of the k-th iteration; γ(v_a) is the set of neighbor process nodes of the a-th process node; h^k_{γ(v_a)} is the aggregated feature of those neighbors; and CONCAT concatenates the feature of the a-th process node with the aggregated neighbor features;
(552) Apply L2 normalization to the aggregated feature of each process node:
$$ h^{k}_a \leftarrow \frac{h^{k}_a}{\lVert h^{k}_a \rVert_2}, \qquad \forall v_a \in V \quad (15) $$
where V is the set of process nodes in the graph network structure and h^k_a is the feature vector of the a-th process node after k aggregations.
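Formulas (14) and (15) can be sketched with a mean aggregator, one of the aggregators proposed for GraphSAGE; the source does not say which aggregator it uses, and ReLU for σ is likewise an assumption. For brevity the sketch uses the full neighborhood instead of neighbor sampling and a row-vector convention, so `CONCAT(...) @ W` stands for W^(k) · CONCAT(...).

```python
import numpy as np


def sage_layer(A, H, W):
    """One GraphSAGE step with a mean aggregator (assumed) followed by L2 normalization."""
    deg = A.sum(axis=1, keepdims=True)
    h_neigh = (A @ H) / np.maximum(deg, 1.0)                        # AGGREGATE_k: neighbour mean
    h = np.maximum(np.concatenate([H, h_neigh], axis=1) @ W, 0.0)   # sigma = ReLU (assumed)
    norm = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norm, 1e-12)                              # formula (15)


A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.eye(2)
W = np.vstack([np.eye(2), np.eye(2)])    # (2F, F') weight, here F = F' = 2
print(np.round(sage_layer(A, H, W), 4).tolist())  # [[0.7071, 0.7071], [0.7071, 0.7071]]
```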
(6) Select three machine learning models; divide the data into a training set, a validation set and a test set; train the three machine learning models and the four constructed graph convolutional neural network models (double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE); and tune their parameters to obtain the optimal version of each of the three machine learning models and the four graph convolutional neural network models;
the specific process of the step (6) is as follows:
(61) The three selected machine learning models are a decision tree, a random forest and XGBoost;
(62) Data cleaning leaves 18794 usable samples. For the comparison models, the first 16500 samples form the training sample set X_train and the validation sample set X_val, which are trained with K-fold cross-validation as shown in FIG. 5, and samples 16501 to 18794 form the test sample set X_test. For the four graph convolutional neural network models (double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE), the first 15000 samples form the training set, samples 15001 to 16500 the validation set, and samples 16501 to 18794 the test set;
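The sample ranges of step (62) can be sketched as index arrays. The 1-based sample numbers in the text become 0-based indices here, and the K-fold folds are taken as contiguous and unshuffled, which is an assumption since the source does not say whether samples were shuffled.

```python
import numpy as np

N_SAMPLES = 18794  # usable samples after cleaning, per step (62)


def comparison_model_split(n=N_SAMPLES):
    idx = np.arange(n)                        # 0-based: sample 1 -> index 0
    return idx[:16500], idx[16500:]           # train/validation pool, test (16501-18794)


def gnn_split(n=N_SAMPLES):
    idx = np.arange(n)
    return idx[:15000], idx[15000:16500], idx[16500:]   # train, validation, test


def kfold_indices(n, k=5):
    """Yield (train, val) index arrays for contiguous, unshuffled K-fold CV (assumption)."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


pool, test = comparison_model_split()
print(len(pool), len(test), [len(v) for _, v in kfold_indices(len(pool))])
# 16500 2294 [3300, 3300, 3300, 3300, 3300]
```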
(63) Normalize the training samples to speed up model training and improve prediction accuracy:
$$ x\_std_{ij} = \frac{x_{ij} - x_{min\text{-}j}}{x_{max\text{-}j} - x_{min\text{-}j}} \quad (16) $$
where x_std_ij is the normalized j-th feature of the i-th flight, x_ij is the j-th feature of the i-th flight before normalization, and x_min-j and x_max-j are the minimum and maximum values of the j-th feature;
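Formula (16) as a short sketch. Fitting the minimum and maximum on the training samples and reusing them for validation and test data is a common practice and an assumption here; the mapping of constant columns to 0 is an added guard against division by zero.

```python
import numpy as np


def fit_min_max(X_train):
    """x_min-j and x_max-j per feature column, taken from the training samples."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)


def apply_min_max(X, x_min, x_max):
    """Formula (16); constant columns are mapped to 0 to avoid dividing by zero."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (np.asarray(X, dtype=float) - x_min) / span


x_min, x_max = fit_min_max([[0, 10], [5, 20], [10, 30]])
print(apply_min_max([[0, 10], [5, 20], [10, 30]], x_min, x_max).tolist())
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```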
For the three machine learning models (decision tree, random forest and XGBoost), K = 5 is used in K-fold cross-validation; the results of the runs are averaged for parameter tuning, and the tuned optimal parameter combinations are given in Tables 2, 3 and 4;
TABLE 2
[Table image not reproduced in the source text]
TABLE 3
[Table image not reproduced in the source text]
TABLE 4
[Table image not reproduced in the source text]
(64) Feed the training data into each of the four graph convolutional neural network models (double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE), using the mean absolute error (MAE) as the back-propagated error for weight updates; after repeated training and parameter tuning, the optimal versions of the four models are obtained, as shown in Table 5;
TABLE 5
[Table image not reproduced in the source text]
(7) Compare and evaluate the results of the optimal versions of the three machine learning models and the four graph convolutional neural network models;
the specific process of the step (7) is as follows:
(71) Three indices, the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), are used to measure the distance between each model's predicted values and the true values:
$$ \mathrm{MAE} = \frac{1}{p\text{-delay}} \sum_{i=1}^{p\text{-delay}} \left| y'_i - y_i \right| \quad (17) $$
$$ \mathrm{RMSE} = \sqrt{\frac{1}{p\text{-delay}} \sum_{i=1}^{p\text{-delay}} \left( y'_i - y_i \right)^2} \quad (18) $$
$$ \mathrm{MAPE} = \frac{100\%}{p\text{-delay}} \sum_{i=1}^{p\text{-delay}} \left| \frac{y'_i - y_i}{y_i} \right| \quad (19) $$
where p-delay is the number of delayed flights, y'_i is the label value of the i-th flight predicted by each model, and y_i is the true label value of the i-th flight;
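Formulas (17)-(19) as a minimal NumPy sketch; MAPE is returned in percent and is undefined when any true label y_i is 0.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """MAE, RMSE and MAPE (in %) per formulas (17)-(19); MAPE needs every y_i != 0."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y_hat - y
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y))
    return mae, rmse, mape


mae, rmse, mape = evaluate(y_true=[10, 20, 40], y_pred=[12, 18, 40])
print(round(mae, 4), round(rmse, 4), round(mape, 4))  # 1.3333 1.633 10.0
```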
(72) Input the test set into the optimal versions of the three machine learning models and the four graph convolutional neural network models, and compare their performance using the indices of step (71). The predictions of the seven models are shown in Table 6:
TABLE 6
[Table image not reproduced in the source text]
As Table 6 shows, conventional methods such as random forest and decision tree adapt poorly to time-node data, and their predicted flight delay times deviate more from the true values on the test set. By contrast, the method of the invention accounts for the sequential logical relations among the operation processes of a station-passing flight, converts them into a graph structure loaded with node features, and predicts flight delay times more accurately than the conventional methods; it is therefore better suited to flight delay prediction.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A flight delay prediction method based on a station-passing flight guarantee process is characterized by comprising the following steps:
(1) collecting the time data recorded by the airport over a given period at each process node of station-passing flights, cleaning the raw time data of each flight process according to the logical relations of the flight operation process, and taking the result as the original data set T_p;
(2) computing the set D_diff of differences between the actual takeoff time and the planned departure time of all flights in the original data set T_p; dividing the flights, according to the normal departure time range [0, t_r], where t_r is the maximum deviation for a departure still considered normal, into the delayed flight time set T_delay and the non-delayed flight time set T_non-delay; computing the sets of time differences on each subsequent-preceding process node pair for the flights in T_delay and T_non-delay, denoted Diff_delay and Diff_non-delay; computing the mean, upper quartile, lower quartile, maximum and minimum of Diff_non-delay for each subsequent-preceding node pair; and selecting a data segment of each pair's time differences as its standard time period, the segments forming the standard time period set D_std;
(3) constructing, based on the sequential dependencies between the process nodes of each flight, the process nodes of the flight into a non-Euclidean graph network structure G that encodes their logical relations;
(4) computing, using the standard time period set D_std of each subsequent-preceding node pair, the set T_std of distances between D_std and the time differences Diff_delay of the delayed flights; taking T_std as the process-node feature input set X and loading it into the graph network structure G; and packaging the arranged edge set E', the index set I, the node feature set X, and the differences D_diff-delay between the actual takeoff and planned departure times of the delayed flights into a graph data set;
(5) constructing, using the created graph network structure G, four graph convolutional neural network models: double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE;
(6) selecting three machine learning models; dividing a training set, a validation set and a test set; training the three machine learning models and the four constructed graph convolutional neural network models (double-layer GCN, double-layer GAT, double-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE); and obtaining the optimal version of each of the three machine learning models and the four graph convolutional neural network models by parameter tuning;
(7) comparing and evaluating the results of the optimal versions of the three machine learning models and the four graph convolutional neural network models.
2. The flight delay prediction method based on the stop-passing flight support process according to claim 1, wherein the specific process of the step (1) is as follows:
(11) checking the time context correlation of each collected flight time set, and eliminating abnormal values recorded in the time data;
(12) according to the difference between the cabin door opening time and the cabin door closing time recorded in each flight time set, classifying the flights whose difference exceeds 400, with the door opened in the evening and closed the next morning, into the overnight flight set T_p-over, and classifying the flights whose difference does not exceed 400, or that do not satisfy the evening-open/next-morning-close condition, into the non-overnight flight set T_p-nonover;
(13) taking the actual landing time t_al, planned arrival time t_ea, actual takeoff time t_ad and planned departure time t_ed collected for each flight as the reference times of the flight, cleaning the time-set data of each sample flight; processing first the abnormal and missing times close to the actual landing time and actual takeoff time, and dividing the time data to be processed into two types, the first type being associated with the flight's actual landing time t_al or actual takeoff time t_ad, and the second type being associated with two or more times before and after it in the flight; and, for both types of abnormal or missing values, cleaning separately within the overnight set T_p-over and the non-overnight set T_p-nonover using the logical relations with the complete time data on the other process nodes;
(14) taking all cleaned time data as the original data set T_p, where T_p comprises the overnight flight set T_p-over and the non-overnight flight set T_p-nonover, and p is the number of flights.
3. The flight delay prediction method based on the stop-passing flight support process according to claim 2, wherein the step (13) specifically comprises:
(131) for the first type of abnormal values, if the actual taxi start time t_i-at of the i-th flight is missing or abnormal, recalculating the actual taxi start time of the i-th flight as follows:
[Formula image not reproduced in the source text]
where t_i-at(cal) is the recalculated actual taxi start time of the i-th flight, p-name is the number of flights in the data set, t_i-ad is the actual takeoff time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively;
(132) for the second type of abnormal values, if the actual door closing time t_i-cd of the i-th flight is missing or abnormal, recalculating the actual door closing time of the i-th flight as follows:
[Formula image not reproduced in the source text]
where t_i-cd(cal) is the recalculated actual door closing time of the i-th flight, p-name is the number of flights in the data set, t_i-boff is the actual chocks-off time of the i-th flight, t_i-od is the actual door opening time of the i-th flight, and p-nonover and p-over are the numbers of non-overnight and overnight flights, respectively.
4. The flight delay prediction method based on the outbound flight support process according to claim 1, wherein the specific process of the step (2) is as follows:
(21) computing a raw data set TpTime difference value set D between actual take-off time and planned departure time of each flightdiffThe formula is as follows:
Figure FDA0003586847560000032
in the formula (d)iRepresenting the differential time, t, calculated for the ith flighti-adRepresenting the actual departure time, t, for the ith flighti-edRepresenting the scheduled departure time of the ith flight;
(22) set of flights T at nightp-overAnd non-overnight flight set Tp-nonoverIn the middle, the delayed flight time sets T are divided respectivelydelayAnd non-delayed flight time set Tnon-delayTwo types, the formula is as follows;
Figure FDA0003586847560000033
in the formula, m represents the number of flow nodes contained in one flight, p-name represents the number of flights in the data set, and p-notover and p-over represent the number of flights of non-overnight flights and overnight flights respectively;
(23) set of flights T at nightp-overAnd non-overnight flight set Tp-nonoverRespectively calculating delayed flight time sets TdelayAnd non-delayed flight time set Tnon-delayTime difference value set on each subsequent-preorder flow node of medium flightDiffdelayAnd Diffnon-delayWherein the time difference value on the subsequent-preorder process nodes represents the difference value between the subsequent time and the preorder time on two adjacent process nodes, and comprises actual land falling time-planned arrival time, actual gear shifting time-actual landing time, door opening time-gear shifting time, boarding gate closing time-boarding gate opening time, door closing time-cabin door opening time, actual gear withdrawing time-door closing time, actual gear withdrawing time-boarding gate closing time, actual starting sliding time-actual gear withdrawing time, actual take-off time-actual starting sliding time and actual take-off time-planned departure time;
(24) For the overnight flight set $T_{p\text{-}over}$ and the non-overnight flight set $T_{p\text{-}nonover}$, respectively, compute the mean, upper quartile, lower quartile, maximum and minimum of the time difference set $Diff_{non\text{-}delay}$ on each successor-predecessor node pair of the non-delayed flight time set $T_{non\text{-}delay}$, and plot $Diff_{non\text{-}delay}$ as box plots. For each time difference on each successor-predecessor node pair, take the interval between the lower quartile QL and the upper quartile QU as its standard time period; together these periods form the standard time period set $D_{std}$.
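Steps (21) to (24) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The timestamp field names, the choice of minutes as the unit, and the sample times are all hypothetical. The quartiles use NumPy's default linear interpolation. The delay threshold that separates $T_{delay}$ from $T_{non\text{-}delay}$ in step (22) is not given in the text, so the split itself is left out.

```python
from datetime import datetime, timedelta

import numpy as np

# Hypothetical field names for the process-node timestamps of one flight.
# Each pair is (successor time, predecessor time), in the order listed in step (23);
# the last pair is the departure difference d_i of step (21).
NODE_PAIRS = [
    ("actual_landing", "scheduled_arrival"),
    ("actual_on_block", "actual_landing"),
    ("door_open", "actual_on_block"),
    ("gate_close", "gate_open"),
    ("door_close", "door_open"),
    ("actual_off_block", "door_close"),
    ("actual_off_block", "gate_close"),
    ("taxi_out_start", "actual_off_block"),
    ("actual_takeoff", "taxi_out_start"),
    ("actual_takeoff", "scheduled_departure"),
]


def minutes_between(later: datetime, earlier: datetime) -> float:
    """Signed difference later - earlier, in minutes (positive means later)."""
    return (later - earlier).total_seconds() / 60.0


def node_differences(record: dict) -> list:
    """Successor-minus-predecessor differences for one flight (one row of Diff)."""
    return [minutes_between(record[later], record[earlier]) for later, earlier in NODE_PAIRS]


def standard_periods(diff_non_delay: np.ndarray) -> np.ndarray:
    """D_std: for each time difference s, the box-plot interval (QL_s, QU_s).

    diff_non_delay has shape (num_non_delayed_flights, num_differences);
    the result has shape (num_differences, 2).
    """
    ql = np.percentile(diff_non_delay, 25, axis=0)
    qu = np.percentile(diff_non_delay, 75, axis=0)
    return np.stack([ql, qu], axis=1)


# Example flight, times given as minute offsets from 08:00 (hypothetical data).
base = datetime(2022, 4, 8, 8, 0)
offsets = {
    "scheduled_arrival": 0, "actual_landing": 10, "actual_on_block": 18,
    "door_open": 20, "gate_open": 25, "gate_close": 55, "door_close": 60,
    "actual_off_block": 65, "taxi_out_start": 68, "actual_takeoff": 80,
    "scheduled_departure": 70,
}
record = {name: base + timedelta(minutes=m) for name, m in offsets.items()}
print(node_differences(record))
# [10.0, 8.0, 2.0, 30.0, 40.0, 5.0, 10.0, 3.0, 12.0, 10.0]

# Toy non-delayed data: 5 flights, 2 time differences each.
toy = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)
print(standard_periods(toy))  # rows (QL, QU): 2 to 4 and 20 to 40
```

In the method, `standard_periods` would be run separately on the overnight and non-overnight non-delayed flights.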
5. The flight delay prediction method based on the outbound flight support process according to claim 1, wherein the specific process of the step (3) is as follows:
(31) Create a graph network structure $G(V, E)$ from the flight process node data, where $G$ is the created graph. $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is a non-empty finite set of process nodes, with $v_a \in V$ and $v_1$ the first process node. $E = \{e_1, e_2, e_3, \ldots, e_{p\text{-}delay \cdot (m-1)}\}$ is a finite set of edges, with $(v_a, v_b) \in E$ and $e_1$ the first edge. Here $p\text{-}delay$ is the number of delayed flights;
(32) For the graph network structure $G$, construct its adjacency matrix and degree matrix as follows:
$$A_{ab} = \begin{cases} 1, & (v_a, v_b) \in E \\ 0, & (v_a, v_b) \notin E \end{cases}, \qquad A \in \mathbb{R}^{N \times N}$$
$$D_{ab} = \begin{cases} \sum_{c=1}^{N} A_{ac}, & a = b \\ 0, & a \neq b \end{cases}, \qquad D \in \mathbb{R}^{N \times N}$$
where $A_{ab}$ is the adjacency matrix of the constructed graph network structure and $a$ and $b$ are process node numbers. $v_a$ and $v_b$ are process nodes $a$ and $b$, and $\mathbb{R}$ is the field of real numbers. $N = p\text{-}delay \cdot m$ is the total number of process nodes of the delayed flights, where $p\text{-}delay$ is the number of delayed flights and $m$ is the number of process nodes in one flight. $D_{ab}$ is the degree matrix of the graph network structure.
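A sketch of step (3) follows. It assumes that each delayed flight contributes a chain linking its $m$ process nodes in order, which is consistent with the edge count $p\text{-}delay \cdot (m-1)$. It also treats edges as undirected. The node numbering $a = i \cdot m + j$ is likewise an illustrative assumption.

```python
import numpy as np


def build_chain_graph(p_delay: int, m: int):
    """Edge set E, adjacency matrix A and degree matrix D for p_delay delayed flights.

    Each flight is modeled as a chain of its m process nodes, numbered a = i * m + j,
    which gives p_delay * (m - 1) edges and N = p_delay * m nodes.
    """
    n = p_delay * m
    edges = [(i * m + j, i * m + j + 1) for i in range(p_delay) for j in range(m - 1)]
    A = np.zeros((n, n))
    for a, b in edges:
        A[a, b] = 1.0
        A[b, a] = 1.0           # assumption: edges are treated as undirected
    D = np.diag(A.sum(axis=1))  # D_aa = sum_c A_ac, zeros off the diagonal
    return edges, A, D


edges, A, D = build_chain_graph(p_delay=2, m=3)
print(len(edges), A.shape)      # 4 (6, 6)
print(np.diag(D))               # [1. 2. 1. 1. 2. 1.]
```

No edges connect different flights, so the adjacency matrix is block-diagonal, with one block per delayed flight.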
6. The flight delay prediction method based on the outbound flight support process according to claim 1, wherein the specific process of the step (4) is as follows:
(41) Extract the edge sets $E$ of all delayed flight samples and convert them using the process node numbers. Sort them by edge order and process node number, and store the sorted edge set $E'$ as data set 1;
(42) Arrange the process nodes belonging to different graphs in sequence by index, with all process nodes of the same graph sharing the same index value. This gives the index set $I$, which is stored as data set 2;
$$I = \{\underbrace{1, \ldots, 1}_{m}, \underbrace{2, \ldots, 2}_{m}, \ldots, \underbrace{p\text{-}delay, \ldots, p\text{-}delay}_{m}\}$$
where $m$ is the number of process nodes in one flight and $p\text{-}delay$ is the number of delayed flights;
(43) Using formula (8), compare the standard time period set $D_{std}$ of the non-delayed flights, obtained in step (24), with the time difference set $Diff_{delay}$ on each successor-predecessor node pair of the delayed flight time set $T_{delay}$. Do this separately within the overnight flight set $T_{p\text{-}over}$ and the non-overnight flight set $T_{p\text{-}nonover}$. The result is the set $T_{std}$ of distances between $Diff_{delay}$ and the standard time periods. Take $T_{std}$ as the feature input set $X$ of the process nodes, use $X$ as node attributes arranged in flight order, and store the arranged data as data set 3 in a two-dimensional format;
[Formula (8): image not reproduced]
where $T_{std}$ is the set of distances between the time difference set $Diff_{delay}$ (on the successor-predecessor node pairs of $T_{delay}$) and the standard time periods. $T_{is}$ is the distance of the $s$-th time difference of the $i$-th flight from its standard time period. $QL_s$ and $QU_s$ are the lower and upper quartiles bounding the standard time period of the $s$-th time difference. $Diff_{delay\text{-}is}$ is the value of the $s$-th time difference of the $i$-th delayed flight, and $p\text{-}delay$ is the number of delayed flights;
(44) Take the time differences $D_{diff\text{-}delay}$ between the actual take-off time and the scheduled departure time of all delayed flights in the sample as graph attributes. Arrange them in flight order and store $D_{diff\text{-}delay}$ as labels in data set 4 in a two-dimensional format;
(45) Package data sets 1, 2, 3 and 4 into a graph data set, so that each graph network structure retrieved from it is a subgraph of the original graph network structure.
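Steps (41) to (43) might look like the following sketch. The graph indices are 0-based here, following the batch-vector convention common in graph libraries; the method itself may number from 1. The rule in `distance_to_standard` is an assumed form of formula (8), since the formula is not reproduced in the text: zero inside $[QL_s, QU_s]$, otherwise the gap to the nearer quartile. The function names are hypothetical.

```python
import numpy as np


def sorted_edge_set(edges):
    """Data set 1: edge set E' sorted by source node number, then target node number."""
    return sorted(edges)


def graph_index_set(p_delay: int, m: int) -> np.ndarray:
    """Data set 2: index set I; all m process nodes of delayed flight i share index i."""
    return np.repeat(np.arange(p_delay), m)


def distance_to_standard(diff_delay: np.ndarray, d_std: np.ndarray) -> np.ndarray:
    """Data set 3 (assumed form of formula (8)).

    diff_delay: (p_delay, S) time differences of the delayed flights.
    d_std: (S, 2) standard periods, one row (QL_s, QU_s) per time difference.
    Returns T_std of shape (p_delay, S): 0 inside the period, else the gap to it.
    """
    ql, qu = d_std[:, 0], d_std[:, 1]
    below = np.clip(ql - diff_delay, 0.0, None)  # how far below QL_s
    above = np.clip(diff_delay - qu, 0.0, None)  # how far above QU_s
    return below + above


print(sorted_edge_set([(3, 4), (0, 1), (4, 5), (1, 2)]))  # [(0, 1), (1, 2), (3, 4), (4, 5)]
print(graph_index_set(2, 3))                               # [0 0 0 1 1 1]

d_std = np.array([[2.0, 4.0], [20.0, 40.0]])
diff_delay = np.array([[1.0, 30.0], [6.0, 55.0]])
print(distance_to_standard(diff_delay, d_std))  # rows: 1, 0 and 2, 15
```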
7. The flight delay prediction method based on the outbound flight support process according to claim 6, wherein the specific process of the step (5) is as follows:
(51) For the created graph network structure $G$, propagate the features on the graph to the next layer with a propagation rule $f$, as follows:
$$H^{(l+1)} = f\left(H^{(l)}, A\right), \qquad H^{(0)} = X^{(0)}, \qquad H^{(L)} = Z \tag{9}$$
where $H^{(l)}$ is the feature matrix of the $l$-th layer and $A$ is the graph structure description of the process nodes, represented here by the adjacency matrix $A_{ab}$. $Z$ is the output of the last layer $L$, and $X^{(0)}$ is the feature input set fed into the model;
(52) Mapping this layer-to-layer propagation onto the concrete data gives:
$$X^{(l+1)} = f\left(X^{(l)}, A\right) \tag{10}$$
where $f$ is the propagation rule and $X^{(l)}$ is the feature input set of the process nodes at layer $l$;
(53) In the constructed GCN layer, the propagation rule $f$ is:
$$X^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} \left(A + I_n\right) D^{-\frac{1}{2}} X^{(l)} W^{(l)}\right) \tag{11}$$
where $X^{(l+1)}$ is the feature output set of the process nodes at layer $l$ and $D$ is the degree matrix of the graph network structure input to layer $l$. $I_n$ adds self-loops to the graph network structure, $W^{(l)}$ is the convolution kernel of layer $l$ (the learnable weights), and $\sigma$ is a nonlinear transformation;
(54) Construct a GAT layer that assigns different weights to different edges through attention coefficients;
(55) Construct a GraphSAGE layer. Through neighbor sampling, it converts full-graph training on the process node features into mini-batch training centered on individual process nodes, and it aggregates the information of neighboring process nodes with an aggregation function;
(56) Using the different graph convolutional layers, construct two-layer graph convolutional neural networks. The process times of the transit flight are the feature input, and the prediction target is a graph-level task. The networks are: a two-layer GCN, a two-layer GAT, a two-layer GraphSAGE, and a network formed by connecting a single GCN layer with a single GraphSAGE layer. Each two-layer graph neural network is followed by a two-layer fully connected neural network and a final pooling layer, which converts the node-level and edge-level tasks into a global graph-level task. This yields four different graph convolutional neural network models.
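The GCN propagation of formulas (9) to (11), followed by a graph-level readout, can be sketched with NumPy. This is a minimal sketch under several assumptions. $\sigma$ is ReLU. $D$ is computed from $A + I_n$, as in Kipf and Welling's GCN. The pooling takes the mean over the nodes that share an index in $I$. The two-layer fully connected network of step (56) is omitted for brevity.

```python
import numpy as np


def gcn_layer(A: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Formula (11) with sigma = ReLU: ReLU(D^-1/2 (A + I) D^-1/2 X W),
    where D is the degree matrix of A + I (assumption)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops (I_n)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # diagonal of D^-1/2
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)


def global_mean_pool(H: np.ndarray, index_set: np.ndarray, num_graphs: int) -> np.ndarray:
    """Readout: average the node rows of each graph, using the index set I."""
    sums = np.zeros((num_graphs, H.shape[1]))
    np.add.at(sums, index_set, H)                   # accumulate rows per graph index
    counts = np.bincount(index_set, minlength=num_graphs).reshape(-1, 1)
    return sums / counts


def two_layer_gcn(A, X, W1, W2, index_set, num_graphs):
    """X^(1) = f(X^(0), A), X^(2) = f(X^(1), A), then one pooled row per graph."""
    H = gcn_layer(A, X, W1)
    H = gcn_layer(A, H, W2)
    return global_mean_pool(H, index_set, num_graphs)


# Two toy graphs (flights), each with two linked process nodes.
A = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0], [3.0], [2.0], [4.0]])
W = np.array([[1.0]])
I = np.array([0, 0, 1, 1])
print(two_layer_gcn(A, X, W, W, I, num_graphs=2))  # one row per graph: 2.0 and 3.0
```

In practice, the four models would more likely be built from a graph learning library's GCN, GAT and GraphSAGE layers than from hand-written NumPy. The sketch only makes the matrix operations explicit.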
8. The method for predicting flight delay based on the outbound flight support flow according to claim 7, wherein the step (54) specifically comprises:
(541) During GAT-layer propagation, because each process node is connected to different neighboring process nodes, compute the correlation coefficient $e_{ab}$ from process node $v_a$ to process node $v_b$. From it, compute the attention coefficient of each edge as follows:
$$\alpha_{ab} = \frac{\exp\left(\mathrm{LeakyReLU}\left(e_{ab}\right)\right)}{\sum_{k \in \mathcal{N}_a} \exp\left(\mathrm{LeakyReLU}\left(e_{ak}\right)\right)}$$
where LeakyReLU is an activation function and $\alpha_{ab}$ is the attention coefficient from process node $v_a$ to process node $v_b$. $k$ is a neighboring process node of process node $a$, and $\mathcal{N}_a$ is the set of neighboring process nodes of process node $a$;
(542) Using the computed attention coefficients, take the weighted sum of the features as follows:
$$x'_a = \sum_{b \in \mathcal{N}_a} \alpha_{ab} W x_b$$
where $x'_a$ is the new feature of each process node after fusing neighborhood information and $\mathcal{N}_a$ is the set of neighboring process nodes of process node $a$. $W$ is a learnable weight, and $x_b$ is the feature of process node $v_b$;
(543) Collect the new features $x'_a$ of all process nodes into a new process node feature set $X'$, so that the GAT layer propagates features in the same message-passing manner as the GCN layer.
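A single-head NumPy sketch of the GAT layer in steps (541) to (543) follows. The claim does not state how $e_{ab}$ is computed. The sketch assumes the form from the original GAT paper, $e_{ab} = \mathbf{a}^\top [W x_a \Vert W x_b]$, with a learnable vector `att`. It also assumes that each node's neighborhood includes the node itself.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def gat_layer(A: np.ndarray, X: np.ndarray, W: np.ndarray, att: np.ndarray):
    """Single-head GAT layer.

    A: (n, n) adjacency, X: (n, f_in) features, W: (f_in, f_out) learnable weight,
    att: (2 * f_out,) attention vector (assumed form of e_ab).
    Returns the new feature set X' (n, f_out) and the attention matrix alpha (n, n).
    """
    n = A.shape[0]
    H = X @ W                                        # W x_b for every node b
    f_out = H.shape[1]
    # e_ab = att^T [W x_a || W x_b], computed for all pairs at once
    e = (H @ att[:f_out])[:, None] + (H @ att[f_out:])[None, :]
    e = leaky_relu(e)
    neighborhood = (A + np.eye(n)) > 0               # N_a, self included (assumption)
    e = np.where(neighborhood, e, -np.inf)           # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)             # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True) # alpha_ab; each row sums to 1
    return alpha @ H, alpha                          # x'_a = sum_b alpha_ab W x_b


# Three process nodes linked as a chain 0-1-2; a zero attention vector gives uniform weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X_new, alpha = gat_layer(A, np.eye(3), np.eye(3), np.zeros(6))
print(np.round(alpha, 3))  # row 0 attends only to nodes 0 and 1 (0.5 each)
```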
9. The flight delay prediction method based on the outbound flight support flow according to claim 7, wherein the step (55) specifically comprises:
(551) After aggregating the features of its neighboring process nodes, the GraphSAGE layer combines them with the features of the process node itself, as follows:
$$h_{\gamma(v_a)}^{k} = \mathrm{AGGREGATE}_k\left(\left\{h_b^{k-1}, \forall v_b \in \gamma(v_a)\right\}\right)$$
$$h_a^{k} = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h_a^{k-1}, h_{\gamma(v_a)}^{k}\right)\right)$$
where $k$ indexes the aggregation iterations, up to the total number of iterative aggregations. $W^{(k)}$ is the weight to be learned at the $k$-th aggregation, and $\sigma$ is a nonlinear transformation. $h_a^{k-1}$ is the feature of the $a$-th process node after the $(k-1)$-th aggregation, and $\mathrm{AGGREGATE}_k$ is the $k$-th aggregation function. $\gamma(v_a)$ is the set of neighboring process nodes of the $a$-th process node, and $h_{\gamma(v_a)}^{k}$ is the aggregated feature of all those neighbors. $\mathrm{CONCAT}$ is a function that concatenates the features of the $a$-th process node with those of its neighboring process nodes;
(552) Apply L2 normalization to the aggregated features of each process node, as follows:
$$h_a^{k} \leftarrow \frac{h_a^{k}}{\left\lVert h_a^{k} \right\rVert_2}, \qquad \forall v_a \in V$$
where $V$ is the set of process nodes in the graph network structure and $h_a^{k}$ is the feature vector of the $a$-th process node after $k$ aggregations.
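A NumPy sketch of one GraphSAGE step, including the L2 normalization of step (552), follows. Three simplifying assumptions are made: the aggregator is a mean, $\sigma$ is ReLU, and the full neighbor set is used instead of sampled neighbors. The weight is applied on the right (`concat @ W`), which is the row-vector form of $W^{(k)} \cdot \mathrm{CONCAT}(\cdot)$.

```python
import numpy as np


def sage_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GraphSAGE aggregation step followed by L2 normalization.

    A: (n, n) adjacency, H: (n, f) features h^{k-1}, W: (2f, f_out) weight W^(k).
    """
    deg = A.sum(axis=1, keepdims=True)
    h_neigh = (A @ H) / np.maximum(deg, 1.0)       # AGGREGATE_k: mean over gamma(v_a)
    h = np.concatenate([H, h_neigh], axis=1) @ W   # CONCAT(h_a^{k-1}, h_gamma^k), then W^(k)
    h = np.maximum(h, 0.0)                         # sigma, taken as ReLU here
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norms, 1e-12)            # step (552): h_a^k / ||h_a^k||_2


# Three process nodes of one flight, linked as a chain 0-1-2 (toy data).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.ones((4, 2))
out = sage_layer(A, H, W)
print(np.linalg.norm(out, axis=1))  # each row has unit length
```

Neighbor sampling, which gives GraphSAGE its mini-batch character in step (55), would replace `A @ H` with an average over a random subset of each node's neighbors.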
10. The flight delay prediction method based on the outbound flight support process according to claim 7, wherein the specific process of the step (6) is as follows:
(61) The three selected machine learning models are a decision tree model, a random forest model and an XGBoost model;
(62) Split the data into a training set $X_{train}$, a validation set $X_{val}$ and a test set $X_{test}$;
(63) Normalize the data and feed it into each model. Using K-fold cross-validation, average the results of the runs, then train and tune the parameters to obtain the optimal version of each of the three machine learning models;
(64) Feed the training set into each of the four graph convolutional neural network models for training: two-layer GCN, two-layer GAT, two-layer GraphSAGE, and single-layer GCN combined with single-layer GraphSAGE. Use the mean absolute error as the back-propagated error to update the weights, and repeat model training and parameter tuning to obtain the optimal version of each of the four models.
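Step (6) can be illustrated with a small K-fold cross-validation harness that reports the mean absolute error named in step (64). The min-max normalization and the mean-predicting baseline are hypothetical stand-ins. In the method above, the decision tree, random forest and XGBoost models would be plugged in through `fit` and `predict`.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """MAE, the error measure named in step (64)."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation on shuffled indices."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def min_max_fit(X):
    """Per-column minimum and range from training data (a zero range is replaced by 1)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)


def cross_validated_mae(X, y, fit, predict, k=5):
    """Average validation MAE over K folds; normalization is fitted on each training fold only."""
    scores = []
    for tr, va in k_fold_indices(len(y), k):
        lo, span = min_max_fit(X[tr])
        model = fit((X[tr] - lo) / span, y[tr])
        pred = predict(model, (X[va] - lo) / span)
        scores.append(mean_absolute_error(y[va], pred))
    return float(np.mean(scores))


# Hypothetical baseline in place of a real model: predict the mean training delay.
def fit_mean(X, y):
    return float(np.mean(y))


def predict_mean(model, X):
    return np.full(len(X), model)


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.full(10, 30.0)
print(cross_validated_mae(X, y, fit_mean, predict_mean, k=5))  # 0.0
```

Fitting the normalization inside each fold keeps validation statistics out of training, so the averaged score reflects the real generalization error more honestly.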
CN202210368680.7A 2022-04-08 2022-04-08 Flight delay prediction method based on outbound flight guarantee flow Active CN114781704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210368680.7A CN114781704B (en) 2022-04-08 2022-04-08 Flight delay prediction method based on outbound flight guarantee flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210368680.7A CN114781704B (en) 2022-04-08 2022-04-08 Flight delay prediction method based on outbound flight guarantee flow

Publications (2)

Publication Number Publication Date
CN114781704A true CN114781704A (en) 2022-07-22
CN114781704B CN114781704B (en) 2024-09-20

Family

ID=82430013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210368680.7A Active CN114781704B (en) 2022-04-08 2022-04-08 Flight delay prediction method based on outbound flight guarantee flow

Country Status (1)

Country Link
CN (1) CN114781704B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468186A (en) * 2023-06-14 2023-07-21 Civil Aviation University of China Flight delay time prediction method, electronic equipment and storage medium
CN116910664A (en) * 2023-07-12 2023-10-20 Nanjing University of Aeronautics and Astronautics Cascade model-based flight ground guarantee dynamic prediction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401601A (en) * 2019-12-23 2020-07-10 Nanjing University of Aeronautics and Astronautics Flight take-off and landing time prediction method facing delay propagation
CN111599219A (en) * 2020-05-27 2020-08-28 TravelSky Mobile Technology Co., Ltd. Multi-data-source flight takeoff time prediction method based on sequencing learning
CN113449915A (en) * 2021-06-28 2021-09-28 The 28th Research Institute of China Electronics Technology Group Corporation Flight delay prediction method based on knowledge graph

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
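Li Yang; Nie Dangmin; Wen Xiangxi: "Prediction of aircraft approach flight time based on LS-SVM", Aeronautical Computing Technique, no. 03, 25 May 2018 (2018-05-25), pages 78 - 81 *
Yang Zhao et al.: "Delay prediction method for transit flights based on the ground support process", Journal of Chongqing Jiaotong University (Natural Science), vol. 42, no. 9, 30 September 2023 (2023-09-30), pages 122 - 154 *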

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468186A (en) * 2023-06-14 2023-07-21 Civil Aviation University of China Flight delay time prediction method, electronic equipment and storage medium
CN116468186B (en) * 2023-06-14 2023-08-25 Civil Aviation University of China Flight delay time prediction method, electronic equipment and storage medium
CN116910664A (en) * 2023-07-12 2023-10-20 Nanjing University of Aeronautics and Astronautics Cascade model-based flight ground guarantee dynamic prediction method
CN116910664B (en) * 2023-07-12 2024-04-19 Nanjing University of Aeronautics and Astronautics Cascade model-based flight ground guarantee dynamic prediction method

Also Published As

Publication number Publication date
CN114781704B (en) 2024-09-20

Similar Documents

Publication Publication Date Title
CN111401601B (en) Delay propagation-oriented flight take-off and landing time prediction method
CN110517482B (en) Short-term traffic flow prediction method based on 3D convolutional neural network
WO2021082393A1 (en) Airport surface variable slide-out time prediction method based on big data deep learning
CN114781704A (en) Flight delay prediction method based on station-passing flight guarantee process
Yin et al. Machine learning techniques for taxi-out time prediction with a macroscopic network topology
Choi et al. Artificial neural network models for airport capacity prediction
CN116468186B (en) Flight delay time prediction method, electronic equipment and storage medium
CN101582203A (en) Realization system and method for airspace running simulation airflow engine
CN113808396A (en) Traffic speed prediction method and system based on traffic flow data fusion
CN112365037A (en) Airport airspace flow prediction method based on long-term and short-term data prediction model
CN116862061A (en) Multi-machine-place flight delay prediction method based on space-time diagram convolutional neural network
CN112270445A (en) Flight delay wave and comprehensive evaluation method based on statistical analysis and classification prediction
Lai et al. Bottleneck Analysis in JFK Using Discrete Event Simulation: An Airport Queuing Model
Ma et al. Integrated optimization of arrival, departure, and surface operations
CN112926809B (en) Flight flow prediction method and system based on clustering and improved xgboost
CN115966107A (en) Airport traffic flow prediction method based on graph neural network
CN109345139B (en) Airport group efficiency and policy evaluation method based on system dynamics
CN118211034A (en) Multi-dimensional civil aviation passenger flow prediction method based on KNN regression model
WO2024114250A1 (en) Xgboost-based flight delay cause prediction method
CN112182059B (en) High-order analysis method for flight delay characteristics
CN113642162A (en) Simulation comprehensive analysis method for urban road traffic emergency plan
CN115759386B (en) Method and device for predicting flight execution result of civil aviation flight and electronic equipment
CN116882584A (en) Flight delay prediction method and system
CN109961085A (en) The method for building up and device of flight delay prediction model based on Bayesian Estimation
CN112766300A (en) Aviation big data preprocessing technology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant