CN111415039B - Flight delay mode analysis method based on non-negative tensor decomposition

Info

Publication number: CN111415039B
Application number: CN202010197961.1A
Authority: CN
Inventors: 杜文博; 曹先彬; 张明远; 陈莘文; 马彦
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-03-19
Filing date: 2020-03-19
Publication date: 2021-05-28
Anticipated expiration: 2040-03-19
Also published as: CN111415039A

Abstract

The invention discloses a flight delay pattern analysis method based on non-negative tensor decomposition, belonging to the field of civil aviation delay analysis. First, all historical flight record data of an airport are collected, and the main variables are extracted from the historical data record of each flight; all the main variables are then constructed into a fourth-order tensor $\mathcal{X}$. Pattern recognition is performed on $\mathcal{X}$ using non-negative tensor decomposition, decomposing it into the mode product of four non-negative factor matrices and one non-negative core tensor. This mode product is modeled as an optimization problem and solved by gradient descent to obtain the optimal decomposition result $\hat{\mathcal{X}}$. Finally, $\hat{\mathcal{X}}$ is mapped to the probability-based probabilistic latent semantic analysis (PLSA) model, and the distribution of each potential pattern is statistically analyzed; delay conditions are thereby predicted and judged, and targeted measures are then taken for management and control. The method is easy to operate, simple in steps, accurate in result and innovative.

Description

Flight delay mode analysis method based on non-negative tensor decomposition

Technical Field

The invention belongs to the field of civil aviation delay analysis, and particularly relates to a flight delay pattern analysis method based on non-negative tensor decomposition.

Background

With the rapid development of the civil aviation industry, flight delay has become a serious problem. Delays inconvenience passengers, who may become reluctant to fly or may switch to another airline, and they force airlines to bear additional aircraft maintenance costs and reduced fleet utilization. Flight delays also increase fuel consumption and carbon dioxide emissions, harming the environment. Beyond these direct effects, flight delays negatively affect many aspects of the wider economy. In summary, flight delay is a serious and common problem with significant negative impact.

The complexity and difficulty of the flight delay problem stem from many factors, commonly summarized as abnormal weather and technical causes, the latter including air traffic control, insufficient facility capacity, poor scheduling capability, schedule changes and limited buffer time. These factors make it difficult to understand the potential patterns of flight delay and to design appropriate strategies.

Recently, Han and Moutarde showed that data-driven methods based on historical observation data are not restricted by the past and are suitable for studying the potential dynamic characteristics of flight delay; one way to support system cognition and decision making is therefore to use and learn from historical data. For example, when severe weather is encountered, one can look for past days with similar weather conditions and refer to the actions taken by air traffic controllers on those days.

Some previous studies have focused on identifying fixed patterns in air traffic management. Liu et al. propose a semi-supervised learning algorithm that groups similar days into different patterns. The first step is to measure the similarity between hourly weather forecasts, and the most similar days are then determined. The authors carried out two case studies at Newark Liberty International Airport (EWR), demonstrating the effectiveness of the method. Mukherjee et al. propose a pattern classification method based on the influence of severe weather conditions, using weather indicators as input and applying factor analysis to determine the primary weather patterns. The days are then clustered using Ward's minimum variance method. Days belonging to the same cluster have similar weather patterns.

In addition to weather patterns, some studies have attempted to determine similar days from other perspectives. Grabbe et al. use a k-means clustering algorithm to identify similar days in ground delay programs, applying an expectation-maximization algorithm to data such as the start and end times and the planned arrival rates of the ground delay program. The studies by Zhou et al. and Sternberg et al. focus on using air traffic flow and flight delays to identify similar patterns. Gorripaty et al. computed the principal components of demand and capacity data, but cluster analysis of the data found no natural clusters in either capacity or demand. Abdel-Aty et al. identified periodic arrival delay patterns of domestic flights and found that some patterns could not be detected by statistical methods.

Despite these efforts, problems remain in accurately understanding flight delay patterns. As mentioned earlier, finding clusters or patterns in spatio-temporal historical data is an effective approach, but because of the high dimensionality of these data, many studies have found it difficult to identify distinct patterns in Euclidean space. Methods such as latent distribution analysis, latent trait analysis and grade of membership analysis, proposed by Mislevy, therefore do not mine the patterns directly; instead, they form a low-dimensional projection subspace from the features obtained by tensor factorization, which enhances the latent clustering structure of spatio-temporal traffic dynamics. These methods have opened up new directions in traffic science, such as urban flow analysis, traffic speed prediction, missing traffic data completion and ship trajectory recovery.

Disclosure of Invention

In view of these problems, the invention provides a flight delay pattern analysis method based on non-negative tensor decomposition. The method can be used to build a predictor based on potential patterns and to make a preliminary judgment about flight delays, which helps an airport arrange its flights and make better decisions.

The flight delay pattern analysis method comprises the following steps:

step one, collecting all historical flight record data of an airport, and extracting main variables from the historical data record of each flight;

the main variables include the departure airport, departure time, departure date and delay level of the flight;

All departure airports are numbered manually in sequence, each airport corresponding to a unique number; the set is denoted {1, 2, 3, …, Z}, where Z is an integer.

The departure date of a flight is defined as follows: the days of the year are numbered in sequence, so that January 1 is numbered 1, January 2 is numbered 2, and so on, and December 31 is numbered 365 or 366. The departure date of each flight is the number of its departure day; the set is denoted {1, 2, 3, …, 365/366}.

The departure time of a flight is defined as follows: each day is divided into hours, so that flights taking off between 0:00 and 1:00 are numbered 1, flights taking off between 1:00 and 2:00 are numbered 2, and so on, and flights taking off between 23:00 and 24:00 are numbered 24; the set is denoted {1, 2, 3, …, 24}.

The delay levels are divided into four according to actual conditions: (1) < 15 min, (2) 15-45 min, (3) 45-90 min, (4) > 90 min; {1, 2, 3, 4} represent punctuality, slight delay, moderate delay and severe delay respectively.
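The encoding of the four main variables can be sketched in Python as follows. This is only an illustrative sketch: the function names, the airport codes and their numbering are hypothetical, and the treatment of the boundary values (exactly 45 min counted as level 2, exactly 90 min as level 3) is an assumption, since the text does not say which level a boundary value belongs to.

```python
from datetime import datetime

def delay_level(delay_minutes: float) -> int:
    """Map a departure delay in minutes to one of the four levels {1, 2, 3, 4}."""
    if delay_minutes < 15:
        return 1  # punctual
    if delay_minutes <= 45:   # assumption: 45 min still counts as slight delay
        return 2  # slight delay
    if delay_minutes <= 90:   # assumption: 90 min still counts as moderate delay
        return 3  # moderate delay
    return 4      # severe delay

def encode_flight(airport_ids: dict, airport: str,
                  departure: datetime, delay_minutes: float) -> tuple:
    """Return (airport number, hour slot 1..24, day of year 1..365/366, delay level)."""
    airport_no = airport_ids[airport]
    hour_slot = departure.hour + 1           # 0:00-1:00 -> 1, ..., 23:00-24:00 -> 24
    day_no = departure.timetuple().tm_yday   # January 1 -> 1
    return airport_no, hour_slot, day_no, delay_level(delay_minutes)

# Hypothetical airport numbering, used only for illustration.
airport_ids = {"PEK": 1, "PVG": 2, "CAN": 3}
example = encode_flight(airport_ids, "PVG", datetime(2019, 1, 30, 14, 30), 50.0)
```

Here a flight departing at 14:30 on January 30 with a 50-minute delay is encoded with hour slot 15, day number 30 and delay level 3.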

Step two, constructing a fourth-order tensor $\mathcal{X}$ from the main variables extracted from all flights.

The fourth-order tensor $\mathcal{X}$ comprises four dimensions, which respectively correspond to the departure airport, departure time, departure date and delay level;
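A minimal sketch of how such a tensor might be assembled with NumPy is given below. It assumes that each entry counts the flights falling into one (airport, hour slot, day, delay level) cell, which is suggested by the later conversion of flight counts into proportions but is not stated explicitly here; the function name and the sample records are hypothetical.

```python
import numpy as np

def build_tensor(records, n_airports, n_days=366):
    """Count flights per (airport, hour slot, day of year, delay level) cell.

    `records` holds 1-based tuples (airport number, hour slot, day number, delay level).
    """
    X = np.zeros((n_airports, 24, n_days, 4))
    for airport_no, hour_slot, day_no, level in records:
        # Shift the 1-based numbering of the text to 0-based array indices.
        X[airport_no - 1, hour_slot - 1, day_no - 1, level - 1] += 1.0
    return X

# Hypothetical encoded flight records.
records = [(2, 15, 30, 3), (2, 15, 30, 3), (1, 8, 1, 1)]
X = build_tensor(records, n_airports=3)
```

Because the entries are counts, the tensor is non-negative by construction, which is what the non-negative decomposition in the next step requires.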

Step three, pattern recognition is performed on the fourth-order tensor $\mathcal{X}$ using non-negative tensor decomposition, decomposing it into the mode product of four non-negative factor matrices and a non-negative core tensor $\mathcal{G}$.

The non-negative tensor decomposition formula is as follows:

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)}$$

where $\mathcal{G}$ represents the core tensor and $A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}$ are the four factor matrices: $A^{(1)}$ is the potential pattern matrix of flights in the departure-airport dimension, $A^{(2)}$ is the potential pattern matrix of flights in the departure-time dimension, $A^{(3)}$ is the potential pattern matrix of flights in the departure-date dimension, and $A^{(4)}$ is the potential pattern matrix of flights in the delay-level dimension.

By potential pattern, the set of departure airports is divided into $R_1$ classes, the set of departure times into $R_2$ classes, the set of departure dates into $R_3$ classes, and the set of delay levels into $R_4$ classes; each potential class corresponds to a respective subset.
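The mode product of a core tensor with four factor matrices can be written compactly with `numpy.einsum`. The sketch below is illustrative only: the tensor sizes, the numbers of classes and the random values are assumptions chosen to keep the example small.

```python
import numpy as np

def tucker_reconstruct(G, factors):
    """Compute G x_1 A1 x_2 A2 x_3 A3 x_4 A4 for a fourth-order core tensor G."""
    A1, A2, A3, A4 = factors
    # Entry (i, j, k, l) is the sum over (a, b, c, d) of
    # G[a, b, c, d] * A1[i, a] * A2[j, b] * A3[k, c] * A4[l, d].
    return np.einsum("abcd,ia,jb,kc,ld->ijkl", G, A1, A2, A3, A4)

rng = np.random.default_rng(0)
shape = (5, 24, 7, 4)   # airports, hour slots, days, delay levels (illustrative sizes)
ranks = (2, 3, 2, 2)    # numbers of potential classes per dimension (illustrative)
G = rng.random(ranks)
factors = [rng.random((size, rank)) for size, rank in zip(shape, ranks)]
X_hat = tucker_reconstruct(G, factors)
```

Since the core and all factors are non-negative, every entry of the reconstruction is a sum of non-negative terms, so the potential classes combine additively.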

Step four, the mode product of the non-negative factor matrices and the non-negative core tensor is modeled as an optimization problem, which is solved by gradient descent to obtain the optimal decomposition result $\hat{\mathcal{X}}$.

The optimization problem is as follows:

Minimize

$$\left\| \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)} \right\|_F$$

where the constraint is that the core tensor $\mathcal{G}$ and the factor matrices $A^{(n)}$, $n = 1, 2, 3, 4$, are all non-negative. The F-norm is the square root of the sum of the squares of the values at every position.

The solved optimal decomposition result is as follows:

$$\hat{\mathcal{X}} = \hat{\mathcal{G}} \times_1 \hat{A}^{(1)} \times_2 \hat{A}^{(2)} \times_3 \hat{A}^{(3)} \times_4 \hat{A}^{(4)}$$

$\hat{\mathcal{X}}$ is the form obtained by decomposing the set of all elements of the original fourth-order tensor $\mathcal{X}$ into a product of several matrices; $i_1$ denotes the subset of departure airports corresponding to a potential class $r_1$ of the departure-airport dimension; $i_2$ denotes the subset of departure times corresponding to a potential class $r_2$ of the departure-time dimension; $i_3$ denotes the subset of departure dates corresponding to a potential class $r_3$ of the departure-date dimension; $i_4$ denotes the subset of delay levels corresponding to a potential class $r_4$ of flights in the delay-level dimension;

$\hat{A}^{(1)}$ represents an $i_1 \times r_1$ matrix whose stored elements are the subsets corresponding to each potential class of the departure airport;

$\hat{A}^{(2)}$ represents an $i_2 \times r_2$ matrix whose stored elements are the subsets corresponding to each potential class of flights in departure time;

$\hat{A}^{(3)}$ represents an $i_3 \times r_3$ matrix whose stored elements are the subsets corresponding to each potential class of flights in departure date;

$\hat{A}^{(4)}$ represents an $i_4 \times r_4$ matrix whose stored elements are the subsets corresponding to each potential class of flights in the delay-level dimension;

$\hat{\mathcal{G}}$ represents the sub-tensor obtained by decomposing the core tensor $\mathcal{G}$, with dimension $r_1 \times r_2 \times r_3 \times r_4$; it represents the interaction relationship between the potential class $r_1$ of the departure-airport dimension, the potential class $r_2$ of departure time, the potential class $r_3$ of departure date and the potential class $r_4$ of delay level. $r_1$, $r_2$, $r_3$, $r_4$ are the respective dimensions after the non-negative tensor decomposition.
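One way to solve this optimization problem is projected gradient descent: take a gradient step on the core tensor and the factor matrices, then clip negative entries to zero. The sketch below is an assumption about how the gradient descent could be realized, not the procedure specified by the text. It uses the gradient of half the squared F-norm (which has the same minimizer as the F-norm) and a simple step-halving rule so that the error never increases; the step size, iteration count and synthetic data are illustrative.

```python
import numpy as np

def reconstruct(G, A):
    """Mode product G x_1 A[0] x_2 A[1] x_3 A[2] x_4 A[3]."""
    return np.einsum("abcd,ia,jb,kc,ld->ijkl", G, A[0], A[1], A[2], A[3])

def gradients(E, G, A):
    """Gradients of 0.5 * ||E||_F^2 with respect to the core and each factor."""
    grad_G = np.einsum("ijkl,ia,jb,kc,ld->abcd", E, A[0], A[1], A[2], A[3])
    grad_A = [
        np.einsum("ijkl,abcd,jb,kc,ld->ia", E, G, A[1], A[2], A[3]),
        np.einsum("ijkl,abcd,ia,kc,ld->jb", E, G, A[0], A[2], A[3]),
        np.einsum("ijkl,abcd,ia,jb,ld->kc", E, G, A[0], A[1], A[3]),
        np.einsum("ijkl,abcd,ia,jb,kc->ld", E, G, A[0], A[1], A[2]),
    ]
    return grad_G, grad_A

def ntd_projected_gradient(X, ranks, steps=100, lr=0.05, seed=0):
    """Non-negative Tucker decomposition by projected gradient descent (sketch)."""
    rng = np.random.default_rng(seed)
    G = rng.random(ranks)
    A = [rng.random((size, rank)) for size, rank in zip(X.shape, ranks)]
    loss = np.linalg.norm(reconstruct(G, A) - X)   # F-norm of the residual
    history = [loss]
    for _ in range(steps):
        grad_G, grad_A = gradients(reconstruct(G, A) - X, G, A)
        while lr > 1e-12:
            # Gradient step followed by projection onto the non-negative orthant.
            G_new = np.maximum(G - lr * grad_G, 0.0)
            A_new = [np.maximum(a - lr * g, 0.0) for a, g in zip(A, grad_A)]
            new_loss = np.linalg.norm(reconstruct(G_new, A_new) - X)
            if new_loss <= loss:
                G, A, loss = G_new, A_new, new_loss
                break
            lr *= 0.5   # step too large: halve it and try again
        history.append(loss)
    return G, A, history

# Synthetic non-negative data with a known low-rank structure (illustrative).
rng = np.random.default_rng(1)
shape, ranks = (4, 6, 5, 4), (2, 2, 2, 2)
X = reconstruct(rng.random(ranks), [rng.random((s, r)) for s, r in zip(shape, ranks)])
G_hat, A_hat, history = ntd_projected_gradient(X, ranks)
```

The returned `history` records the F-norm error after each iteration, which gives a simple way to check that the decomposition is improving.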

Step five, the optimal decomposition result $\hat{\mathcal{X}}$ is mapped to the probability-based PLSA model, and the distribution of each potential pattern is statistically analyzed; delay conditions are thereby predicted and judged, and targeted measures are then taken for management and control.

The specific correspondence process is as follows:

First, for the set S of all flights with a fixed departure time, departure date and departure airport, the number of flights in S is converted into a proportion of all flights;

all flights refers to all collected historical flights of the airport.

Then, the proportions are decomposed according to the optimal decomposition result: the core tensor $\hat{\mathcal{G}}$ is regarded as the probability tensor $p_t$ in the PLSA model, and the factor matrix $\hat{A}^{(n)}$ is regarded as the probabilistic form $p(i_n \mid t)$.

Thus, the corresponding PLSA model is:

$$p(i_1, i_2, i_3, i_4) = \sum_{t=1}^{T} p_t \prod_{n=1}^{4} p(i_n \mid t)$$

where T is the number of potential classes after decomposition, obtained by solving the optimization problem with gradient descent.
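The probabilistic reading of a non-negative decomposition can be sketched as a rescaling: each factor column is normalized to sum to one, the column sums are absorbed into the core, and the core is normalized to sum to one. This is an illustrative assumption about how the correspondence could be implemented; in particular, each combination $(r_1, r_2, r_3, r_4)$ of the core is treated here as one potential class $t$, and the function name and random inputs are hypothetical.

```python
import numpy as np

def to_probabilistic_form(G, factors):
    """Rescale a non-negative core and factors into probability form.

    Returns a core P that sums to 1 (class proportions) and factor matrices
    whose columns each sum to 1 (conditional distributions given a class).
    """
    P = G.copy()
    conditionals = []
    for mode, A in enumerate(factors):
        col_sums = A.sum(axis=0)
        safe = np.where(col_sums > 0, col_sums, 1.0)   # avoid dividing by zero
        conditionals.append(A / safe)
        view = [1] * P.ndim
        view[mode] = -1
        P = P * col_sums.reshape(view)   # move the column mass into the core
    return P / P.sum(), conditionals

rng = np.random.default_rng(0)
shape, ranks = (3, 24, 7, 4), (2, 3, 2, 2)   # illustrative sizes
G = rng.random(ranks)
factors = [rng.random((size, rank)) for size, rank in zip(shape, ranks)]
P, conditionals = to_probabilistic_form(G, factors)
```

After rescaling, the mode product of `P` with `conditionals` equals the original reconstruction divided by its total, so it can be read as a joint distribution over (airport, time, date, delay level).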

The analysis process is as follows:

For a given flight A, once its departure airport, departure date and departure time are determined, the potential class $r'_1$ of the departure airport, the potential class $r'_2$ of the departure time and the potential class $r'_3$ of the departure date can be found correspondingly. Then, through the core sub-tensor $\hat{\mathcal{G}}$, the delay level associated with $r'_1$, $r'_2$ and $r'_3$ is found, and under each potential pattern the probabilities that flight A takes off on time, with slight delay, with moderate delay and with severe delay are looked up.
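The lookup described above can be sketched as a conditional distribution over the four delay levels. The sketch assumes a core tensor and factor matrices already in probability form and weights all potential classes softly rather than picking a single class per dimension; both choices, the uniform fallback for an all-zero result, and the function name are assumptions made for illustration.

```python
import numpy as np

def delay_distribution(P, factors, airport_no, hour_slot, day_no):
    """Return the probabilities of delay levels 1..4 for one flight.

    P is the core tensor in probability form; factors are the four factor
    matrices; the three numbers are the 1-based encodings of the flight.
    """
    A1, A2, A3, A4 = factors
    # Weight every class combination by how strongly this airport, hour slot
    # and day belong to it, then spread the mass over the delay levels.
    joint = np.einsum("abcd,a,b,c,ld->l", P,
                      A1[airport_no - 1], A2[hour_slot - 1], A3[day_no - 1], A4)
    total = joint.sum()
    if total == 0:
        # Assumption: fall back to a uniform guess when no class matches.
        return np.full(A4.shape[0], 1.0 / A4.shape[0])
    return joint / total

# Illustrative random model with column-normalized factors.
rng = np.random.default_rng(0)
P = rng.random((2, 3, 2, 2))
P /= P.sum()
factors = []
for size, rank in zip((3, 24, 366, 4), (2, 3, 2, 2)):
    A = rng.random((size, rank))
    factors.append(A / A.sum(axis=0))
probs = delay_distribution(P, factors, airport_no=2, hour_slot=15, day_no=30)
```

The four entries of `probs` correspond to on-time departure, slight delay, moderate delay and severe delay.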

The invention has the advantages and positive effects that:

(1) The flight delay pattern analysis method based on non-negative tensor decomposition demonstrates, through data processing and analysis, that potential class analysis based on tensor decomposition is effective in identifying flight delay patterns.

(2) The flight delay pattern analysis method based on the non-negative tensor decomposition is easy to operate, simple in steps, accurate in result and innovative.

Drawings

FIG. 1 is a flow chart of a flight delay pattern analysis method based on non-negative tensor decomposition according to the present invention;

Detailed Description

The present invention will be described in further detail and with reference to the accompanying drawings so that those skilled in the art can understand and practice the invention.

The aim of the invention is to use a large amount of air traffic data to learn potential flight delay patterns. Flight records are first collected using statistical methods and modeled as an N-th order tensor. Considering the nature of the data set, flight records are viewed as multivariate observations sampled from a common distribution, and a latent class model analogous to tensor analysis is used to identify the primary patterns on each mode. A predictor based on the potential patterns is then proposed to show the efficiency of the framework.

Specifically, the flight delay pattern analysis method based on non-negative tensor decomposition includes the following steps, as shown in fig. 1:

A plurality of main variables are extracted from the historical data record of each flight, namely the departure airport, departure time, departure date, delay level, weather condition and the like of the flight;

the departure airports can be numbered according to some rule, for example alphabetical order, with numbers and airports in one-to-one correspondence.

The departure date of a flight is expressed as the day number within the year: January 1 is day 1, January 2 is day 2, and so on, and December 31 is day 365 or 366. For example, a flight departing on January 30 has departure date number 30.

The departure time of a flight is expressed on a 24-hour basis and divided into 24 numbers, each corresponding to one hour of the day; a flight taking off at 14:30 has departure time number 15.

The delay levels are classified into four levels according to the relevant regulations and practices of the Federal Aviation Administration (FAA) in the United States: (1) < 15 min, (2) 15-45 min, (3) 45-90 min, (4) > 90 min; {1, 2, 3, 4} represent punctuality, slight delay, moderate delay and severe delay respectively.

A larger core tensor can contain more information reflecting the combined relationships between different modes, whereas a smaller core tensor makes the results easier to interpret.

The present invention studies only a fourth-order tensor $\mathcal{X}$ with four dimensions: departure airport, departure time, departure date and delay level. Extension to an N-th order tensor that includes other variables is also within the scope of the present application.

Pattern recognition is performed on $\mathcal{X}$, which is decomposed into the mode product of four non-negative factor matrices and a non-negative core tensor.

Assuming that flight records are multivariate observations sampled from a common distribution, each observation comprising four variables, the patterns of each individual variable are identified using the relevant algorithm, the historical record data of each flight are described using the core tensor, and the distribution of each pattern in the variables is statistically analyzed;

The method uses Tucker decomposition to analyze the potential patterns. Tucker decomposition, also called higher-order singular value decomposition, decomposes a tensor into a core tensor multiplied by a factor matrix along each mode; the factor matrix on each mode is called the base matrix or principal component of the tensor on that mode. Taking a third-order tensor $\mathcal{X}$ as an example, Tucker decomposition yields three factor matrices $A$, $B$, $C$ and a core tensor $\mathcal{G}$.

In the third-order tensor form:

$$\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C$$

$A$, $B$, $C$ are the mode-n factor matrices, and $\mathcal{G}$ is the core tensor, which represents the degree of interaction and relation between the different components.

In higher dimensions:

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)}$$

where $\mathcal{G} \in \mathbb{R}^{h_1 \times h_2 \times \cdots \times h_N}$ represents the core tensor ($g_{j_1 j_2 \cdots j_N}$ represents the value of each cell in the core tensor), $A^{(d)}$ represents a factor matrix, and $h_d$ is the number of patterns of the $d$-th factor matrix.
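For the unconstrained third-order case, a truncated higher-order singular value decomposition can be sketched with NumPy as follows. This illustrates the Tucker form only: the factors it produces are orthonormal and generally contain negative entries, so it is not the non-negative decomposition used by the method. The function names and the random test tensor are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated higher-order SVD: one factor matrix per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])   # leading left singular vectors of the unfolding
    G = X
    for mode, U in enumerate(factors):
        # Multiply the current tensor by U^T along this mode.
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, mode)), 0, mode)
    return G, factors

rng = np.random.default_rng(0)
X = rng.random((3, 4, 5))
G_full, (A, B, C) = hosvd(X, (3, 4, 5))      # no truncation
X_rec = np.einsum("pqr,ip,jq,kr->ijk", G_full, A, B, C)
G_small, _ = hosvd(X, (2, 2, 2))             # smaller, easier-to-read core
```

Without truncation the reconstruction reproduces the tensor, while smaller ranks give a compact core at the cost of approximation error.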

The core tensor G characterizes the degree of interaction between the different modes, so each element of the core tensor can be expressed as:

Pattern recognition is performed for each single variable using non-negative tensor decomposition (NTD), which seeks to decompose a non-negative N-th order tensor $\mathcal{X}$ into the mode product of a non-negative core tensor $\mathcal{G}$ and N non-negative mode matrices $A^{(1)}, A^{(2)}, \dots, A^{(N)}$. When the data must be decomposed into a set of additive components, NTD can decompose a tensor into a set of matrices and a core tensor.

The non-negative tensor decomposition formula corresponding to the fourth-order tensor is as follows:

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)}$$

where $\mathcal{G}$ represents the core tensor and $A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}$ are the four factor matrices, regarded as the principal components on each mode. $A^{(1)}$ is the potential pattern matrix of flights in the departure-airport dimension, $A^{(2)}$ is the potential pattern matrix of flights in the departure-time dimension, $A^{(3)}$ is the potential pattern matrix of flights in the departure-date dimension, and $A^{(4)}$ is the potential pattern matrix of flights in the delay-level dimension.

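The optimization problem is as follows:

Minimize

$$\left\| \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)} \right\|_F$$

where the constraint is that the core tensor $\mathcal{G}$ and the factor matrices $A^{(n)}$, $n = 1, 2, 3, 4$, are all non-negative.

The solved optimal decomposition result is as follows:

$$\hat{\mathcal{X}} = \hat{\mathcal{G}} \times_1 \hat{A}^{(1)} \times_2 \hat{A}^{(2)} \times_3 \hat{A}^{(3)} \times_4 \hat{A}^{(4)}$$

$\hat{\mathcal{X}}$ is the decomposed form of the original fourth-order tensor $\mathcal{X}$, and $\hat{\mathcal{G}}$ represents the sub-tensor obtained by decomposing the core tensor $\mathcal{G}$.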

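Step five, the optimal decomposition result $\hat{\mathcal{X}}$ is mapped to the probability-based PLSA model.

The specific correspondence process is as follows:

all flights refers to all collected historical flights of the airport.

The core tensor $\hat{\mathcal{G}}$ is regarded as the probability tensor $p_t$ in the PLSA model; $p_t$ is the proportion of each potential class in the total number of flights. The factor matrix $\hat{A}^{(n)}$ is regarded as the probabilistic form $p(i_n \mid t)$, which is the probability of occurrence of such a potential class under a certain classification.

Thus, the corresponding PLSA model is:

$$p(i_1, i_2, i_3, i_4) = \sum_{t=1}^{T} p_t \prod_{n=1}^{4} p(i_n \mid t)$$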

The extracted potential classes are analyzed, and the distribution of each pattern in the variables is statistically analyzed. The core tensor describes the relevance between the patterns of different dimensions of each flight, such as weather and geographic position, time and geographic position, or departure airport and departure date. The factor matrices describe the distribution characteristics of the flights in each dimension. The analysis process is as follows:

For a given flight A, once its departure airport, departure date and departure time are determined, the potential class $r'_1$ of the departure airport, the potential class $r'_2$ of the departure time and the potential class $r'_3$ of the departure date can be found correspondingly; then, through the core sub-tensor $\hat{\mathcal{G}}$, the delay level associated with these potential classes is found.

The decomposition result shows into how many patterns flight operation divides in the airport dimension and into how many patterns it divides in the departure-date dimension, for example a winter pattern and a summer pattern. This helps to find the operating patterns of flights in large-scale flight data, so that delay conditions can be predicted and judged and targeted measures can then be taken for management and control.
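One simple way to read such patterns off a factor matrix is to assign each row (for example each departure date) to the potential class with the largest weight. The sketch below is an illustrative assumption, not a step prescribed by the method, and the small date factor matrix is made up for the example.

```python
import numpy as np

def dominant_class(factor):
    """For each row of a factor matrix, return the 1-based class with the largest weight."""
    return np.argmax(factor, axis=1) + 1

# Hypothetical date factor: four days, two potential classes.
A3 = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])
labels = dominant_class(A3)
```

In this made-up example the first two days fall into class 1 and the last two into class 2, which is the kind of grouping that could be interpreted as seasonal patterns.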

Claims

1. A flight delay pattern analysis method based on non-negative tensor decomposition is characterized by comprising the following steps:

the fourth-order tensor $\mathcal{X}$ comprises four dimensions, which respectively correspond to the departure airport, departure time, departure date and delay level;

pattern recognition is performed on the fourth-order tensor $\mathcal{X}$, which is decomposed into the mode product of four non-negative factor matrices and a non-negative core tensor;

the non-negative tensor decomposition formula is as follows:

$$\mathcal{X} \approx \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)}$$

wherein $\mathcal{G}$ represents the core tensor; $A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}$ are the four factor matrices, $A^{(1)}$ being the potential pattern matrix of flights in the departure-airport dimension, $A^{(2)}$ the potential pattern matrix of flights in the departure-time dimension, $A^{(3)}$ the potential pattern matrix of flights in the departure-date dimension, and $A^{(4)}$ the potential pattern matrix of flights in the delay-level dimension;

by potential pattern, the set of departure airports is divided into $R_1$ classes, the set of departure times into $R_2$ classes, the set of departure dates into $R_3$ classes, and the set of delay levels into $R_4$ classes; each potential class corresponds to a respective subset;

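the optimization problem is as follows:

minimize

$$\left\| \mathcal{X} - \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \times_3 A^{(3)} \times_4 A^{(4)} \right\|_F$$

wherein the constraint is that the core tensor $\mathcal{G}$ and the factor matrices $A^{(n)}$, $n = 1, 2, 3, 4$, are all non-negative; the F-norm is the square root of the sum of the squares of the values at every position;

the solved optimal decomposition result is as follows:

$$\hat{\mathcal{X}} = \hat{\mathcal{G}} \times_1 \hat{A}^{(1)} \times_2 \hat{A}^{(2)} \times_3 \hat{A}^{(3)} \times_4 \hat{A}^{(4)}$$

$\hat{\mathcal{X}}$ is the decomposed form of the original fourth-order tensor $\mathcal{X}$;

$\hat{A}^{(1)}$ represents an $i_1 \times r_1$ matrix whose stored elements are the subsets corresponding to each potential class of the departure airport;

$\hat{A}^{(2)}$ represents an $i_2 \times r_2$ matrix whose stored elements are the subsets corresponding to each potential class of flights in departure time;

$\hat{A}^{(3)}$ represents an $i_3 \times r_3$ matrix whose stored elements are the subsets corresponding to each potential class of flights in departure date;

$\hat{A}^{(4)}$ represents an $i_4 \times r_4$ matrix whose stored elements are the subsets corresponding to each potential class of flights in the delay-level dimension;

$\hat{\mathcal{G}}$ represents the sub-tensor obtained by decomposing the core tensor $\mathcal{G}$, with dimension $r_1 \times r_2 \times r_3 \times r_4$; it represents the interaction relationship between the potential class $r_1$ of the departure-airport dimension, the potential class $r_2$ of departure time, the potential class $r_3$ of departure date and the potential class $r_4$ of delay level; $r_1$, $r_2$, $r_3$, $r_4$ respectively represent the dimensions after the non-negative tensor decomposition;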

step five, the optimal decomposition result $\hat{\mathcal{X}}$ is mapped to the probability-based PLSA model, and the distribution of each potential pattern is statistically analyzed; delay conditions are thereby predicted and judged, and targeted measures are then taken for management and control;

the specific correspondence process is as follows:

all flights refers to all collected historical flights of the airport;

the core tensor $\hat{\mathcal{G}}$ is regarded as the probability tensor $p_t$ in the PLSA model; the factor matrix $\hat{A}^{(n)}$ is regarded as the probabilistic form $p(i_n \mid t)$;

thus, the corresponding PLSA model is:

$$p(i_1, i_2, i_3, i_4) = \sum_{t=1}^{T} p_t \prod_{n=1}^{4} p(i_n \mid t)$$

2. The flight delay pattern analysis method based on non-negative tensor decomposition as set forth in claim 1, wherein the main variables in step one comprise the departure airport, departure time, departure date and delay level of the flight;

all departure airports are numbered manually in sequence, each airport corresponding to a unique number; the set is denoted {1, 2, 3, …, Z}, where Z is an integer;

the departure date of a flight is defined as follows: the days of the year are numbered in sequence, so that January 1 is numbered 1, January 2 is numbered 2, and so on, and December 31 is numbered 365 or 366; the departure date of each flight is the number of its departure day; the set is denoted {1, 2, 3, …, 365/366};

the departure time of a flight is defined as follows: each day is divided into hours, so that flights taking off between 0:00 and 1:00 are numbered 1, flights taking off between 1:00 and 2:00 are numbered 2, and so on, and flights taking off between 23:00 and 24:00 are numbered 24; the set is denoted {1, 2, 3, …, 24};

3. The flight delay pattern analysis method based on the non-negative tensor decomposition as set forth in claim 1, wherein the analysis process in the fifth step is as follows: