WO2021082394A1 - Layout-variable taxiing-out time prediction system based on big data deep learning - Google Patents

Layout-variable taxiing-out time prediction system based on big data deep learning

Info

Publication number
WO2021082394A1
WO2021082394A1 PCT/CN2020/089916 CN2020089916W WO2021082394A1 WO 2021082394 A1 WO2021082394 A1 WO 2021082394A1 CN 2020089916 W CN2020089916 W CN 2020089916W WO 2021082394 A1 WO2021082394 A1 WO 2021082394A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
data
out time
scene
feature
Prior art date
Application number
PCT/CN2020/089916
Other languages
French (fr)
Chinese (zh)
Inventor
周龙
Original Assignee
南京智慧航空研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京智慧航空研究院有限公司
Publication of WO2021082394A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/40

Definitions

  • The invention relates to the field of airport traffic control, and in particular to a surface variable taxi-out time prediction system based on big data deep learning.
  • Simulation models take the existing airport topology model together with conflict detection and resolution as inputs, and obtain the taxi-out time by simulating the ground movement of all arriving and departing aircraft.
  • Simulation models are tailored to a specific airport and do not generalize well to other airports.
  • Previous research on analytical models focused mainly on models such as linear regression, with some studies attempting machine learning techniques.
  • For analytical models, determining the main factors that affect taxi time is a focus of research.
  • Analytical models usually suffer from shortcomings such as an incomplete set of influencing factors, so their practical reference value is weak and they cannot meet the needs of real applications.
  • The purpose of the present invention is to provide a surface variable taxi-out time prediction system based on big data deep learning that makes the set of influencing factors in the analytical model more comprehensive.
  • The present invention provides a surface variable taxi-out time prediction system based on big data deep learning, including:
  • a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set;
  • an index definition and quantification module, adapted to define and quantify traffic condition indexes that characterize surface traffic;
  • a feature set extraction module, adapted to analyze and extract, from the data set and the traffic condition indexes, the feature set that affects surface taxi-out time;
  • a model establishment module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
  • a prediction module, adapted to predict airport surface taxi-out time using the surface taxi-out time prediction model.
  • The data set establishment module includes:
  • an original data set acquisition unit, adapted to acquire historical operating data and build the original data set;
  • a data cleaning unit, adapted to clean the original data set;
  • a data set acquisition unit, adapted to perform data integration on the original data set to obtain the data set;
  • a data set dividing unit, adapted to divide the data set into a training set and a test set.
  • The index definition and quantification module includes:
  • a network topology acquisition unit, adapted to model the airport surface traffic situation with a macroscopic spatio-temporal network topology model and so obtain the macroscopic spatio-temporal network topology;
  • a quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four types of indexes that reflect surface traffic volume.
  • The feature set extraction module includes:
  • an original feature set extraction unit, adapted to extract the features that affect surface taxi-out time from the data set and the traffic condition indexes and assemble them into the original feature set;
  • a feature analysis unit, adapted to analyze the features in the original feature set;
  • a feature set building unit, adapted to build the feature set from the results of the feature analysis.
  • The feature analysis unit applies one or more of the Pearson correlation coefficient, normalized mutual information, and factor analysis to the features in the original feature set.
  • The correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. It takes values in [-1, 1]. The larger the absolute value, the stronger the linear correlation; a positive value indicates positive correlation and a negative value indicates negative correlation.
  • For any two variables X and Y, the correlation coefficient $\rho_{X,Y}$ is defined as: $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$
  • where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y,
  • $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and
  • $\mu_X$ and $\mu_Y$ are the mean values of X and Y.
  • Normalized mutual information is a commonly used correlation measure with values in [0, 1]. The larger the value, the stronger the dependence between the variables.
  • The normalized mutual information $U_{X,Y}$ is obtained by normalizing the mutual information $I_{X,Y}$ by the entropies $H_X$ and $H_Y$, where:
  • $I_{X,Y} = \sum_{x,y} p(x,y)\,\log\dfrac{p(x,y)}{p(x)\,p(y)}$ is the mutual information of X and Y,
  • $H_X = -\sum_{x} p(x)\log p(x)$ and $H_Y = -\sum_{y} p(y)\log p(y)$ are the entropies of X and Y,
  • $p(x,y)$ is the joint probability distribution of X and Y, and
  • $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y.
  • The model establishment module includes:
  • an initial model acquisition unit, adapted to obtain the initial model by using the feature set as the input of the ensemble learning model GBRT;
  • a training unit, adapted to train the initial model and tune the hyperparameter values to complete the surface taxi-out time prediction model.
  • the training unit is:
  • The model establishment module also includes:
  • a model testing unit, adapted to use the test set to verify the surface taxi-out time prediction model and evaluate its performance.
  • Performance is evaluated with the mean squared error, $\mathrm{MSE} = \dfrac{1}{N}\sum_{i=1}^{N}(o_i - p_i)^2$, where N is the number of samples in the test set,
  • $o_i$ is the actual taxi time of the i-th sample, and
  • $p_i$ is the taxi time predicted by the model for the i-th sample.
  • The beneficial effect of the present invention is that it provides a surface variable taxi-out time prediction system based on big data deep learning.
  • The system includes: a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set; an index definition and quantification module, adapted to define and quantify traffic condition indexes that characterize surface traffic;
  • a feature set extraction module, adapted to analyze and extract, from the data set and the traffic condition indexes, the feature set that affects surface taxi-out time;
  • a model establishment module, adapted to build the surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
  • and a prediction module, adapted to predict airport surface taxi-out time using that model. The system processes the airport's original recorded data, models airport surface traffic conditions, analyzes and extracts the factors affecting taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.
  • Fig. 1 is a schematic block diagram of the surface variable taxi-out time prediction system based on big data deep learning provided by the present invention.
  • Fig. 2 shows the macroscopic spatio-temporal network topology of the taxiing process.
  • Fig. 3 shows the Pearson correlation coefficients between the candidate influencing factors and taxi-out time.
  • Fig. 4 shows the normalized mutual information between the candidate influencing factors and taxi-out time.
  • Fig. 5 shows the factor analysis results for the candidate influencing factors.
  • Fig. 6 shows how model performance changes during the training and testing phases.
  • Embodiment 1 provides a surface variable taxi-out time prediction system based on big data deep learning. It processes the airport's original recorded data, models airport surface traffic conditions, analyzes and extracts the factors that influence taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.
  • The surface variable taxi-out time prediction system based on big data deep learning includes:
  • a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set;
  • an index definition and quantification module, adapted to define and quantify traffic condition indexes that characterize surface traffic;
  • a feature set extraction module, adapted to analyze and extract, from the data set and the traffic condition indexes, the feature set that affects surface taxi-out time;
  • a model establishment module, adapted to build the surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
  • a prediction module, adapted to predict airport surface taxi-out time using the surface taxi-out time prediction model.
  • The data set establishment module includes:
  • an original data set acquisition unit, adapted to acquire historical operating data and build the original data set;
  • a data cleaning unit, adapted to clean the original data set.
  • First, the data type of every attribute is checked, together with a basic out-of-bounds check; a bounded (range-based) detection method is then used to check selected attributes for outliers.
  • A value range is defined for each such attribute, and any value outside its range is treated as an outlier.
  • Data entries containing outliers are deleted from the data set.
  • The attribute value ranges are shown in the following table.
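The range-based outlier check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the attribute names and bounds in `VALID_RANGES` are hypothetical, because the actual range table is not reproduced in this text.

```python
# Hypothetical attribute ranges; the real table of ranges is not available here.
VALID_RANGES = {
    "taxi_out_min": (1.0, 120.0),  # assumed bounds, in minutes
    "engine_count": (1, 4),        # assumed bounds
}


def is_valid(record, ranges=VALID_RANGES):
    """Return True when every ranged attribute is numeric and within bounds."""
    for name, (low, high) in ranges.items():
        value = record.get(name)
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not (low <= value <= high):
            return False
    return True


def clean(records, ranges=VALID_RANGES):
    """Drop every record that contains at least one outlier."""
    return [r for r in records if is_valid(r, ranges)]


raw = [
    {"flight": "AA101", "taxi_out_min": 14.0, "engine_count": 2},
    {"flight": "AA102", "taxi_out_min": 500.0, "engine_count": 2},    # out of range
    {"flight": "AA103", "taxi_out_min": 9.5, "engine_count": "two"},  # wrong type
]
cleaned = clean(raw)
```

Only the first record survives, since the other two fail the range check and the type check respectively.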
  • a data set acquisition unit, adapted to perform data integration on the original data set to obtain the data set.
  • This step comprises redundant attribute identification, data type conversion, and logical error checking. Redundant attributes are identified and deleted: attributes that carry little information are found by computing the information entropy of each attribute, and attributes whose information is already contained in other attributes are found by computing the mutual information between attributes.
  • The redundant attributes "departure airport" and "execution date" were deleted. Data types are then converted: in non-numeric attributes, information used only for identification is converted to integer values that are easier to process in later steps.
  • The information in the "restricted content" attribute is difficult to quantify, so the attribute was deleted after due consideration.
  • Logical errors are checked by considering the physical meaning of each feature and establishing constraints between features. The correspondence between aircraft type and number of engines is checked, as is the order of the time nodes in surface operations, and entries with logical errors are deleted outright.
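As an illustration of the entropy-based part of redundant attribute identification, the sketch below flags attributes whose information entropy falls under a threshold. The threshold value, the attribute names, and the sample rows are assumptions chosen for the example; the text does not state the cut-off that was actually used.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def low_information_attributes(rows, threshold=0.1):
    """Names of attributes whose entropy is below the (assumed) threshold."""
    return [name for name in rows[0]
            if entropy([row[name] for row in rows]) < threshold]


# Hypothetical records: every flight departs from the same airport, so that
# attribute carries no information, while the runway attribute varies.
rows = [
    {"departure_airport": "XXX", "runway": "06"},
    {"departure_airport": "XXX", "runway": "24"},
    {"departure_airport": "XXX", "runway": "06"},
    {"departure_airport": "XXX", "runway": "24"},
]
redundant = low_information_attributes(rows)
```

A constant attribute has zero entropy and is flagged, which is consistent with "departure airport" being dropped when all records come from a single airport.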
  • a data set dividing unit, adapted to divide the data set into a training set and a test set.
  • The data set is divided into two parts, the training set and the test set.
  • 90% of the data forms the training set used in the model's training phase, and 10% is used as the test set to verify the model's effectiveness and robustness. The training set and the test set therefore come from the same source.
  • The 10% test portion is set aside before the machine learning model is trained, and the remaining 90% is used as the training set.
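A 90/10 hold-out split of this kind might look like the sketch below. Shuffling before the split and the fixed seed are assumptions made for reproducibility; the text only states the proportions and that the test portion is set aside before training.

```python
import random


def split_dataset(records, test_fraction=0.1, seed=0):
    """Shuffle a copy of the records and hold out test_fraction of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled records are reserved for testing.
    return shuffled[n_test:], shuffled[:n_test]


# Integers stand in for flight records in this example.
train_set, test_set = split_dataset(range(1000))
```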
  • The index definition and quantification module includes:
  • a network topology acquisition unit, adapted to model the airport surface traffic situation with a macroscopic spatio-temporal network topology model and so obtain the macroscopic spatio-temporal network topology.
  • The macroscopic spatio-temporal network topology model is used to model the traffic situation on the airport surface.
  • Figure 2 visualizes the general form of the network topology for departures and arrivals taxiing within an arbitrary window of time and space. In actual airport surface operations, taxi-in and taxi-out are coupled and interdependent, so the model also accounts for the influence of arrivals on the departure process.
  • The spatio-temporal network topology model is a general framework for describing the macroscopic resource flow of the airport system. As shown in Figure 2, departures $d_1, \dots, d_4$ represent all four possible relationships to the reference departure flight $d_0$: "pushback before, takeoff before", "pushback before, takeoff after", "pushback after, takeoff before", and "pushback after, takeoff after".
  • Similarly, arrivals $a_1, \dots, a_4$ represent all four possible relationships to the reference arrival flight $a_0$: "landing before, in-block before", "landing before, in-block after", "landing after, in-block before", and "landing after, in-block after".
  • $t_{on}$ and $t_{in}$ denote the landing time and in-block (arrival at stand) time of the reference arrival flight $a_0$.
  • $t_{out}$ and $t_{off}$ denote the pushback time and takeoff time of the reference departure flight $d_0$.
  • represents the time threshold of arrival and departure.
  • a quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four types of indexes that reflect surface traffic volume:
  • SIFIs: surface instantaneous flow indexes
  • SCFIs: surface cumulative flow indexes
  • AQLIs: aircraft queue length indexes
  • SRDIs: slot resource demand indexes
  • SIFIs comprise D-SIFI and A-SIFI, which are the numbers of departing and arriving flights taxiing at the moment $d_0$ pushes back from the gate.
  • SCFIs comprise D-SCFI and A-SCFI, which are the numbers of departing and arriving aircraft whose taxi periods overlap the taxi period of $d_0$.
  • AQLIs comprise D-AQLI and A-AQLI, which are the numbers of aircraft taking off from and landing on the runway during the entire taxi process of $d_0$.
  • SRDIs comprise D-SRDI and A-SRDI, which are the numbers of aircraft pushed back and landed during the departure slot of $d_0$, a window extending one time threshold on either side of $t_0$.
  • The time threshold can be set between 10 and 30 minutes.
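The departure-side indexes can be sketched as simple counts over flight time stamps. This is one possible reading of the definitions above, with several assumptions: times are minutes on a common clock, the slot is centred on the pushback time of the reference flight, interval end points are treated as shown in the comparisons, and the `Departure` class and function names are hypothetical. The arrival-side indexes would follow the same pattern using landing and in-block times.

```python
from dataclasses import dataclass


@dataclass
class Departure:
    t_out: float  # pushback time, minutes
    t_off: float  # takeoff time, minutes


def d_sifi(ref, others):
    """Departures already taxiing at the moment ref pushes back."""
    return sum(1 for d in others if d.t_out <= ref.t_out < d.t_off)


def d_scfi(ref, others):
    """Departures whose taxi period overlaps the taxi period of ref."""
    return sum(1 for d in others if d.t_out < ref.t_off and ref.t_out < d.t_off)


def d_aqli(ref, others):
    """Departures that take off while ref is taxiing."""
    return sum(1 for d in others if ref.t_out <= d.t_off <= ref.t_off)


def d_srdi(ref, others, threshold=15.0):
    """Departures pushed back within one threshold of ref's pushback time."""
    return sum(1 for d in others if abs(d.t_out - ref.t_out) <= threshold)


ref = Departure(t_out=100.0, t_off=115.0)
others = [
    Departure(90.0, 105.0),   # taxiing when ref pushes back
    Departure(110.0, 130.0),  # pushes back while ref is taxiing
    Departure(60.0, 80.0),    # airborne before ref pushes back
]
indexes = (d_sifi(ref, others), d_scfi(ref, others),
           d_aqli(ref, others), d_srdi(ref, others))
```

For this toy traffic sample the four counts are 1, 2, 1, and 2.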
  • The feature set extraction module includes:
  • an original feature set extraction unit, adapted to extract the features that affect surface taxi-out time from the data set and the traffic condition indexes and assemble them into the original feature set.
  • The factors affecting surface taxi-out time obtained by the data set establishment module and by the index definition and quantification module are collated to form the original feature set. The original feature set is then processed, and new features derived from it replace some of the original features.
  • The factors affecting surface taxi-out time obtained by the data set establishment module are: flight number, flight attributes, destination airport, planned departure time, aircraft type, airline, pushback time, actual departure time, departure runway, departure stand, stand type, engine type, corridor entrance, whether restricted, and boarding gate.
  • The factors affecting surface taxi-out time obtained in S120 are: D-SIFI, D-SCFI, D-AQLI, D-SRDI, Corridor_NO.
  • The resulting original feature set, that is, the candidate influencing factors, is shown in the following table:
  • a feature analysis unit, adapted to analyze the features in the original feature set;
  • a feature set building unit, adapted to build the feature set from the results of the feature analysis.
  • The feature analysis unit includes:
  • Figures 3, 4, and 5 respectively show the Pearson correlation coefficients between the candidate influencing factors and taxi-out time, the normalized mutual information between the candidate influencing factors and taxi-out time, and the factor analysis results for the candidate influencing factors.
  • The correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. It takes values in [-1, 1]. The larger the absolute value, the stronger the linear correlation; a positive value indicates positive correlation and a negative value indicates negative correlation.
  • For any two variables X and Y, the correlation coefficient $\rho_{X,Y}$ is defined as: $\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$
  • where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y,
  • $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and
  • $\mu_X$ and $\mu_Y$ are the mean values of X and Y.
  • Normalized mutual information is a commonly used correlation measure with values in [0, 1]. The larger the value, the stronger the dependence between the variables.
  • The normalized mutual information $U_{X,Y}$ is obtained by normalizing the mutual information $I_{X,Y}$ by the entropies $H_X$ and $H_Y$, where:
  • $I_{X,Y} = \sum_{x,y} p(x,y)\,\log\dfrac{p(x,y)}{p(x)\,p(y)}$ is the mutual information of X and Y,
  • $H_X = -\sum_{x} p(x)\log p(x)$ and $H_Y = -\sum_{y} p(y)\log p(y)$ are the entropies of X and Y,
  • $p(x,y)$ is the joint probability distribution of X and Y, and
  • $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y.
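Both measures can be computed directly from their definitions, as in the sketch below. Two points are assumptions rather than statements from the text: the normalization used here is the symmetric form 2·I/(H_X + H_Y), which is one common choice that yields values in [0, 1], and the variables passed to `normalized_mi` are taken to be discrete, so continuous features such as taxi time would need to be binned first. The sample values are invented.

```python
from collections import Counter
from math import log, sqrt


def pearson(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def entropy(values):
    """Entropy of the empirical distribution of a discrete variable."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def normalized_mi(xs, ys):
    """Assumed normalization: 2 * I(X;Y) / (H_X + H_Y)."""
    return 2.0 * mutual_information(xs, ys) / (entropy(xs) + entropy(ys))


queue_length = [0, 1, 2, 3, 4, 5]                  # invented index values
taxi_minutes = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]  # invented taxi-out times
rho = pearson(queue_length, taxi_minutes)
```

A perfectly linear relationship gives a correlation of 1 (up to rounding), two variables that determine each other give a normalized mutual information of 1, and independent variables give 0.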
  • The model establishment module includes:
  • an initial model acquisition unit, adapted to obtain the initial model by using the feature set as the input of the ensemble learning model GBRT;
  • a training unit, adapted to train the initial model and tune the hyperparameter values to complete the surface taxi-out time prediction model.
  • The training unit operates as follows: "maximum depth" is selected as the means of controlling decision tree size; "least squares" is selected as the loss function; at the optimal product value, the largest learning rate that maintains stable performance is selected together with the corresponding smallest number of estimators; based on the overall distribution of taxi-out time in the training set, the minimum number of samples required to split a node is set to 200; training of the initial model then yields the surface taxi-out time prediction model.
  • The Gradient Boosted Regression Trees (GBRT) model, a typical representative of ensemble learning, is used to predict surface taxi-out time.
  • The feature set obtained in step S133 is used as the model input, and the GBRT model is trained using the implementation in the scikit-learn library.
  • The hyperparameters to be set are: decision tree size control, loss function type, number of estimators and learning rate, and minimum samples per split. There are two options for controlling decision tree size: "max_depth" and "max_leaf_nodes".
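A training configuration along these lines could be written with scikit-learn's `GradientBoostingRegressor` as sketched below. Only `min_samples_split=200`, the use of `max_depth`, and the least-squares loss come from the text; the values of `max_depth`, `learning_rate`, and `n_estimators` are placeholders, and the data is synthetic and stands in for the real feature set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature set: four traffic indexes per flight.
X = rng.integers(0, 20, size=(5000, 4)).astype(float)
# Synthetic taxi-out time in minutes (not real airport data).
y = 8.0 + 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0.0, 1.0, size=5000)

# Hold out 10% for testing, as in the data set division step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

model = GradientBoostingRegressor(
    max_depth=3,            # tree size controlled via max_depth (value assumed)
    learning_rate=0.1,      # assumed value
    n_estimators=200,       # assumed value
    min_samples_split=200,  # value stated in the text
    random_state=0,
)  # the default loss of this estimator is least squares
model.fit(X_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(X_test))
```

In practice the learning rate and the number of estimators would be tuned jointly, for example with a grid search on the training set, rather than fixed as they are here.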
  • The GBRT model $F(x)$ is an additive model of the following form: $F(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$
  • where $h_m(x)$ is a basis function, usually called a weak learner in the boosting setting,
  • $\gamma_m$ is the weight of the corresponding weak learner, and
  • $M$ is the total number of weak learners.
  • GBRT uses decision trees of fixed size as weak learners. Like other boosting algorithms, GBRT builds the additive model greedily: $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
  • where $F_m(x)$ is the GBRT model obtained in the m-th iteration.
  • $h_m(x)$ is given by: $h_m = \arg\min_{h} \sum_{i=1}^{n} L\big(y_i,\; F_{m-1}(x_i) + h(x_i)\big)$
  • where $n$ is the total number of training samples,
  • $L$ is the selected loss function,
  • $y_i$ is the label of the i-th sample,
  • $F_{m-1}(x_i)$ is the prediction for the i-th sample of the GBRT model obtained in the (m-1)-th iteration, and
  • $h(x_i)$ is the prediction for the i-th sample of the weak learner being sought. The weight $\gamma_m$ is determined by: $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i,\; F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$
  • where $n$, $L$, $y_i$, and $F_{m-1}(x_i)$ are as defined above.
  • The initial model $F_0$ is problem-specific. For least squares regression, the mean of the target values is usually chosen.
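To make the additive construction concrete, the sketch below builds a tiny least-squares boosting model by hand. It is a simplified illustration rather than the scikit-learn algorithm: the weak learner is a single-split stump on one feature, and the weight of each learner is fixed to a constant learning rate instead of being found by line search. Under squared loss, choosing the weak learner that minimizes the loss is equivalent to fitting it to the current residuals. The data values are invented, and the feature is assumed to take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Greedy additive model: F_m = F_{m-1} + learning_rate * h_m."""
    f0 = sum(ys) / len(ys)  # F_0 is the mean of the targets
    stumps = []
    preds = [f0] * len(ys)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)  # weak learner fitted to the residuals
        stumps.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in stumps)


def mse(model, xs, ys):
    """Mean squared error of model over the given samples."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [float(i) for i in range(10)]  # invented queue lengths
ys = [8.0 + 1.5 * x for x in xs]    # invented taxi-out times, minutes
baseline = fit_gbrt(xs, ys, n_estimators=0)  # F_0 only
boosted = fit_gbrt(xs, ys)
baseline_mse, boosted_mse = mse(baseline, xs, ys), mse(boosted, xs, ys)
```

Each added stump reduces the training error relative to predicting the mean alone, which is the behaviour the greedy update is meant to capture.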
  • The model establishment module further includes:
  • a model testing unit, adapted to use the test set to verify the surface taxi-out time prediction model and evaluate its performance.
  • The test set is used to verify the surface taxi-out time prediction model, and performance is evaluated with the mean squared error (MSE), calculated as: $\mathrm{MSE} = \dfrac{1}{N}\sum_{i=1}^{N}(o_i - p_i)^2$
  • where $N$ is the number of samples in the test set,
  • $o_i$ is the actual taxi time of the i-th sample, and
  • $p_i$ is the taxi time predicted by the model for the i-th sample.
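The mean squared error can be computed directly from its definition. The observed and predicted taxi times below are invented for the example.

```python
def mean_squared_error(observed, predicted):
    """MSE = (1 / N) * sum of (o_i - p_i)^2 over all samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


observed = [12.0, 15.0, 9.0, 20.0]   # invented actual taxi times, minutes
predicted = [11.0, 17.0, 9.0, 18.0]  # invented model predictions, minutes
mse = mean_squared_error(observed, predicted)
```

Here the squared errors are 1, 4, 0, and 4, so the MSE is 9 / 4 = 2.25.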
  • The MSE is used to monitor how model performance changes during training and testing; the result is shown in Fig. 6.
  • The MSE reached 2.5 on the training set and 5.5 on the test set.
  • The MSE values on the training set and the test set reflect, to some extent, the generalization ability of the model.
  • The table below compares prediction accuracy on the test set within different error ranges.
  • 85.7% of the test samples have a taxi time prediction error within 3 minutes; more than 93% have an error within 4 minutes; and about 96.5% have an error of less than 5 minutes. These results on the test set indicate that the designed data mining model and algorithm meet the accuracy requirements of practical dynamic surface taxi-out time prediction reasonably well.
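Accuracy within an error range of this kind is the share of test samples whose absolute prediction error does not exceed a tolerance. The sketch below assumes the tolerance is inclusive, which the text does not specify, and uses invented values rather than the patent's test data.

```python
def accuracy_within(observed, predicted, tolerance_min):
    """Fraction of samples with absolute error at most tolerance_min minutes."""
    hits = sum(1 for o, p in zip(observed, predicted)
               if abs(o - p) <= tolerance_min)
    return hits / len(observed)


observed = [12.0, 15.0, 9.0, 20.0, 30.0]   # invented actual taxi times
predicted = [11.0, 19.5, 9.0, 18.0, 24.0]  # invented predictions
report = {t: accuracy_within(observed, predicted, t) for t in (3, 4, 5)}
```

The absolute errors are 1, 4.5, 0, 2, and 6 minutes, so three of five samples fall within 3 and 4 minutes and four of five within 5 minutes.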
  • In summary, the present invention provides a surface variable taxi-out time prediction system based on big data deep learning.
  • The system includes: a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set; an index definition and quantification module, adapted to define and quantify traffic condition indexes that characterize surface traffic;
  • a feature set extraction module, adapted to analyze and extract, from the data set and the traffic condition indexes, the feature set that affects surface taxi-out time;
  • a model establishment module, adapted to build the surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
  • and a prediction module, adapted to predict airport surface taxi-out time using that model. The system processes the airport's original recorded data, models airport surface traffic conditions, analyzes and extracts the factors affecting taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.

Abstract

Disclosed is a layout-variable taxiing-out time prediction system based on big data deep learning. The system comprises: a data set establishment module adapted for acquiring historical operation data and performing data cleaning to obtain a data set; an indicator defining and quantifying module adapted for defining and quantifying a traffic condition indicator of a layout traffic characteristic; a feature set extraction module adapted for analyzing and extracting a feature set, which influences layout taxiing-out time, on the basis of the data set and the traffic condition indicator; a model establishment module adapted for establishing a layout taxiing-out time prediction model by an integrated machine learning method according to the feature set; and a prediction module adapted for completing prediction of airport layout taxiing-out time by means of the layout taxiing-out time prediction model. Airport original record data is processed, modeling is performed on airport layout traffic conditions, taxiing time influence factors are analyzed and extracted, and a GBRT integrated learning model is trained, such that the taxiing-out time prediction model is obtained, and a data basis is provided for management and optimization of airport operations.

Description

A surface variable taxi-out time prediction system based on big data deep learning
Technical field
The invention relates to the field of airport traffic control, and in particular to a surface variable taxi-out time prediction system based on big data deep learning.
Background
In the prior art, aircraft taxi-out time prediction models are mostly built in one of two ways: simulation and analysis. Simulation models take the existing airport topology model together with conflict detection and resolution as inputs, and obtain the taxi-out time by simulating the ground movement of all arriving and departing aircraft. Simulation models are tailored to a specific airport and do not generalize well to other airports. Previous research on analytical models focused mainly on models such as linear regression, with some studies attempting machine learning techniques. For analytical models, determining the main factors that affect taxi time is a focus of research. Analytical models usually suffer from shortcomings such as an incomplete set of influencing factors, so their practical reference value is weak and they cannot meet the needs of real applications.
How to solve the above problems is a matter that urgently needs to be addressed.
Summary of the invention
The purpose of the present invention is to provide a surface variable taxi-out time prediction system based on big data deep learning that makes the set of influencing factors in the analytical model more comprehensive.
To solve the above technical problem, the present invention provides a surface variable taxi-out time prediction system based on big data deep learning, including:
a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set;
an index definition and quantification module, adapted to define and quantify traffic condition indexes that characterize surface traffic;
a feature set extraction module, adapted to analyze and extract, from the data set and the traffic condition indexes, the feature set that affects surface taxi-out time;
a model establishment module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
a prediction module, adapted to predict airport surface taxi-out time using the surface taxi-out time prediction model.
Further, the data set establishment module comprises:
an original data set acquisition unit, adapted to acquire historical operating data and construct an original data set;
a data cleaning unit, adapted to clean the original data set;
a data set acquisition unit, adapted to perform data integration on the original data set to obtain the data set;
a data set division unit, adapted to divide the data set into a training set and a test set.
Further, the indicator definition and quantification module comprises:
a network topology acquisition unit, adapted to model the airport surface traffic situation with a macroscopic spatio-temporal network topology model to obtain a macroscopic spatio-temporal network topology;
a quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four categories of indicators that reflect surface traffic volume.
Further, the feature set extraction module comprises:
an original feature set extraction unit, adapted to extract features that affect surface taxi-out time from the data set and the traffic condition indicators and to form an original feature set;
a feature analysis unit, adapted to perform feature analysis on the features in the original feature set;
a feature set construction unit, adapted to construct the feature set according to the results of the feature analysis.
Further, the feature analysis unit performs feature analysis on the features in the original feature set using one or more of the following: the correlation coefficient, normalized mutual information, and factor analysis;
The correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. It takes values in [-1, 1]: the larger the absolute value, the stronger the linear correlation; a positive value indicates positive correlation and a negative value indicates negative correlation. With X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
Normalized mutual information is a commonly used correlation measure with values in [0, 1]; the larger the value, the stronger the correlation between the variables. The normalized mutual information $U_{X,Y}$ is defined as:
Figure PCTCN2020089916-appb-000002
where $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the information entropies of X and Y respectively, $p(x,y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y;
Factor analysis assumes that the extracted feature vector $x$ is fully governed by latent influencing factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is the coefficient matrix and $\varepsilon$ is the error. Under the additional assumptions that the influencing factors are mutually independent and that the influencing factors and the error are independent of each other, it follows that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, from which $A$ and $z$ can be obtained.
Further, the model establishment module comprises:
an initial model acquisition unit, adapted to obtain an initial model by using the feature set as the input of the ensemble learning model GBRT;
a training unit, adapted to train the initial model and tune the hyperparameter values so as to complete the surface taxi-out time prediction model.
Further, the training unit operates as follows:
maximum depth is selected as the means of controlling decision tree size;
least squares is selected as the loss function;
for the best-performing product of learning rate and number of estimators, the largest learning rate, and the correspondingly smallest number of estimators, for which performance remains stable is selected;
the minimum sample split is set to 200 according to the overall distribution of taxi-out times in the training set;
training of the initial model is completed, thereby establishing the surface taxi-out time prediction model.
Further, the model establishment module also comprises:
a model testing unit, adapted to validate the surface taxi-out time prediction model on the test set and evaluate its performance.
Further, the performance evaluation in the model testing unit uses the mean squared error, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(o_i - p_i)^2$$
where $N$ is the number of samples in the test set, $o_i$ is the actual taxi time of the i-th sample, and $p_i$ is the taxi time predicted by the model.
The beneficial effect of the present invention is that it provides a surface variable taxi-out time prediction system based on big data deep learning. The system comprises: a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set; an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize surface traffic; a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects surface taxi-out time; a model establishment module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method; and a prediction module, adapted to predict the airport surface taxi-out time using that model. The system processes the airport's raw recorded data, models the airport surface traffic conditions, analyzes and extracts the factors that influence taxi time, and trains the GBRT ensemble learning model to obtain a taxi-out time prediction model, providing a data basis for the management and optimization of airport operations.
Description of the drawings
The present invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a schematic block diagram of the surface variable taxi-out time prediction system based on big data deep learning provided by the present invention.
Fig. 2 is the macroscopic spatio-temporal network topology of the taxiing process provided by the present invention.
Fig. 3 shows the correlation coefficients between the candidate influencing factors and the taxi-out time provided by the present invention.
Fig. 4 shows the normalized mutual information between the candidate influencing factors and the taxi-out time provided by the present invention.
Fig. 5 shows the factor analysis results for the candidate influencing factors provided by the present invention.
Fig. 6 shows how model performance changes during the training and testing phases provided by the present invention.
Detailed description of the embodiments
The present invention is now described in further detail with reference to the drawings. The drawings are simplified schematic diagrams that illustrate only the basic structure of the invention, and therefore show only the components relevant to it.
Embodiment 1
As shown in Fig. 1, Embodiment 1 provides a surface variable taxi-out time prediction system based on big data deep learning. The system processes the airport's raw recorded data, models the airport surface traffic conditions, analyzes and extracts the factors that influence taxi time, and trains the GBRT ensemble learning model to obtain a taxi-out time prediction model, providing a data basis for the management and optimization of airport operations. Specifically, the system comprises:
a data set establishment module, adapted to acquire historical operating data and clean it to obtain a data set;
an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize surface traffic;
a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects surface taxi-out time;
a model establishment module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
a prediction module, adapted to predict the airport surface taxi-out time using the surface taxi-out time prediction model.
In this embodiment, the data set establishment module comprises:
an original data set acquisition unit, adapted to acquire historical operating data and construct an original data set.
Specifically, as much data as possible is extracted from the airport surface operation database to form the original data set of departure operations at the airport. Taxi trajectory information is collected, including departure runway, departure stand, corridor entrance number and taxi distance. Flight attribute information is collected, including flight number, flight type, aircraft type, operating airline and engine type. Traffic control information is collected, including whether the flight is restricted, controller information, radio communication information, delays, local weather and airport broadcast information. Flight plan information is collected, including departure airport, destination airport, scheduled takeoff time, scheduled off-block time and waypoint information. Recorded taxi process information is collected, including off-block time, pushback time, start-up request and clearance times, actual takeoff time, taxi speed and waiting time at the runway threshold.
a data cleaning unit, adapted to clean the original data set.
Specifically, a concrete processing scheme is formulated for the data actually obtained from the airport. Missing values are handled in two ways: setting default values and direct deletion. For defaults, "whether restricted" defaults to "no" and "restriction content" defaults to "none". After the defaults are filled in, attributes for which more than half of the information is missing are deleted outright, including "start-up request", "start-up clearance", "off-block time", "wake turbulence", "taxi speed" and "departure queue length". A completeness check is then run on the data set, and data entries with missing information are deleted. For outliers, the data of all attributes first undergo a basic check of data type and out-of-bounds values, and a bound-checking method is then applied to selected attributes to detect further outliers: a value range is defined for each such attribute based on actual airport surface operations, and values outside the corresponding range are treated as outliers. Finally, data entries containing outliers are deleted from the data set. The attribute value ranges are shown in the following table.
Value ranges of selected attributes
Figure PCTCN2020089916-appb-000004
Figure PCTCN2020089916-appb-000005
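The cleaning procedure described above can be sketched as follows. This is a minimal illustration only: the function name `clean_records`, the attribute names and the example range for `taxi_time` are hypothetical, since the actual value ranges are given only in the table image. Records are assumed to share the same keys, with `None` marking a missing value.

```python
def clean_records(records, defaults, ranges, max_missing=0.5):
    """Clean flight records (dicts with identical keys); None marks a missing value."""
    if not records:
        return []
    # 1. Fill configured defaults, e.g. "restricted" -> "no".
    rows = [{k: defaults.get(k) if v is None else v for k, v in r.items()}
            for r in records]
    # 2. Drop attributes that are missing in more than half of the records.
    keep = [k for k in rows[0]
            if sum(r[k] is None for r in rows) <= max_missing * len(rows)]
    rows = [{k: r[k] for k in keep} for r in rows]
    # 3. Completeness check: drop entries that still lack information.
    rows = [r for r in rows if all(v is not None for v in r.values())]
    # 4. Bound check: drop entries with values outside the allowed range.
    return [r for r in rows
            if all(lo <= r[k] <= hi for k, (lo, hi) in ranges.items() if k in r)]


# Hypothetical records: taxi_speed is mostly missing, one taxi_time is implausible.
records = [
    {"restricted": None, "taxi_speed": None, "taxi_time": 12},
    {"restricted": "yes", "taxi_speed": None, "taxi_time": 15},
    {"restricted": None, "taxi_speed": 20, "taxi_time": 400},
    {"restricted": None, "taxi_speed": None, "taxi_time": None},
]
cleaned = clean_records(records, {"restricted": "no"}, {"taxi_time": (1, 120)})
```

In this example the `taxi_speed` attribute is dropped because three of four values are missing, the last record is dropped for incompleteness, and the third is dropped as an outlier.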
a data set acquisition unit, adapted to perform data integration on the original data set to obtain the data set.
Specifically, this step covers redundant attribute identification, data type conversion and logical error checking. Redundant attributes are identified and deleted: attributes that carry little information are identified by computing the information entropy of each attribute, and attributes whose information is contained in other attributes are identified by computing the mutual information between attributes. The redundant attributes "departure airport" and "execution date" are deleted. For data type conversion, information in non-numeric attributes that serves only as an identifier is converted to integer values that are easier to process and use later. The information in the "restriction content" attribute is difficult to quantify, so after overall consideration the attribute is deleted. For logical errors, the physical meaning of each feature is considered and constraint relationships between features are established: the correspondence between aircraft type and number of engines is checked, the ordering of the time points in surface operation is checked, and entries with logical errors are deleted outright.
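As a rough illustration of the entropy criterion, the sketch below computes the information entropy of a categorical attribute. An attribute such as "departure airport" takes the same value for every departure from one airport, so its entropy is zero and it carries no information. The function name and the example values are hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of an attribute."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


# A constant attribute carries no information; a balanced binary one carries 1 bit.
constant = attribute_entropy(["AAA", "AAA", "AAA", "AAA"])
balanced = attribute_entropy(["09", "27", "09", "27"])
```

An attribute whose entropy falls below a chosen threshold would be a candidate for removal; the threshold itself is not specified in the text.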
a data set division unit, adapted to divide the data set into a training set and a test set.
Specifically, the data set is divided into two parts, a training set and a test set. 90% of the data forms the training set used in the model training phase, and the remaining 10% forms the test set used to verify the effectiveness and robustness of the model. The training set and the test set therefore come from the same source and have the same properties. The 10% is set aside from the fully processed data set before any machine learning model is trained, and the remaining 90% is used to train the model.
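A minimal sketch of the 90/10 division is shown below. The text does not say how the 10% is chosen; random shuffling with a fixed seed is an assumption made here for reproducibility, and the function name is hypothetical.

```python
import random


def split_dataset(rows, test_fraction=0.1, seed=0):
    """Return (train, test); shuffling before the split is an assumption."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


train, test = split_dataset(range(100))
```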
In this embodiment, the indicator definition and quantification module comprises:
a network topology acquisition unit, adapted to model the airport surface traffic situation with a macroscopic spatio-temporal network topology model to obtain a macroscopic spatio-temporal network topology;
Specifically, a macroscopic spatio-temporal network topology model is used to model the airport surface traffic situation. Fig. 2 visualizes the general network topology of departure and arrival taxiing in any spatio-temporal domain. In actual airport surface operations, the taxi-in and taxi-out processes are coupled and interdependent, so the model also accounts for the influence of arrivals on the departure process. The spatio-temporal network topology model is a general framework for describing the macroscopic flow of resources in the airport system. As shown in Fig. 2, departures $d_1, \ldots, d_4$ represent all four possible relationships to the reference departure $d_0$: "before pushback, before takeoff", "before pushback, after takeoff", "after pushback, before takeoff" and "after pushback, after takeoff". Similarly, arrivals $a_1, \ldots, a_4$ represent all four possible relationships to the reference arrival $a_0$: "before landing, before in-block", "before landing, after in-block", "after landing, before in-block" and "after landing, after in-block". $t_{on}$ and $t_{in}$ denote the landing time and in-block time of the reference arrival $a_0$; $t_{out}$ and $t_{off}$ denote the pushback time and takeoff time of the reference departure; $\delta$ denotes the time threshold for arrivals and departures.
a quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four categories of indicators that reflect surface traffic volume.
Specifically, based on the macroscopic spatio-temporal network topology, eight indicators in four categories are defined to reflect surface traffic volume. The four categories are the surface instantaneous flow indices (SIFIs), the surface cumulative flow indices (SCFIs), the aircraft queue length indices (AQLIs) and the slot resource demand indices (SRDIs). Two statistics are computed in each category: the number of departing aircraft (prefix D-) and the number of arriving aircraft (prefix A-). The table below shows these statistics for the situation in Fig. 2, with $d_0$ as the reference departure.
Surface traffic situation indicator statistics for departure flight $d_0$
Figure PCTCN2020089916-appb-000006
Taking Fig. 2 as an example, the definitions and calculation of the indicators in Table 1 are as follows. For any departing flight $d_0$, the SIFIs comprise D-SIFI and A-SIFI, the numbers of departing and arriving flights that are taxiing at the moment $d_0$ pushes back from the gate. The SCFIs comprise D-SCFI and A-SCFI, the numbers of departing and arriving aircraft whose taxi periods overlap the taxi period of $d_0$. The AQLIs comprise D-AQLI and A-AQLI, the numbers of aircraft that take off from and land on the runway during the entire taxi of $d_0$. The SRDIs comprise D-SRDI and A-SRDI, the numbers of aircraft that push back and land during the departure slot $[t_0-\delta, t_0+\delta]$ of $d_0$. In general, $\delta$ can be set between 10 and 30 minutes.
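The eight indicators can be computed by simple interval counting. The sketch below is one possible reading of the definitions above, under several assumptions that the text does not settle: times are in minutes, the slot centre $t_0$ is taken to be the pushback time of $d_0$, interval boundaries are treated as inclusive where noted in the comments, and the reference flight itself is excluded from the lists. All function and key names are hypothetical.

```python
def traffic_indices(ref, departures, arrivals, delta=10):
    """Count surface traffic indicators for a reference departure.

    ref/departures: dicts with 'out' (pushback) and 'off' (takeoff) times.
    arrivals: dicts with 'on' (landing) and 'in' (in-block) times.
    """
    t_out, t_off = ref["out"], ref["off"]
    lo, hi = t_out - delta, t_out + delta  # assumed slot centre t0 = pushback time

    def count(flights, pred):
        return sum(1 for f in flights if pred(f))

    return {
        # Flights taxiing at the instant the reference flight pushes back.
        "D-SIFI": count(departures, lambda f: f["out"] <= t_out < f["off"]),
        "A-SIFI": count(arrivals, lambda f: f["on"] <= t_out < f["in"]),
        # Flights whose taxi period overlaps the reference taxi period.
        "D-SCFI": count(departures, lambda f: f["out"] < t_off and f["off"] > t_out),
        "A-SCFI": count(arrivals, lambda f: f["on"] < t_off and f["in"] > t_out),
        # Runway takeoffs and landings during the reference taxi (inclusive).
        "D-AQLI": count(departures, lambda f: t_out <= f["off"] <= t_off),
        "A-AQLI": count(arrivals, lambda f: t_out <= f["on"] <= t_off),
        # Pushbacks and landings inside the departure slot (inclusive).
        "D-SRDI": count(departures, lambda f: lo <= f["out"] <= hi),
        "A-SRDI": count(arrivals, lambda f: lo <= f["on"] <= hi),
    }


d0 = {"out": 100, "off": 115}
others = [{"out": 90, "off": 105}, {"out": 80, "off": 95}, {"out": 105, "off": 130}]
inbound = [{"on": 95, "in": 104}, {"on": 110, "in": 120}]
indices = traffic_indices(d0, others, inbound, delta=10)
```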
In this embodiment, the feature set extraction module comprises:
an original feature set extraction unit, adapted to extract features that affect surface taxi-out time from the data set and the traffic condition indicators and to form an original feature set.
Specifically, the factors related to surface taxi-out time obtained by the data set establishment module and the indicator definition and quantification module are collated to form the original feature set. The original feature set is then processed: new features are derived from the original features and replace some of the features in the original set.
The factors related to surface taxi-out time obtained by the data set establishment module are: flight number, flight attribute, destination airport, scheduled takeoff time, aircraft type, operating airline, pushback time, actual takeoff time, departure runway, departure stand, stand type, engine type, corridor entrance, whether restricted, and boarding gate. The factors obtained in S120 are: D-SIFI, D-SCFI, D-AQLI, D-SRDI and Corridor_NO. The difference between the pushback time and the actual takeoff time is used as the surface taxi time and replaces the original features. The new features month, day, week, hour and minute are extracted from the scheduled takeoff time and replace the original feature. The stand and gate features are further subdivided and analyzed. The correspondence between runway and stand/gate is extracted as a new feature. The resulting original feature set, i.e. the candidate influencing factors, is shown in the following table:
Candidate influencing factors
Figure PCTCN2020089916-appb-000007
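The two derived features described above (taxi time as the difference between pushback and actual takeoff, and calendar components of the scheduled takeoff time) can be sketched as follows. The timestamp format, the function name and the reading of "week" as ISO day of week are assumptions; the source does not specify them.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def derive_features(pushback, takeoff, scheduled_takeoff):
    """Derive the taxi-out time and calendar features for one flight."""
    out, off, sched = (datetime.strptime(s, FMT)
                       for s in (pushback, takeoff, scheduled_takeoff))
    return {
        "taxi_out_min": (off - out).total_seconds() / 60,
        "month": sched.month,
        "day": sched.day,
        "week": sched.isoweekday(),  # 1 = Monday; one possible reading of "week"
        "hour": sched.hour,
        "minute": sched.minute,
    }


features = derive_features("2019-10-28 08:05", "2019-10-28 08:21", "2019-10-28 08:00")
```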
a feature analysis unit, adapted to perform feature analysis on the features in the original feature set;
a feature set construction unit, adapted to construct the feature set according to the results of the feature analysis.
Specifically, based on the results from the feature analysis unit, important features are selected from the original feature set formed by the original feature set extraction unit to form the feature set used by the ensemble machine learning model. Features with little correlation to surface taxi time are screened out, including "engine type", "stand type", "month", "week", "day" and "minute". The resulting feature set, i.e. the influencing factors, is shown in the following table:
Influencing factors finally selected
Figure PCTCN2020089916-appb-000008
In this embodiment, the feature analysis unit operates as follows:
Feature analysis is performed on the features in the original feature set using one or more of the correlation coefficient, normalized mutual information and factor analysis. Fig. 3, Fig. 4 and Fig. 5 show, respectively, the Pearson correlation coefficients between the candidate influencing factors and the taxi-out time, the normalized mutual information between the candidate influencing factors and the taxi-out time, and the factor analysis results for the candidate influencing factors.
The correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. It takes values in [-1, 1]: the larger the absolute value, the stronger the linear correlation; a positive value indicates positive correlation and a negative value indicates negative correlation. With X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
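The correlation coefficient defined above can be computed directly from sample data. The following is a minimal sketch using population (divide-by-n) moments; the function name is illustrative and inputs are assumed to be non-constant sequences of equal length.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sigma_x = sqrt(sum((a - mu_x) ** 2 for a in x) / n)
    sigma_y = sqrt(sum((b - mu_y) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)


r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly increasing relation
r_neg = pearson([1, 2, 3], [3, 2, 1])         # perfectly decreasing relation
```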
Normalized mutual information is a commonly used correlation measure with values in [0, 1]; the larger the value, the stronger the correlation between the variables. The normalized mutual information $U_{X,Y}$ is defined as:
Figure PCTCN2020089916-appb-000010
where $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the information entropies of X and Y respectively, $p(x,y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y;
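Because the defining formula is available only as an image, the exact normalization used by the source cannot be confirmed. The sketch below assumes one common choice, $U_{X,Y} = 2 I_{X,Y} / (H_X + H_Y)$, which lies in [0, 1]; the mutual information and entropy terms are the standard definitions for discrete variables. The function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(values):
    """H = -sum p(v) log p(v) over the empirical distribution."""
    n = len(values)
    return sum(-(c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I = sum p(x,y) log( p(x,y) / (p(x) p(y)) ) over observed pairs."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def normalized_mi(x, y):
    """Assumed normalization: 2 I / (H_X + H_Y); returns 0 for constant inputs."""
    h = entropy(x) + entropy(y)
    return 2 * mutual_information(x, y) / h if h > 0 else 0.0


identical = normalized_mi([0, 0, 1, 1], [0, 0, 1, 1])
independent = normalized_mi([0, 0, 1, 1], [0, 1, 0, 1])
```

Continuous attributes would need to be discretized before applying this estimator; the text does not describe how that is done.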
Factor analysis assumes that the extracted feature vector $x$ is fully governed by latent influencing factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is the coefficient matrix and $\varepsilon$ is the error. Under the additional assumptions that the influencing factors are mutually independent and that the influencing factors and the error are independent of each other, it follows that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, from which $A$ and $z$ can be obtained.
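The covariance identity can be derived in a few lines. The derivation below additionally assumes that $z$ and $\varepsilon$ have zero mean and that the factors are scaled to unit variance, so that $E[zz^T] = I$; this normalization is common in factor analysis but is not stated in the text.

$$
\begin{aligned}
\Sigma_x &= E\left[(Az+\varepsilon)(Az+\varepsilon)^T\right] \\
&= A\,E[zz^T]\,A^T + A\,E[z\varepsilon^T] + E[\varepsilon z^T]\,A^T + E[\varepsilon\varepsilon^T] \\
&= A\,I\,A^T + 0 + 0 + \Sigma_\varepsilon \\
&= AA^T + \Sigma_\varepsilon
\end{aligned}
$$

The cross terms vanish because the factors and the error are independent with zero mean, and $E[zz^T] = I$ follows from the mutual independence of the factors.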
In this embodiment, the model establishment module comprises:
an initial model acquisition unit, adapted to obtain an initial model by using the feature set as the input of the ensemble learning model GBRT;
a training unit, adapted to train the initial model and tune the hyperparameter values so as to complete the surface taxi-out time prediction model.
In this embodiment, the training unit operates as follows: "maximum depth" is selected as the means of controlling decision tree size; "least squares" is selected as the loss function; for the best-performing product of learning rate and number of estimators, the largest learning rate, and the correspondingly smallest number of estimators, for which performance remains stable is selected; the minimum sample split is set to 200 according to the overall distribution of taxi-out times in the training set; training of the initial model is then completed, thereby establishing the surface taxi-out time prediction model.
Specifically, the Gradient Boosted Regression Trees (GBRT) model, a typical representative of ensemble learning, is used to predict the surface taxi-out time. The feature set obtained in step S133 is used as the model input, and the GBRT model is trained quickly using the algorithm in the scikit-learn library. The hyperparameters to be set are: decision tree size control, loss function type, number of estimators and learning rate, and minimum sample split. Decision tree size can be controlled in two ways: "maximum depth (max_depth)" and "maximum number of leaf nodes (max_leaf_nodes)". Four loss functions are available for regression tasks: "least squares (ls)", "least absolute deviation (lad)", "Huber loss (huber)" and "quantile loss (quantile)". Because the learning rate and the number of estimators interact strongly, their product roughly reflects how far iterative training has progressed. When setting the parameters, different product values are therefore tried based on experience, and the product value that achieves the best performance on the training set is selected. The minimum sample split controls the lower limit on the number of samples in a leaf node and is used to improve the robustness of the model. In general, the hyperparameter values should be adjusted to suit the data conditions of the application scenario.
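One way to read the rule "largest learning rate that keeps performance stable" is sketched below. It assumes that several (learning rate, number of estimators) pairs sharing the chosen product have already been evaluated, and treats a pair as "stable" if its error is within a relative tolerance of the best error. The tolerance criterion, the function name and the example scores are all hypothetical and are not taken from the source.

```python
def pick_learning_rate(results, tolerance=0.05):
    """results: (learning_rate, n_estimators, mse) tuples with a common product.

    Returns the tuple with the largest learning rate whose MSE is within
    `tolerance` (relative) of the best MSE among the candidates.
    """
    best = min(mse for _, _, mse in results)
    stable = [r for r in results if r[2] <= best * (1 + tolerance)]
    return max(stable, key=lambda r: r[0])


# Hypothetical scores for a fixed product learning_rate * n_estimators = 10.
candidates = [(0.01, 1000, 5.50), (0.05, 200, 5.55), (0.1, 100, 5.60), (0.5, 20, 7.90)]
chosen = pick_learning_rate(candidates)
```

Choosing the largest stable learning rate minimizes the number of estimators, and therefore the training and prediction cost, for the same product value.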
Specifically, the GBRT model $F(x)$ is an additive model of the following form:
$$F(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$$
where $h_m(x)$ is a basis function, usually called a weak learner in the context of boosting, $\gamma_m$ is the weight of the corresponding weak learner, and $M$ is the number of weak learners. GBRT uses decision trees of fixed size as weak learners. Like other boosting algorithms, GBRT builds the additive model greedily:
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
where $F_m(x)$ denotes the GBRT model obtained at the m-th iteration, and $h_m(x)$ is given by
$$h_m = \arg\min_{h} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + h(x_i)\big)$$
Here $n$ is the total number of training samples, $L$ is the selected loss function, $y_i$ is the label of the i-th sample, $F_{m-1}(x_i)$ is the prediction for the i-th sample of the GBRT model obtained at iteration $m-1$, and $h(x_i)$ is the prediction for the i-th sample of the weak learner being sought. The weight $\gamma_m$ is given by
Figure PCTCN2020089916-appb-000013
with $n$, $L$, $y_i$ and $F_{m-1}(x_i)$ as defined above.
The initial model $F_0$ is problem-specific; for least-squares regression, the mean of the target values is usually chosen. That is, before any training iteration the model is
$$F_0(x) = \frac{1}{n} \sum_{i=1}^{n} y_i$$
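The greedy additive construction can be illustrated in plain Python for the least-squares loss. This is a sketch under stated assumptions, not the scikit-learn implementation: it uses a single input feature, depth-1 trees (stumps) as weak learners, and an added shrinkage factor `learning_rate`; the names `fit_stump` and `fit_gbrt` are hypothetical. For least squares, minimizing the loss over $h$ amounts to fitting the residuals $y_i - F_{m-1}(x_i)$.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant learner
        constant = mean(residuals)
        return lambda x: constant
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Build F_m = F_{m-1} + learning_rate * gamma_m * h_m, starting from F_0 = mean(y)."""
    f0 = mean(ys)                      # initial model for least-squares regression
    learners = []
    preds = [f0] * len(xs)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)   # weak learner h_m
        hx = [h(x) for x in xs]
        denom = sum(v * v for v in hx)
        # Closed-form line search for squared loss: gamma = <r, h> / <h, h>.
        gamma = sum(r * v for r, v in zip(residuals, hx)) / denom if denom else 0.0
        learners.append((learning_rate * gamma, h))
        preds = [p + learning_rate * gamma * v for p, v in zip(preds, hx)]

    def predict(x):
        return f0 + sum(weight * h(x) for weight, h in learners)

    return predict


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [10, 10, 10, 10, 20, 20, 20, 20]  # made-up data for illustration
    model = fit_gbrt(xs, ys)
    print(model(2), model(7))
```

With zero estimators the model reduces to the mean of the targets, which matches the choice of $F_0$ described above; each further iteration moves the predictions a fraction `learning_rate` of the way toward the remaining residuals.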
In this embodiment, the model building module further includes:
a model testing unit, adapted to validate the surface taxi-out time prediction model on the test set and evaluate its performance.
Specifically, the performance evaluation carried out when validating the surface taxi-out time prediction model on the test set uses the mean squared error (MSE), calculated as:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (o_i - p_i)^2$$
where $N$ is the number of samples in the test set, $o_i$ is the actual taxi time of the $i$-th sample, and $p_i$ is the taxi time predicted by the model for the $i$-th sample.
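As a small illustration of this metric, the following sketch computes the mean squared error between actual and predicted taxi times. The function name and the sample values are assumptions chosen for the example; they are not data from the source.

```python
def mean_squared_error(actual, predicted):
    """Mean squared error between actual (o_i) and predicted (p_i) taxi times."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up taxi times in minutes, for illustration only.
    print(mean_squared_error([10, 12, 15], [11, 12, 13]))
```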
In this embodiment, MSE is used to monitor how the model's performance changes during training and testing; the results are shown in FIG. 6. The final MSE is 2.5 on the training set and 5.5 on the test set. Although there is a gap between the training and test MSE, the results reflect the model's generalization ability to a certain extent.
In addition, the following table compares the prediction accuracy on the test set within different error ranges. Across the test set, 85.7% of the samples have a taxi time error within 3 minutes; more than 93% have a prediction error within 4 minutes; and about 96.5% have an error within 5 minutes. These validation results show that the designed data mining model and algorithm meet the accuracy requirements of the actual dynamic surface taxi-out time prediction task well.
Test set accuracy within different error ranges
Error range (minutes)    [-3, 3]    [-4, 4]    [-5, 5]
Accuracy                 85.7%      93.1%      96.5%
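The accuracy figures above are fractions of test samples whose prediction error falls within a symmetric tolerance. A minimal sketch of that calculation is given below; the function name `accuracy_within` and the sample values are assumptions for illustration and do not reproduce the reported results.

```python
def accuracy_within(actual, predicted, tolerance):
    """Fraction of samples whose absolute error is at most `tolerance` minutes."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, p in zip(actual, predicted) if abs(o - p) <= tolerance)
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up taxi times in minutes, for illustration only.
    actual = [10, 12, 15, 20]
    predicted = [11, 16, 15, 14.5]
    for tol in (3, 4, 5):
        print(tol, accuracy_within(actual, predicted, tol))
```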
In summary, the present invention provides a surface variable taxi-out time prediction system based on big data deep learning. The system includes: a data set building module, adapted to acquire historical operation data and clean it to obtain a data set; an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize surface traffic; a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the surface taxi-out time; a model building module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method; and a prediction module, adapted to predict the airport surface taxi-out time using the surface taxi-out time prediction model. The system processes the airport's original recorded data, models the airport surface traffic conditions, analyzes and extracts the factors that influence taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.
Taking the above preferred embodiments of the present invention as guidance, persons skilled in the relevant art can, based on the above description, make various changes and modifications without departing from the technical idea of the present invention. The technical scope of the present invention is not limited to the content of the description and must be determined according to the scope of the claims.

Claims (9)

  1. A surface variable taxi-out time prediction system based on big data deep learning, characterized in that it comprises:
    a data set building module, adapted to acquire historical operation data and clean it to obtain a data set;
    an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize surface traffic;
    a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the surface taxi-out time;
    a model building module, adapted to build a surface taxi-out time prediction model from the feature set using an ensemble machine learning method;
    a prediction module, adapted to predict the airport surface taxi-out time using the surface taxi-out time prediction model.
  2. The surface variable taxi-out time prediction system based on big data deep learning according to claim 1, characterized in that
    the data set building module comprises:
    an original data set acquisition unit, adapted to acquire historical operation data and construct an original data set;
    a data cleaning unit, adapted to clean the original data set;
    a data set acquisition unit, adapted to integrate the original data set to obtain the data set;
    a data set division unit, adapted to divide the data set into a training set and a test set.
  3. The surface variable taxi-out time prediction system based on big data deep learning according to claim 2, characterized in that
    the indicator definition and quantification module comprises:
    a network topology acquisition unit, adapted to model the airport surface traffic situation with a macroscopic spatio-temporal network topology model to obtain a macroscopic spatio-temporal network topology;
    a quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four types of indicators that reflect the surface traffic volume.
  4. The surface variable taxi-out time prediction system based on big data deep learning according to claim 3, characterized in that
    the feature set extraction module comprises:
    an original feature set extraction unit, adapted to extract, from the data set and the traffic condition indicators, the features that affect the surface taxi-out time and form an original feature set;
    a feature analysis unit, adapted to perform feature analysis on the features in the original feature set;
    a feature set construction unit, adapted to construct the feature set according to the feature analysis results.
  5. The surface variable taxi-out time prediction system based on big data deep learning according to claim 4, characterized in that
    the feature analysis unit is configured to:
    perform feature analysis on the features in the original feature set using one or more of the correlation coefficient, normalized mutual information, and factor analysis;
    the correlation coefficient is a statistic that reflects the degree of linear correlation between two variables; it takes values in [-1, 1], a larger absolute value indicates a stronger linear correlation, a positive value indicates positive correlation, and a negative value indicates negative correlation; with X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
    $$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
    where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
    normalized mutual information is a commonly used correlation measure with values in [0, 1]; a larger value indicates a stronger correlation between the variables; the normalized mutual information $U_{X,Y}$ is defined as:
    Figure PCTCN2020089916-appb-100002
    where $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the information entropies of X and Y respectively, $p(x,y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the probability distributions of X and Y respectively;
    factor analysis assumes that the extracted feature vector $x$ is fully governed by latent factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is the coefficient matrix and $\varepsilon$ is the error; under the additional assumptions that the factors are mutually independent and that the factors and the errors are mutually independent, it follows that $\Sigma_x = AA^{T} + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, from which $A$ and $z$ can be obtained.
  6. The surface variable taxi-out time prediction system based on big data deep learning according to claim 5, characterized in that
    the model building module comprises:
    an initial model acquisition unit, adapted to obtain an initial model by using the feature set as the input of the GBRT ensemble learning model;
    a training unit, adapted to train the initial model and adjust the hyperparameter values so as to complete the building of the surface taxi-out time prediction model.
  7. The surface variable taxi-out time prediction system based on big data deep learning according to claim 6, characterized in that
    the training unit is configured to:
    select maximum depth as the way of controlling the decision tree size;
    select least squares as the loss function;
    under the optimal product value, select the largest learning rate that keeps performance stable and the correspondingly smallest number of estimators;
    set the minimum sample split to 200 according to the overall distribution of taxi-out times in the training set;
    complete the training of the initial model so as to build the surface taxi-out time prediction model.
  8. The surface variable taxi-out time prediction system based on big data deep learning according to claim 7, characterized in that
    the model building module further comprises:
    a model testing unit, adapted to validate the surface taxi-out time prediction model on the test set and evaluate its performance.
  9. The surface variable taxi-out time prediction system based on big data deep learning according to claim 8, characterized in that
    the performance evaluation in the model testing unit uses the mean squared error, calculated as:
    $$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (o_i - p_i)^2$$
    where $N$ is the number of samples in the test set, $o_i$ is the actual taxi time of the $i$-th sample, and $p_i$ is the taxi time predicted by the model for the $i$-th sample.
PCT/CN2020/089916 2019-10-30 2020-05-13 Layout-variable taxiing-out time prediction system based on big data deep learning WO2021082394A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911044358.3 2019-10-30
CN201911044358.3A CN110852497A (en) 2019-10-30 2019-10-30 Scene variable slide-out time prediction system based on big data deep learning

Publications (1)

Publication Number Publication Date
WO2021082394A1 true WO2021082394A1 (en) 2021-05-06

Family

ID=69599051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/089916 WO2021082394A1 (en) 2019-10-30 2020-05-13 Layout-variable taxiing-out time prediction system based on big data deep learning

Country Status (2)

Country Link
CN (1) CN110852497A (en)
WO (1) WO2021082394A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743498A (en) * 2021-09-02 2021-12-03 美视(杭州)人工智能科技有限公司 Solution method for fitting OKAI by using orthokeratology mirror
CN117668497A (en) * 2024-01-31 2024-03-08 山西卓昇环保科技有限公司 Carbon emission analysis method and system based on deep learning under environment protection

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852497A (en) * 2019-10-30 2020-02-28 南京智慧航空研究院有限公司 Scene variable slide-out time prediction system based on big data deep learning
CN110826788A (en) * 2019-10-30 2020-02-21 南京智慧航空研究院有限公司 Airport scene variable slide-out time prediction method based on big data deep learning
CN114783212A (en) * 2022-03-29 2022-07-22 南京航空航天大学 Method for constructing model feature set for prediction of departure taxi time of aircraft in busy airport
CN117253584A (en) * 2023-02-14 2023-12-19 南雄市民望医疗有限公司 Hemodialysis component detection-based dialysis time prediction system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100185426A1 (en) * 2009-01-16 2010-07-22 Rajesh Ganesan Predicting Aircraft Taxi-Out Times
CN106339358A (en) * 2016-08-16 2017-01-18 南京航空航天大学 Prediction method of aircraft scene taxiing time based on multiple regression analysis
CN106529734A (en) * 2016-11-18 2017-03-22 中国民航大学 Flight taxiing time prediction time based on a k-nearest neighbor (KNN) and support vector regression (SVR)
CN110826788A (en) * 2019-10-30 2020-02-21 南京智慧航空研究院有限公司 Airport scene variable slide-out time prediction method based on big data deep learning
CN110852497A (en) * 2019-10-30 2020-02-28 南京智慧航空研究院有限公司 Scene variable slide-out time prediction system based on big data deep learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463993B (en) * 2017-08-04 2020-11-24 贺志尧 Medium-and-long-term runoff forecasting method based on mutual information-kernel principal component analysis-Elman network
AU2018241119A1 (en) * 2017-10-06 2019-05-02 Tata Consultancy Services Limited System and method for flight delay prediction
US20190316909A1 (en) * 2018-04-13 2019-10-17 Passur Aerospace, Inc. Estimating Aircraft Taxi Times
CN108846523A (en) * 2018-07-31 2018-11-20 中国民航大学 A kind of flight for putting forth coasting time dynamic prediction method based on Bayesian network
CN110363333A (en) * 2019-06-21 2019-10-22 南京航空航天大学 The prediction technique of air transit ability under the influence of a kind of weather based on progressive gradient regression tree
CN110363361A (en) * 2019-07-25 2019-10-22 四川青霄信息科技有限公司 A kind of method and system for predicting variable sliding time based on big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI XIA; BAI RUIBIN: "Freight Vehicle Travel Time Prediction Using Gradient Boosting Regression Tree", 2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), IEEE, 18 December 2016 (2016-12-18), pages 1010 - 1015, XP033055658, DOI: 10.1109/ICMLA.2016.0182 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743498A (en) * 2021-09-02 2021-12-03 美视(杭州)人工智能科技有限公司 Solution method for fitting OKAI by using orthokeratology mirror
CN117668497A (en) * 2024-01-31 2024-03-08 山西卓昇环保科技有限公司 Carbon emission analysis method and system based on deep learning under environment protection
CN117668497B (en) * 2024-01-31 2024-05-07 山西卓昇环保科技有限公司 Carbon emission analysis method and system based on deep learning under environment protection

Also Published As

Publication number Publication date
CN110852497A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
WO2021082394A1 (en) Layout-variable taxiing-out time prediction system based on big data deep learning
WO2021082393A1 (en) Airport surface variable slide-out time prediction method based on big data deep learning
CN111652427B (en) Flight arrival time prediction method and system based on data mining analysis
CN108710623B (en) Airport departure delay time prediction method based on time series similarity measurement
CN111401601B (en) Delay propagation-oriented flight take-off and landing time prediction method
Choi et al. Artificial neural network models for airport capacity prediction
CN110210648B (en) Gray long-short term memory network-based control airspace strategic flow prediction method
CN112465199B (en) Airspace situation assessment system
CN110443448B (en) Bidirectional LSTM-based airplane position classification prediction method and system
Lin et al. From aircraft tracking data to network delay model: A data-driven approach considering en-route congestion
CN111160612A (en) Off-site flight delay analysis and prediction method based on weather influence
Zhang et al. Data‐driven flight time prediction for arrival aircraft within the terminal area
CN110889092A (en) Short-time large-scale activity peripheral track station passenger flow volume prediction method based on track transaction data
CN111968414B (en) 4D trajectory prediction method and device based on big data and AI and electronic equipment
CN105809280A (en) Prediction method for airport capacity demands
Ai et al. A deep learning approach to predict the spatial and temporal distribution of flight delay in network
CN112232535A (en) Flight departure average delay prediction method based on supervised learning
Wu et al. An improved svm model for flight delay prediction
CN113657814A (en) Aviation network risk prediction method and risk grade evaluation method
CN113610282A (en) Flight taxi time prediction method
CN110796315B (en) Departure flight delay prediction method based on aging information and deep learning
CN115752708A (en) Airport single-point noise prediction method based on deep time convolution network
CN110009939B (en) Flight delay prediction and sweep analysis method based on ASM
CN116109212B (en) Airport operation efficiency evaluation index design and monitoring method
CN105224801B (en) A kind of multiple-factor reservoir reservoir inflow short-period forecast evaluation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881288

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881288

Country of ref document: EP

Kind code of ref document: A1