CN110852497A - Scene variable slide-out time prediction system based on big data deep learning - Google Patents

Scene variable slide-out time prediction system based on big data deep learning

Info

Publication number
CN110852497A
Authority
CN
China
Prior art keywords
scene
out time
data
slide
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911044358.3A
Other languages
Chinese (zh)
Inventor
周龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Smart Aviation Research Institute Co Ltd
Original Assignee
Nanjing Smart Aviation Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Smart Aviation Research Institute Co Ltd filed Critical Nanjing Smart Aviation Research Institute Co Ltd
Priority to CN201911044358.3A priority Critical patent/CN110852497A/en
Publication of CN110852497A publication Critical patent/CN110852497A/en
Priority to PCT/CN2020/089916 priority patent/WO2021082394A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a scene variable slide-out time prediction system based on big data deep learning, which comprises: a data set establishing module adapted to acquire historical operating data and perform data cleaning to obtain a data set; an index definition and quantification module adapted to define and quantify traffic condition indexes that characterize scene traffic; a feature set extraction module adapted to analyze and extract, based on the data set and the traffic condition indexes, a feature set influencing the scene slide-out time; a model building module adapted to build a scene slide-out time prediction model from the feature set by an ensemble machine learning method; and a prediction module adapted to predict the airport scene slide-out time with the scene slide-out time prediction model. The system processes the airport's original recorded data, models the airport scene traffic conditions, analyzes and extracts the factors influencing taxi time, and trains a GBRT ensemble learning model to obtain a slide-out time prediction model, providing a data basis for the management and optimization of airport operation.

Description

Scene variable slide-out time prediction system based on big data deep learning
Technical Field
The invention relates to the field of airport traffic control, in particular to a scene variable slide-out time prediction system based on big data deep learning.
Background
In the prior art, aircraft slide-out (taxi-out) time prediction is mostly modeled in one of two ways: simulation and analysis. A simulation model takes the existing airport topological structure model and conflict detection and resolution as factors, and obtains the slide-out time by simulating the operation of all arriving and departing aircraft on the ground. A simulation model is highly airport-specific and does not generalize well to different airports. Conventional research on analytical models has focused mainly on linear regression models, with some attempts to use machine learning techniques. For an analytical model, determining the main factors that influence taxi time is the emphasis of the study. Analytical models usually suffer from defects such as an incomplete set of influence factors, so their practical reference value is weak and they cannot meet the requirements of actual application.
How to solve the above problems is what currently needs to be addressed.
Disclosure of Invention
The invention aims to provide a scene variable slide-out time prediction system based on big data deep learning, so as to achieve the purpose of improving the comprehensiveness of influence factors in an analysis model.
In order to solve the technical problem, the invention provides a scene variable slide-out time prediction system based on big data deep learning, which comprises:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model through an integrated machine learning method according to the feature set;
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
Further, the data set establishing module comprises:
the original data set acquisition unit is suitable for acquiring historical operating data to construct an original data set;
the data cleaning unit is suitable for cleaning the original data set;
the data set acquisition unit is suitable for integrating the original data set to acquire a data set;
and the data set dividing unit is suitable for dividing the data set into a training set and a test set.
Further, the index defining and quantizing module includes:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
and the quantization unit is suitable for defining four types of indexes for representing scene traffic based on a macroscopic space-time network topological structure and quantizing the indexes.
Further, the feature set extraction module comprises:
the original feature set extracting unit is suitable for extracting features influencing scene slide-out time from the data set and the traffic condition indexes and forming an original feature set;
a feature analysis unit adapted to perform feature analysis on the features in the original feature set;
and a feature set construction unit adapted to construct a feature set according to the feature analysis result.
Further, the feature analysis unit is configured to: perform feature analysis on the features of the original feature set by adopting one or more of the correlation coefficient, normalized mutual information and factor analysis;
the correlation coefficient is a statistic reflecting the degree of linear correlation between two variables, with a value range of $[-1, 1]$; the larger the absolute value, the stronger the linear correlation, a positive value indicating positive correlation and a negative value indicating negative correlation; with $X$, $Y$ denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X, Y)$ is the covariance of $X$ and $Y$, $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$, $Y$, and $\mu_X$, $\mu_Y$ are the means of $X$, $Y$;
normalized mutual information is a common correlation metric with a value range of $[0, 1]$; the larger the value, the greater the degree of correlation between the variables; the normalized mutual information $U_{X,Y}$ is defined as:
[Formula image BDA0002253731690000032 not reproduced]
where $I_{X,Y}$ is the mutual information of $X$, $Y$, $H_X$, $H_Y$ are the respective entropies of $X$, $Y$, $p(x, y)$ is the joint probability distribution of $X$, $Y$, and $p(x)$, $p(y)$ are the respective probability distributions of $X$, $Y$;
factor analysis assumes that the extracted features $x$ are completely controlled by latent influence factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is a coefficient matrix and $\varepsilon$ is an error; the influence factors are independent of each other, and the influence factors and the error are independent of each other; it finally follows that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, so that $A$ and $z$ can be found.
Further, the model building module comprises:
the initial model acquisition unit is suitable for acquiring an initial model by taking the feature set as the input of the integrated learning model GBRT;
a training unit adapted to train the initial model and adjust the values of the hyper-parameters so as to complete the establishment of the scene slide-out time prediction model.
Further, the training unit is configured to:
select the maximum depth as the way of controlling the size of the decision trees;
select least squares as the loss function;
under the optimal value of the product of learning rate and number of estimators, select the largest learning rate, and the corresponding smallest number of estimators, that keeps performance stable;
set the minimum samples split to 200 according to the overall distribution of slide-out times in the training set;
and complete the training of the initial model so as to establish the scene slide-out time prediction model.
Further, the model building module further comprises:
a model test unit adapted to verify the scene slide-out time prediction model using the test set and to evaluate its performance.
Further, the performance evaluation in the model test unit adopts the mean square error, calculated as:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (o_i - p_i)^2$$
where $N$ is the number of test set samples, $o_i$ is the actual slide-out time of the $i$-th sample, and $p_i$ is the slide-out time predicted by the model for that sample.
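As an illustration of the mean square error evaluation, a minimal Python sketch is given here. The function name and the sample values are hypothetical and are not taken from the patent.

```python
def mean_squared_error(observed, predicted):
    """MSE = (1/N) * sum((o_i - p_i)^2) over the N test samples."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical slide-out times in minutes (illustrative values only).
actual = [12.0, 18.0, 25.0, 9.0]
predicted = [14.0, 17.0, 22.0, 9.0]
error = mean_squared_error(actual, predicted)  # (4 + 1 + 9 + 0) / 4
```

A lower value indicates that the predicted slide-out times lie closer to the recorded ones; because the errors are squared, large individual misses dominate the score.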
The invention has the beneficial effect of providing a scene variable slide-out time prediction system based on big data deep learning, which comprises: a data set establishing module adapted to acquire historical operating data and perform data cleaning to obtain a data set; an index definition and quantification module adapted to define and quantify traffic condition indexes that characterize scene traffic; a feature set extraction module adapted to analyze and extract, based on the data set and the traffic condition indexes, a feature set influencing the scene slide-out time; a model building module adapted to build a scene slide-out time prediction model from the feature set by an ensemble machine learning method; and a prediction module adapted to predict the airport scene slide-out time with the scene slide-out time prediction model. The system processes the original recorded data of the airport, models the airport scene traffic conditions, analyzes and extracts the factors influencing taxi time, and trains a GBRT ensemble learning model to obtain a slide-out time prediction model, providing a data basis for the management and optimization of airport operation.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic block diagram of a scene variable slide-out time prediction system based on big data deep learning provided by the present invention.
FIG. 2 is a schematic diagram of a taxi process macro spatiotemporal network topology provided by the present invention.
FIG. 3 shows the correlation coefficients between the candidate influencing factors and the slide-out time provided by the present invention.
FIG. 4 shows the normalized mutual information between the candidate influencing factors and the slide-out time provided by the present invention.
FIG. 5 is a graph of the results of factor analysis of candidate influencing factors provided by the present invention.
FIG. 6 is a diagram of the performance variation process during the model training and testing phases provided by the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
Example 1
As shown in fig. 1, this embodiment 1 provides a scene variable slide-out time prediction system based on big data deep learning, which processes original recorded data of an airport, models traffic conditions of the airport scene, analyzes and extracts influence factors of slide time, trains a GBRT ensemble learning model, and further obtains a slide-out time prediction model, thereby providing a data basis for management and optimization of airport operation. Specifically, the scene variable slide-out time prediction system based on big data deep learning comprises:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model by an integrated machine learning method according to the feature set;
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
In this embodiment, the data set creating module includes:
an original data set acquisition unit adapted to acquire historical operating data to construct an original data set.
Specifically, as much data as possible is extracted from the airport scene operation database to form an original data set of airport flight departure operations. Taxi route information is collected, including the departure runway, departure stand, corridor entrance number, taxi distance and the like; flight attribute information is collected, including flight number, flight type, aircraft type, airline, engine type and the like; traffic control information is collected, including whether the flight is restricted, controller information, communication information, delay conditions, local weather, airport broadcasts and the like; flight plan information is collected, including the departure airport, destination airport, planned take-off time, planned off-block time, waypoint information and the like; and actual records of the taxi process are collected, including the off-block time, pushback time, engine start request/clearance times, actual take-off time, taxi speed, runway-head waiting time and the like.
a data cleaning unit adapted to clean the original data set.
Specifically, the processing scheme is formulated for practical use, considering that the airport acquires the data set in real time. Missing values are handled in two ways: setting default values and direct deletion. For default values, "no" is set for "restricted or not" and for "restricted content". After the defaults are filled in, attributes with more than half of their information missing are deleted outright, namely "engine start request", "engine start clearance", "off-block time", "wake", "taxi speed" and "departure queue number". A completeness check is then run on the data set, and data entries with missing information are deleted. For abnormal values, a basic check is first made of the data type of every attribute and of whether values are out of bounds, and then a range-based detection method is used to further check some attributes for abnormal values: a value range is defined for each such attribute based on actual airport scene operations, and data whose value falls outside the corresponding range is treated as abnormal. Finally, data entries containing abnormal values are deleted from the data set. The attribute value ranges are shown in the following table.
Value ranges of selected attributes
[Table images BDA0002253731690000071 and BDA0002253731690000081 not reproduced]
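The cleaning steps described above (default filling, dropping sparse attributes, completeness check, range check) could be sketched as follows. This is a minimal illustration only: the attribute names, default values and value ranges are hypothetical, since the original range table is not reproduced in the text.

```python
from collections import Counter

# Hypothetical attribute names, defaults and ranges, for illustration only.
DEFAULTS = {"restricted": "no", "restricted_content": "no"}
VALUE_RANGES = {"taxi_out_min": (1, 120), "taxi_length_m": (100, 10000)}


def clean(records, max_missing_ratio=0.5):
    """Return cleaned copies of `records` (list of dicts; None marks a missing value)."""
    # 1. Fill the attributes that have a default value.
    filled = [{**r, **{k: v for k, v in DEFAULTS.items() if r.get(k) is None}}
              for r in records]
    # 2. Drop attributes that are missing in more than half of the entries.
    attributes = {k for r in filled for k in r}
    missing = Counter(k for r in filled for k in attributes if r.get(k) is None)
    kept = {k for k in attributes if missing[k] <= max_missing_ratio * len(filled)}
    cleaned = []
    for r in filled:
        row = {k: r.get(k) for k in kept}
        # 3. Completeness check: drop entries that still have missing values.
        if any(v is None for v in row.values()):
            continue
        # 4. Range check: drop entries containing an out-of-range (abnormal) value.
        if any(k in row and not lo <= row[k] <= hi
               for k, (lo, hi) in VALUE_RANGES.items()):
            continue
        cleaned.append(row)
    return cleaned


records = [
    {"restricted": None, "restricted_content": None, "taxi_out_min": 14, "taxi_speed": None},
    {"restricted": "yes", "restricted_content": "flow control", "taxi_out_min": 500, "taxi_speed": None},
    {"restricted": None, "restricted_content": None, "taxi_out_min": None, "taxi_speed": 8.0},
    {"restricted": None, "restricted_content": None, "taxi_out_min": 22, "taxi_speed": None},
]
cleaned = clean(records)
```

In this toy input, "taxi_speed" is missing in three of four entries and is dropped as an attribute, one entry is removed for an incomplete record, and one for an out-of-range slide-out time.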
a data set acquisition unit adapted to integrate the original data set to obtain the data set.
Specifically, this step comprises three tasks: redundant attribute identification, data type conversion and logic error checking. For redundant attribute identification and deletion, redundant attributes that carry little information are identified by computing the information entropy of each attribute, and redundant attributes whose information is contained in other attributes are identified by computing the mutual information between attributes; the redundant attributes "takeoff airport" and "execution date" are deleted. For data type conversion, information in non-numerical attributes that serves only as an identifier is converted into integer values that are easy to process and use later; the information contained in the "restricted content" attribute is difficult to quantify, and the attribute is deleted after comprehensive consideration. For logic error checking, constraint relations among the features are established from their physical meanings, and logic errors are eliminated: the correspondence between aircraft type and number of engines is checked, the order of the time nodes in scene operation is checked, and entries with logic errors are deleted outright.
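The entropy and mutual-information checks used for redundant attribute identification can be illustrated with a short Python sketch. The column values below are hypothetical, and no specific thresholds are shown because the text does not give any.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete attribute values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long value sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy columns: every flight departs from the same airport, and
# the airline is fully determined by the flight-number prefix.
takeoff_airport = ["APT", "APT", "APT", "APT"]
flight_prefix = ["AA", "AA", "BB", "BB"]
airline = ["Airline A", "Airline A", "Airline B", "Airline B"]

h_airport = entropy(takeoff_airport)   # zero: the attribute carries no information
h_airline = entropy(airline)
mi = mutual_information(flight_prefix, airline)  # equals h_airline: fully redundant
```

An attribute with near-zero entropy, such as a departure airport that is the same for every record, adds nothing to the model; an attribute whose mutual information with another attribute equals its own entropy is fully determined by that other attribute.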
a data set dividing unit adapted to divide the data set into a training set and a test set.
Specifically, the data set is divided into two parts, a training set and a test set: 90% of the data forms the training set used in the training phase of the model, and 10% is used as the test set to verify the model's validity and robustness. The training set and the test set thus come from the same source and follow the same distribution. Once the final processed data set is obtained, the 10% is held out for testing before the machine learning model is trained, and the remaining 90% is used as the training set.
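A 90/10 hold-out split of this kind might look like the sketch below. The random shuffle and the fixed seed are assumptions made for illustration; the text does not state how the 10% is chosen.

```python
import random


def split_dataset(entries, test_fraction=0.1, seed=0):
    """Shuffle a copy of `entries` and hold out `test_fraction` of it as the test set."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of 1000 entries, represented here by their indexes.
train_set, test_set = split_dataset(range(1000))
```

Holding out the test entries before any training keeps the evaluation independent of the fitted model while both parts still come from the same cleaned data set.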
In this embodiment, the index defining and quantizing module includes:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
Specifically, a macroscopic space-time network topology model is adopted to model the traffic situation of the airport scene. Fig. 2 visualizes the general network topology of departures and arrivals during taxiing in an arbitrary space-time domain. In actual airport scene operation, the taxi-in and taxi-out processes are coupled and interdependent, so the influence of arrivals on the departure process is also considered in the model. The space-time network topology model is a general framework for describing the macroscopic flow of resources in an airport system. As shown in Fig. 2, $d_1, \ldots, d_4$ denote departure flights in all four different relationships to the reference departure flight $d_0$, namely "before launch, before launch", "before launch, after launch", "after launch, before launch" and "after launch, after launch". Similarly, $a_1, \ldots, a_4$ denote arrival flights in all four different relationships to the reference arrival flight $a_0$, namely "before landing, before landing", "before landing, after landing", "after landing, before landing" and "after landing, after landing". $t_{on}$ and $t_{in}$ denote the landing time and in-position time of the reference arrival flight $a_0$; $t_{out}$ and $t_{off}$ denote the pushback time and take-off time of the reference departure flight $d_0$. $\delta$ denotes the time threshold for arrivals and departures.
a quantization unit adapted to define and quantify four categories of indexes that characterize scene traffic, based on the macroscopic space-time network topology.
Specifically, based on the macroscopic space-time network topology, eight indexes in four categories are defined to characterize scene traffic. The four categories are Scene Instantaneous Flow Indexes (SIFIs), Scene Cumulative Flow Indexes (SCFIs), Aircraft Queue Length Indexes (AQLIs) and Slot Resource Demand Indexes (SRDIs). Two statistics are computed in each category: the number of departing aircraft (prefixed with D-) and the number of arriving aircraft (prefixed with A-). The following table gives the statistics for the case in Fig. 2 with $d_0$ as the reference departure flight.
Statistics of the scene traffic situation indexes for departure flight $d_0$
[Table image BDA0002253731690000091 not reproduced]
Taking Fig. 2 as an example, the definition and calculation of the indexes in the table are described in detail below. For any departure flight $d_0$: the SIFIs comprise D-SIFI and A-SIFI, the numbers of departing and arriving flights that are taxiing at the moment $d_0$ pushes back from the gate. The SCFIs comprise D-SCFI and A-SCFI, the numbers of departing and arriving aircraft whose taxi periods overlap with the taxi period of $d_0$. The AQLIs comprise D-AQLI and A-AQLI, the numbers of aircraft taking off and landing on the runway during the entire taxi process of $d_0$. The SRDIs comprise D-SRDI and A-SRDI, the numbers of aircraft taking off and landing during the departure slot $[t_0 - \delta, t_0 + \delta]$ of aircraft $d_0$. In general, $\delta$ may be set between 10 and 30 minutes.
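To make the eight indexes concrete, the sketch below counts them from taxi intervals expressed in minutes. It rests on stated assumptions: the `Taxi` type and the sample flights are hypothetical, interval boundaries are treated as inclusive where the text is silent, and the slot centre $t_0$ is taken to be the take-off time of $d_0$, which the text does not spell out.

```python
from typing import List, NamedTuple


class Taxi(NamedTuple):
    """Taxi interval in minutes: (t_out, t_off) for a departure, (t_on, t_in) for an arrival."""
    start: float
    end: float


def traffic_indexes(d0: Taxi, departures: List[Taxi], arrivals: List[Taxi],
                    delta: float = 15.0) -> dict:
    """Count the eight indexes for reference departure d0 (d0 itself is excluded from the lists)."""
    lo, hi = d0.end - delta, d0.end + delta  # departure slot; centre t_0 assumed to be t_off
    return {
        # SIFI: flights already taxiing at the moment d0 pushes back.
        "D-SIFI": sum(f.start <= d0.start < f.end for f in departures),
        "A-SIFI": sum(f.start <= d0.start < f.end for f in arrivals),
        # SCFI: flights whose taxi period overlaps the taxi period of d0.
        "D-SCFI": sum(f.start < d0.end and d0.start < f.end for f in departures),
        "A-SCFI": sum(f.start < d0.end and d0.start < f.end for f in arrivals),
        # AQLI: take-offs and landings on the runway while d0 is taxiing.
        "D-AQLI": sum(d0.start <= f.end <= d0.end for f in departures),
        "A-AQLI": sum(d0.start <= f.start <= d0.end for f in arrivals),
        # SRDI: take-offs and landings inside the departure slot of d0.
        "D-SRDI": sum(lo <= f.end <= hi for f in departures),
        "A-SRDI": sum(lo <= f.start <= hi for f in arrivals),
    }


d0 = Taxi(100, 115)
departures = [Taxi(90, 105), Taxi(105, 120), Taxi(60, 75)]
arrivals = [Taxi(98, 104), Taxi(110, 118), Taxi(125, 131)]
indexes = traffic_indexes(d0, departures, arrivals, delta=15.0)
```

Each index is a plain count, so the eight values can be appended directly to a flight's record as numeric features.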
In this embodiment, the feature set extracting module includes:
an original feature set extraction unit adapted to extract features influencing the scene slide-out time from the data set and the traffic condition indexes, and to form an original feature set.
Specifically, the factors influencing the scene slide-out time that were obtained by the data set establishing module and the index definition and quantification module are collated to form an original feature set. The original feature set is then processed, and new features are derived from the original features to replace some of them.
The factors related to the scene slide-out time obtained by the data set establishing module are: flight number, flight attribute, destination airport, planned take-off time, aircraft type, airline, pushback time, actual take-off time, departure runway, departure stand, stand type, engine type, corridor entrance, whether restricted or not, and gate. The factors related to the scene slide-out time obtained in S120 are: D-SIFI, D-SCFI, D-AQLI, D-SRDI and Corridor_NO. The difference between the pushback time and the actual take-off time is used as the scene slide-out time, replacing the two original features. New features, namely month, day, week, hour and minute, are extracted from the planned take-off time to replace the original feature. The stand and gate features are further subdivided and analyzed, and the correspondence between runway and stand/gate is extracted as a new feature. The final original feature set, i.e. the candidate influencing factors, is shown in the following table:
Candidate influencing factors
[Table image BDA0002253731690000111 not reproduced]
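The derived features described above, namely the slide-out time as the difference between take-off and pushback and the calendar fields taken from the planned take-off time, can be sketched as follows. The function name, the dictionary keys and the encoding of "week" as an ISO weekday number are assumptions for illustration.

```python
from datetime import datetime


def derive_features(pushback, actual_takeoff, planned_takeoff):
    """Derive the slide-out time (target) and calendar features from three timestamps."""
    taxi_out_min = (actual_takeoff - pushback).total_seconds() / 60.0
    return {
        "taxi_out_min": taxi_out_min,
        "month": planned_takeoff.month,
        "day": planned_takeoff.day,
        "week": planned_takeoff.isoweekday(),  # 1 = Monday ... 7 = Sunday (assumed encoding)
        "hour": planned_takeoff.hour,
        "minute": planned_takeoff.minute,
    }


# Hypothetical timestamps for a single departure.
features = derive_features(
    pushback=datetime(2019, 10, 30, 8, 5),
    actual_takeoff=datetime(2019, 10, 30, 8, 23),
    planned_takeoff=datetime(2019, 10, 30, 8, 0),
)
```

Replacing raw timestamps with a duration and a few calendar fields gives the model numeric inputs that can capture daily and weekly traffic patterns.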
a feature analysis unit adapted to perform feature analysis on the features in the original feature set;
and a feature set construction unit adapted to construct a feature set according to the feature analysis result.
Specifically, based on the analysis results of the feature analysis unit, important features are selected from the original feature set formed by the original feature set extraction unit to form the feature set used by the ensemble machine learning model. Features with little correlation to the scene slide-out time are screened out, namely "engine type", "stand type", "month", "week", "day" and "minute". The final feature set, i.e. the selected influencing factors, is shown in the following table:
Finally selected influencing factors
[Table image BDA0002253731690000121 not reproduced]
In this embodiment, the feature analysis unit performs feature analysis on the features of the original feature set using one or more of the correlation coefficient, normalized mutual information and factor analysis. Figs. 3, 4 and 5 show, respectively, the Pearson correlation coefficients between the candidate influencing factors and the slide-out time, the normalized mutual information between the candidate influencing factors and the slide-out time, and the factor analysis results for the candidate influencing factors.
The correlation coefficient is a statistic reflecting the degree of linear correlation between two variables, with a value range of $[-1, 1]$. The larger the absolute value, the stronger the linear correlation; a positive value indicates positive correlation and a negative value indicates negative correlation. With $X$, $Y$ denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X, Y)$ is the covariance of $X$ and $Y$, $\sigma_X$, $\sigma_Y$ are the standard deviations of $X$, $Y$, and $\mu_X$, $\mu_Y$ are the means of $X$, $Y$.
Normalized mutual information is a common correlation metric with a value range of $[0, 1]$; the larger the value, the greater the degree of correlation between the variables. The normalized mutual information $U_{X,Y}$ is defined as:
[Formula image BDA0002253731690000131 not reproduced]
where $I_{X,Y}$ is the mutual information of $X$, $Y$, $H_X$, $H_Y$ are the respective entropies of $X$, $Y$, $p(x, y)$ is the joint probability distribution of $X$, $Y$, and $p(x)$, $p(y)$ are the respective probability distributions of $X$, $Y$.
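The two correlation measures can be sketched in plain Python as follows. The Pearson function follows the covariance-over-standard-deviations definition. For normalized mutual information, the normalization $2 I_{X,Y} / (H_X + H_Y)$ used here is an assumption: it is one common choice with range $[0, 1]$, but the original formula image is not available to confirm it. All sample data are hypothetical, and continuous variables would first need to be discretized.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_information(xs, ys):
    """Assumed normalization: U = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    mi = hx + hy - entropy(list(zip(xs, ys)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mi / (hx + hy)


# Hypothetical data: slide-out time exactly linear in queue length,
# and stand area fully determined by the runway in use.
queue_length = [0, 1, 2, 3, 4, 5]
taxi_out_min = [8, 10, 12, 14, 16, 18]
runway = ["06", "06", "24", "24", "06", "24"]
stand_area = ["east", "east", "west", "west", "east", "west"]

rho = pearson(queue_length, taxi_out_min)
u_dependent = normalized_mutual_information(runway, stand_area)
u_independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The Pearson coefficient only captures linear dependence, whereas mutual information also responds to non-linear and categorical relationships, which is one reason for using the measures side by side.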
Factor analysis assumes that the extracted features $x$ are completely controlled by latent influence factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is a coefficient matrix and $\varepsilon$ is an error. The influence factors are independent of each other, and the influence factors and the error are independent of each other. It finally follows that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, so that $A$ and $z$ can be found.
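The covariance identity of the factor model can be derived in a few steps. The sketch below assumes, as is conventional in factor analysis but not stated explicitly in the text, that the latent factors are standardized so that $\mathrm{Cov}(z) = I$.

```latex
\begin{aligned}
\Sigma_x &= \mathrm{Cov}(Az + \varepsilon) \\
         &= A\,\mathrm{Cov}(z)\,A^{T}
            + A\,\mathrm{Cov}(z, \varepsilon)
            + \mathrm{Cov}(\varepsilon, z)\,A^{T}
            + \mathrm{Cov}(\varepsilon) \\
         &= A\,I\,A^{T} + 0 + 0 + \Sigma_\varepsilon
            \qquad \text{(independent unit-variance factors, factors independent of the error)} \\
         &= AA^{T} + \Sigma_\varepsilon
\end{aligned}
```

The cross terms vanish because the factors and the error are independent, and the first term reduces to $AA^T$ because the factors are mutually independent with unit variance.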
In this embodiment, the model building module includes:
the initial model acquisition unit is suitable for obtaining an initial model by taking the feature set as the input of the ensemble learning model GBRT;
and the training unit is suitable for training the initial model and adjusting the value of the hyper-parameter so as to complete the establishment of the scene slide-out time prediction model.
In this embodiment, the training unit is configured to: select "maximum depth" as the way of controlling the size of the decision trees; select "least squares" as the loss function; under the optimal product value of the learning rate and the number of estimators, select the largest learning rate, and the correspondingly smallest number of estimators, that keeps the performance stable; set the minimum sample split to 200 according to the overall distribution of the slide-out times in the training set; and complete the training of the initial model so as to establish the scene slide-out time prediction model.
Specifically, a Gradient Boosted Regression Trees (GBRT) model, a typical representative of ensemble learning, is used to predict the scene slide-out time. The feature set obtained in step S133 is taken as the model input, and the GBRT model is trained quickly using the implementation in the scikit-learn library. The hyper-parameters to be set are: decision tree size control, loss function type, number of estimators, learning rate, and minimum sample split. There are two options for controlling the size of the decision trees: "maximum depth" (max_depth) and "maximum number of leaf nodes" (max_leaf_nodes). There are four alternative loss functions for the regression task: "least squares" (ls), "least absolute deviation" (lad), "Huber loss" (huber) and "quantile loss" (quantile). Because the learning rate and the number of estimators interact strongly, their product roughly reflects how much iterative training is performed. Therefore, when setting these parameters, different product values are set empirically, and the product value that achieves the best performance on the training set is selected. The minimum sample split controls the lower limit of the number of samples in a leaf node and improves the robustness of the model. In general, the hyper-parameter values need to be adjusted to suit the data of the application scenario.
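The selection rule for the learning rate and the number of estimators can be sketched as follows. This is one possible reading of the procedure, not the patent's implementation: the helper names, the tolerance, and the toy error function `toy_error` are hypothetical, and in practice the error function would train and evaluate a GBRT model.

```python
def candidate_pairs(product, learning_rates):
    """Pairs (learning_rate, n_estimators) whose product is approximately `product`."""
    return [(lr, max(1, round(product / lr))) for lr in sorted(learning_rates)]

def pick_largest_stable_rate(pairs, error_fn, tolerance):
    """Largest learning rate whose error stays within `tolerance` of the best error."""
    scored = [(lr, n, error_fn(lr, n)) for lr, n in pairs]
    best = min(err for _, _, err in scored)
    stable = [(lr, n) for lr, n, err in scored if err - best <= tolerance]
    # The largest stable learning rate implies the fewest estimators for a fixed product.
    return max(stable)

def toy_error(learning_rate, n_estimators):
    """Hypothetical stand-in for a training-set MSE; degrades when the rate is too large."""
    return 5.0 + 40.0 * max(0.0, learning_rate - 0.1) ** 2

pairs = candidate_pairs(50, [0.01, 0.05, 0.1, 0.2, 0.5])
chosen = pick_largest_stable_rate(pairs, toy_error, tolerance=0.1)
print(chosen)
```

Choosing the largest stable learning rate keeps the number of trees, and therefore training and prediction cost, as low as possible for a given product value.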
Specifically, the GBRT model F(x) is an additive model of the form:
$$F(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$$
where $h_m(x)$ is a basis function, commonly called a weak learner in the context of boosting, $\gamma_m$ is the weight of the corresponding weak learner, and M is the total number of weak learners. GBRT uses decision trees of fixed size as weak learners. Like other boosting algorithms, GBRT builds the additive model greedily:
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
where $F_m(x)$ denotes the GBRT model obtained at the m-th iteration, and $h_m(x)$ is obtained from
$$h_m = \arg\min_{h} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big)$$
where n is the total number of training samples, L is the selected loss function, $y_i$ is the label of the i-th sample, $F_{m-1}(x_i)$ is the prediction on the i-th sample of the GBRT model obtained at iteration m-1, and $h(x_i)$ is the prediction on the i-th sample of the weak learner being sought. The weight $\gamma_m$ is obtained from
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$$
where n, L, $y_i$ and $F_{m-1}(x_i)$ are as defined above.
The initial model $F_0$, that is, the model before any training iteration, is problem-specific; for least-squares regression, the mean of the target values is usually chosen.
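The greedy additive construction can be illustrated with a deliberately small sketch: least-squares boosting over a single feature, with depth-1 trees (stumps) as weak learners and the mean of the targets as the initial model. This is not the scikit-learn implementation referred to in the text; the function names are hypothetical, the data are invented, and the `learning_rate` shrinkage applied to each weight is the hyper-parameter mentioned earlier rather than part of the additive formula itself.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree on one feature, fitted by least squares."""
    order = sorted(set(xs))
    if len(order) < 2:
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    best = None
    for lo, hi in zip(order, order[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv

def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Greedy additive model F_m = F_{m-1} + step * h_m for squared loss."""
    f0 = sum(ys) / len(ys)  # initial model: mean of the target values
    preds = [f0] * len(ys)
    stages = []
    for _ in range(n_estimators):
        # For squared loss the negative gradient is the residual y - F_{m-1}(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        hx = [h(x) for x in xs]
        denom = sum(v * v for v in hx)
        if denom == 0:
            break  # residuals already fitted; nothing left to add
        # Closed-form line search for squared loss (equals 1 for a least-squares stump).
        gamma = sum(r * v for r, v in zip(residuals, hx)) / denom
        step = learning_rate * gamma
        stages.append((step, h))
        preds = [p + step * v for p, v in zip(preds, hx)]
    return lambda x: f0 + sum(step * h(x) for step, h in stages)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical training data: one traffic index vs. slide-out time in minutes.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [6.0, 6.5, 7.0, 9.0, 12.0, 12.5, 13.0, 15.0]
baseline = fit_gbrt(xs, ys, n_estimators=0)
model = fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

With zero estimators the model reduces to the initial model (the target mean), and each added stage can only lower the training error, which is why the number of estimators and the learning rate have to be tuned together against held-out data.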
In this embodiment, the model building module further includes:
and the model test unit is suitable for verifying the scene slide-out time prediction model by using a test set and evaluating the performance.
Specifically, the scene slide-out time prediction model is verified on the test set, and the mean square error (MSE) is used for performance evaluation, calculated as:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (o_i - p_i)^2$$
where N is the number of test set samples, $o_i$ is the actual slide-out time of the i-th sample, and $p_i$ is the slide-out time predicted by the model for the i-th sample.
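The mean square error used for evaluation can be computed directly from its definition. The sketch below uses invented slide-out times in minutes, and the function name is hypothetical.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/N) * sum((o_i - p_i)^2) over the test samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted slide-out times, in minutes.
actual_min = [10.0, 12.0, 15.0, 9.0]
predicted_min = [11.0, 12.0, 13.0, 10.0]
test_mse = mean_squared_error(actual_min, predicted_min)
print(test_mse)
```

Because the errors are squared, the MSE is expressed in squared minutes and penalizes a few large misses more heavily than many small ones.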
In this embodiment, the MSE was used to monitor how the performance of the model changed during training and testing; the results are shown in Fig. 6. The MSE finally reached 2.5 on the training set and 5.5 on the test set. Although there is a gap between the training and test MSE, the result reflects the generalization capability of the model to some extent.
In addition, the following table compares the prediction accuracy on the test set within different error ranges. Across the test set, 85.7% of the samples have a slide-out time prediction error within 3 minutes; more than 93% have an error within 4 minutes; and about 96.5% have an error within 5 minutes. According to the verification results on the test set, the designed data mining model and algorithm meet the precision requirement of the actual scene dynamic slide-out time prediction task well.
Test set accuracy within different error ranges
Error range (min)    [-3, 3]    [-4, 4]    [-5, 5]
Accuracy             85.7%      93.1%      96.5%
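The accuracy figures in the table are the share of test samples whose prediction error falls inside a symmetric range. A minimal sketch of that computation follows; the sample values are invented and do not reproduce the percentages reported above.

```python
def accuracy_within(actual, predicted, bound_min):
    """Fraction of samples whose error p_i - o_i lies in [-bound_min, bound_min]."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for o, p in zip(actual, predicted) if abs(p - o) <= bound_min)
    return hits / len(actual)

# Hypothetical actual and predicted slide-out times, in minutes.
actual_min = [10.0, 12.0, 15.0, 9.0, 20.0]
predicted_min = [11.0, 16.5, 13.0, 10.0, 26.0]
report = {bound: accuracy_within(actual_min, predicted_min, bound) for bound in (3, 4, 5)}
print(report)
```

Reporting several bounds side by side, as the table does, shows how quickly the error distribution tails off, which a single MSE value does not reveal.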
In summary, the invention provides a scene variable slide-out time prediction system based on big data deep learning, comprising: a data set establishing module suitable for acquiring historical operating data and performing data cleaning to obtain a data set; an index definition and quantification module suitable for defining and quantifying traffic condition indexes of the scene traffic characteristics; a feature set extraction module suitable for analyzing and extracting, based on the data set and the traffic condition indexes, a feature set that influences the scene slide-out time; a model building module suitable for building a scene slide-out time prediction model from the feature set by an ensemble machine learning method; and a prediction module suitable for predicting the airport scene slide-out time with the scene slide-out time prediction model. By processing the original recorded data of the airport, modeling the airport scene traffic conditions, analyzing and extracting the factors that influence the slide-out time, and training a GBRT ensemble learning model, a slide-out time prediction model is obtained, providing a data basis for the management and optimization of airport operations.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (9)

1. A scene variable slide-out time prediction system based on big data deep learning, comprising:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model through an ensemble machine learning method according to the feature set;
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
2. The big-data deep learning-based scene variable slide-out time prediction system of claim 1,
the data set building module comprises:
the original data set acquisition unit is suitable for acquiring historical operating data to construct an original data set;
the data cleaning unit is suitable for cleaning the original data set;
the data set acquisition unit is suitable for integrating the original data set to acquire a data set;
and the data set dividing unit is suitable for dividing the data set into a training set and a test set.
3. The big-data deep learning-based scene variable slide-out time prediction system of claim 2,
the index definition and quantization module comprises:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
and the quantization unit is suitable for defining four types of indexes for representing scene traffic based on a macroscopic space-time network topological structure and quantizing the indexes.
4. The big-data deep learning-based scene variable slide-out time prediction system of claim 3,
the feature set extraction module comprises:
the original feature set extracting unit is suitable for extracting features influencing scene slide-out time from the data set and the traffic condition indexes and forming an original feature set;
the feature analysis unit is suitable for performing feature analysis on the features in the original feature set;
and the feature set construction unit is suitable for constructing a feature set according to the feature analysis result.
5. The big-data deep learning-based scene variable slide-out time prediction system of claim 4,
the feature analysis unit, namely:
performing feature analysis on the features of the original feature set by adopting one or more of the correlation coefficient, normalized mutual information and factor analysis;
the correlation coefficient is a statistic that reflects the degree of linear correlation between two variables; its value lies in [-1, 1], the larger the absolute value, the stronger the linear correlation, a positive value indicating positive correlation and a negative value indicating negative correlation; with X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
where Cov(X, Y) is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
the normalized mutual information is a commonly used correlation measure with a value range of [0, 1]; the larger the value, the stronger the correlation between the variables; the normalized mutual information $U_{X,Y}$ is defined as:
Figure FDA0002253731680000031
where $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the respective entropies of X and Y, $p(x, y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the respective marginal probability distributions of X and Y;
in factor analysis, the extracted features x are assumed to be fully determined by latent influence factors z, expressed as $x = Az + \varepsilon$, where A is the coefficient matrix and $\varepsilon$ is the error; the influence factors are mutually independent, and the influence factors and the error are independent of each other, from which it follows that $\Sigma_x = AA^{T} + \Sigma_{\varepsilon}$, where $\Sigma$ denotes a covariance matrix, so that A and z can be solved for.
6. The big-data deep learning-based scene variable slide-out time prediction system of claim 5,
the model building module comprises:
the initial model acquisition unit is suitable for obtaining an initial model by taking the feature set as the input of the ensemble learning model GBRT;
and the training unit is suitable for training the initial model and adjusting the value of the hyper-parameter so as to complete the establishment of the scene slide-out time prediction model.
7. The big-data deep learning-based scene variable slide-out time prediction system of claim 6,
the training unit, namely:
selecting the maximum depth as the way of controlling the size of the decision trees;
selecting least squares as the loss function;
under the optimal product value of the learning rate and the number of estimators, selecting the largest learning rate, and the correspondingly smallest number of estimators, that keeps the performance stable;
setting the minimum sample split to 200 according to the overall distribution of the slide-out times in the training set;
and finishing the training of the initial model so as to establish a scene slide-out time prediction model.
8. The big-data deep learning-based scene variable slide-out time prediction system of claim 7,
the model building module further comprises:
the model test unit, which is suitable for verifying the scene slide-out time prediction model by using a test set and evaluating the performance.
9. The big-data deep learning-based scene variable slide-out time prediction system of claim 8,
the performance evaluation in the model test unit adopts a mean square error (MSE), and the calculation formula is as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (o_i - p_i)^2$$
where N is the number of test set samples, $o_i$ is the actual slide-out time of the i-th sample, and $p_i$ is the slide-out time predicted by the model for the i-th sample.
CN201911044358.3A 2019-10-30 2019-10-30 Scene variable slide-out time prediction system based on big data deep learning Pending CN110852497A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911044358.3A CN110852497A (en) 2019-10-30 2019-10-30 Scene variable slide-out time prediction system based on big data deep learning
PCT/CN2020/089916 WO2021082394A1 (en) 2019-10-30 2020-05-13 Layout-variable taxiing-out time prediction system based on big data deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911044358.3A CN110852497A (en) 2019-10-30 2019-10-30 Scene variable slide-out time prediction system based on big data deep learning

Publications (1)

Publication Number Publication Date
CN110852497A true CN110852497A (en) 2020-02-28

Family

ID=69599051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044358.3A Pending CN110852497A (en) 2019-10-30 2019-10-30 Scene variable slide-out time prediction system based on big data deep learning

Country Status (2)

Country Link
CN (1) CN110852497A (en)
WO (1) WO2021082394A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021082393A1 (en) * 2019-10-30 2021-05-06 南京智慧航空研究院有限公司 Airport surface variable slide-out time prediction method based on big data deep learning
WO2021082394A1 (en) * 2019-10-30 2021-05-06 南京智慧航空研究院有限公司 Layout-variable taxiing-out time prediction system based on big data deep learning
CN114783212A (en) * 2022-03-29 2022-07-22 南京航空航天大学 Method for constructing model feature set for prediction of departure taxi time of aircraft in busy airport
CN117253584A (en) * 2023-02-14 2023-12-19 南雄市民望医疗有限公司 Hemodialysis component detection-based dialysis time prediction system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743498A (en) * 2021-09-02 2021-12-03 美视(杭州)人工智能科技有限公司 Solution method for fitting OKAI by using orthokeratology mirror
CN117668497B (en) * 2024-01-31 2024-05-07 山西卓昇环保科技有限公司 Carbon emission analysis method and system based on deep learning under environment protection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463993A (en) * 2017-08-04 2017-12-12 贺志尧 Medium-and Long-Term Runoff Forecasting method based on mutual information core principle component analysis Elman networks
CN108846523A (en) * 2018-07-31 2018-11-20 中国民航大学 A kind of flight for putting forth coasting time dynamic prediction method based on Bayesian network
US20190108758A1 (en) * 2017-10-06 2019-04-11 Tata Consultancy Services Limited System and method for flight delay prediction
US20190316909A1 (en) * 2018-04-13 2019-10-17 Passur Aerospace, Inc. Estimating Aircraft Taxi Times
CN110363333A (en) * 2019-06-21 2019-10-22 南京航空航天大学 The prediction technique of air transit ability under the influence of a kind of weather based on progressive gradient regression tree
CN110363361A (en) * 2019-07-25 2019-10-22 四川青霄信息科技有限公司 A kind of method and system for predicting variable sliding time based on big data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100185426A1 (en) * 2009-01-16 2010-07-22 Rajesh Ganesan Predicting Aircraft Taxi-Out Times
CN106339358B (en) * 2016-08-16 2018-11-09 南京航空航天大学 Aircraft scene coasting time prediction technique based on multiple regression analysis
CN106529734A (en) * 2016-11-18 2017-03-22 中国民航大学 Flight taxiing time prediction time based on a k-nearest neighbor (KNN) and support vector regression (SVR)
CN110852497A (en) * 2019-10-30 2020-02-28 南京智慧航空研究院有限公司 Scene variable slide-out time prediction system based on big data deep learning
CN110826788A (en) * 2019-10-30 2020-02-21 南京智慧航空研究院有限公司 Airport scene variable slide-out time prediction method based on big data deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘桂荣等: "《统计学原理 第2版》", 30 June 2019, 华东理工大学出版社 *

Also Published As

Publication number Publication date
WO2021082394A1 (en) 2021-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228
