CN110852497A - Scene variable slide-out time prediction system based on big data deep learning - Google Patents
Scene variable slide-out time prediction system based on big data deep learning
- Publication number
- CN110852497A (application number CN201911044358.3A)
- Authority
- CN
- China
- Prior art keywords
- scene
- out time
- data
- slide
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Entrepreneurship & Innovation (AREA)
- Software Systems (AREA)
- Educational Administration (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Primary Health Care (AREA)
- Game Theory and Decision Science (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to a scene variable slide-out time prediction system based on big data deep learning, which comprises: a data set establishing module, suitable for acquiring historical operating data and performing data cleaning to obtain a data set; an index definition and quantification module, suitable for defining and quantifying traffic condition indexes of scene traffic characteristics; a feature set extraction module, suitable for analyzing and extracting, based on the data set and the traffic condition indexes, a feature set influencing the scene slide-out time; a model building module, suitable for building a scene slide-out time prediction model from the feature set through an integrated (ensemble) machine learning method; and a prediction module, suitable for predicting the airport scene slide-out time through the scene slide-out time prediction model. The system processes the original recorded data of an airport, models airport scene traffic conditions, analyzes and extracts the factors influencing slide-out time, and trains a GBRT ensemble learning model to obtain a slide-out time prediction model, providing a data basis for the management and optimization of airport operation.
Description
Technical Field
The invention relates to the field of airport traffic control, in particular to a scene variable slide-out time prediction system based on big data deep learning.
Background
In the prior art, aircraft slide-out time prediction is mostly modeled in two ways: simulation and analysis. A simulation model takes the existing airport topological structure model and conflict detection and resolution as factors, and obtains the slide-out time by simulating the operation of all arriving and departing aircraft on the ground. Such a model is strongly tailored to one airport and does not generalize well to different airports. Conventional research on analytical models has focused mainly on linear regression, with some attempts to use machine learning techniques. For an analytical model, determining the main factors that influence slide-out time is the focus of study. Analytical models usually suffer from defects such as an incomplete set of influence factors, so their practical reference value is weak and they cannot meet the requirements of actual application.
How to solve the above problems is what needs to be addressed at present.
Disclosure of Invention
The invention aims to provide a scene variable slide-out time prediction system based on big data deep learning, so as to achieve the purpose of improving the comprehensiveness of influence factors in an analysis model.
In order to solve the technical problem, the invention provides a scene variable slide-out time prediction system based on big data deep learning, which comprises:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model through an integrated machine learning method according to the feature set;
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
Further, the data set establishing module comprises:
the original data set acquisition unit is suitable for acquiring historical operating data to construct an original data set;
the data cleaning unit is suitable for cleaning the original data set;
the data set acquisition unit is suitable for integrating the original data set to acquire a data set;
and the data set dividing unit is suitable for dividing the data set into a training set and a test set.
Further, the index defining and quantizing module includes:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
and the quantization unit is suitable for defining four types of indexes for representing scene traffic based on a macroscopic space-time network topological structure and quantizing the indexes.
Further, the feature set extraction module comprises:
the original feature set extracting unit is suitable for extracting features influencing scene slide-out time from the data set and the traffic condition indexes and forming an original feature set;
the feature analysis unit is suitable for performing feature analysis on the features in the original feature set;
and the feature set construction unit is suitable for constructing a feature set according to the feature analysis result.
Further, the feature analysis unit is configured to: performing feature analysis on the features of the original feature set by adopting one or more of correlation coefficient, standardized mutual information and factor analysis;
the correlation coefficient is a statistic reflecting the degree of linear correlation between two variables; its value lies in $[-1, 1]$, a larger absolute value indicates a stronger linear correlation, a positive value indicates positive correlation and a negative value indicates negative correlation. With $X$ and $Y$ denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X,Y)$ is the covariance of $X$ and $Y$, $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, and $\mu_X$ and $\mu_Y$ are the means of $X$ and $Y$;
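the standardized mutual information is a common correlation metric with a value range of $[0, 1]$; the larger the value, the greater the degree of correlation between the variables. The standardized mutual information $U_{X,Y}$ is defined as (the symmetric normalization is assumed here, since the formula itself does not survive in this text):
$$U_{X,Y} = \frac{2\, I_{X,Y}}{H_X + H_Y}, \qquad I_{X,Y} = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad H_X = -\sum_{x} p(x)\log p(x)$$
where $I_{X,Y}$ is the mutual information of $X$ and $Y$, $H_X$ and $H_Y$ are the respective entropies of $X$ and $Y$ ($H_Y$ being defined analogously to $H_X$), $p(x,y)$ is the joint probability distribution of $X$ and $Y$, and $p(x)$ and $p(y)$ are the respective probability distributions of $X$ and $Y$;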
factor analysis assumes that the extracted feature vector $x$ is fully governed by latent influence factors $z$, expressed as $x = Az + \varepsilon$, where $A$ is the coefficient matrix and $\varepsilon$ is the error. The influence factors are independent of one another, and the influence factors and the error are independent of each other, from which it is finally derived that $\Sigma_x = AA^{T} + \Sigma_{\varepsilon}$, where $\Sigma$ denotes a covariance matrix, so that $A$ and $z$ can be found.
Further, the model building module comprises:
the initial model acquisition unit is suitable for acquiring an initial model by taking the feature set as the input of the integrated learning model GBRT;
a training unit suitable for training the initial model and adjusting the values of the hyper-parameters so as to complete the establishment of the scene slide-out time prediction model.
Further, the training unit is configured to:
selecting the maximum depth as a control mode for controlling the decision tree;
selecting least squares as a loss function;
under the optimal product value, selecting the largest learning rate, together with the corresponding smallest number of estimators, that keeps performance stable;
setting the minimum number of samples required to split a node to 200 according to the overall distribution of slide-out time in the training set;
and finishing the training of the initial model so as to establish a scene slide-out time prediction model.
Further, the model building module further comprises:
a model test unit suitable for verifying the scene slide-out time prediction model using the test set and evaluating its performance.
Further, the performance evaluation in the model test unit adopts the mean square error, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(o_i - p_i\right)^2$$
where $N$ is the number of test set samples, $o_i$ is the actual slide-out time of the $i$-th sample, and $p_i$ is the slide-out time predicted by the model for that sample.
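As a minimal sketch of the mean square error evaluation described above, the following function averages the squared differences between actual and predicted slide-out times. The function name and the sample values are illustrative assumptions, not part of the patent.

```python
def mean_squared_error(observed, predicted):
    """Mean square error between actual (o_i) and predicted (p_i) slide-out times."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of squared errors over the N test samples.
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical slide-out times in minutes for three test flights.
actual = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
mse = mean_squared_error(actual, forecast)
```

A lower value indicates that the predicted slide-out times lie closer to the recorded ones; the unit is minutes squared when the inputs are in minutes.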
The invention has the beneficial effect of providing a scene variable slide-out time prediction system based on big data deep learning. The system comprises: a data set establishing module, suitable for acquiring historical operating data and performing data cleaning to obtain a data set; an index definition and quantification module, suitable for defining and quantifying traffic condition indexes of scene traffic characteristics; a feature set extraction module, suitable for analyzing and extracting, based on the data set and the traffic condition indexes, a feature set influencing the scene slide-out time; a model building module, suitable for building a scene slide-out time prediction model from the feature set through an integrated machine learning method; and a prediction module, suitable for predicting the airport scene slide-out time through the scene slide-out time prediction model. The system processes the original recorded data of an airport, models the traffic conditions of the airport scene, analyzes and extracts the factors influencing slide-out time, and trains a GBRT ensemble learning model to obtain a slide-out time prediction model, providing a data basis for the management and optimization of airport operation.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic block diagram of a scene variable slide-out time prediction system based on big data deep learning provided by the present invention.
FIG. 2 is a schematic diagram of a taxi process macro spatiotemporal network topology provided by the present invention.
FIG. 3 shows the correlation coefficients between the candidate influencing factors and the slide-out time provided by the present invention.
FIG. 4 shows the standardized mutual information between the candidate influencing factors and the slide-out time provided by the present invention.
FIG. 5 is a graph of the results of factor analysis of candidate influencing factors provided by the present invention.
FIG. 6 is a diagram of the performance variation process during the model training and testing phases provided by the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
Example 1
As shown in fig. 1, this embodiment 1 provides a scene variable slide-out time prediction system based on big data deep learning, which processes original recorded data of an airport, models traffic conditions of the airport scene, analyzes and extracts influence factors of slide time, trains a GBRT ensemble learning model, and further obtains a slide-out time prediction model, thereby providing a data basis for management and optimization of airport operation. Specifically, the scene variable slide-out time prediction system based on big data deep learning comprises:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model by an integrated machine learning method according to the feature set,
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
In this embodiment, the data set creating module includes:
and the original data set acquisition unit is suitable for acquiring historical operating data to construct an original data set.
Specifically, as much data as possible is extracted from the airport scene operation database to form an original data set of airport flight departure operations. The following are collected: taxi-track information, including departure runway, departure stand, corridor port number, taxi length and the like; flight attribute information, including flight number, flight type, aircraft model, airline, engine type and the like; traffic control information, including whether the flight is restricted, controller information, communication information, delay conditions, local weather, airport broadcasts and the like; flight plan information, including takeoff airport, destination airport, planned takeoff time, planned gear-removing time, waypoint information and the like; and actual records of the taxi process, including gear-removing time, push-out time, time of requesting/permitting driving, actual takeoff time, taxi speed, waiting time at the runway head and the like.
And the data cleaning unit is suitable for cleaning the original data set.
Specifically, a concrete processing scheme is formulated for practical work, considering that an airport acquires the data set in real time. Missing values are handled in two ways: setting default values and direct deletion. The default value "no" is set for 'limited or not' and for 'limited content'. After the default values are filled in, attributes with more than half of their information missing are deleted directly; these include 'request for driving', 'permission for driving', 'wheel-removing time', 'wake', 'sliding speed' and 'departure queue number'. The data set is then checked for completeness, and data entries with missing information are deleted. For abnormal values, the data type of every attribute and whether its values are out of bounds are checked first, and a bound-based detection method is then applied to selected attributes to check further for abnormal values: a value range is defined for the attribute based on the actual operating conditions of the airport scene, and data whose value falls outside the corresponding range is regarded as abnormal. Finally, data entries containing abnormal values are deleted from the data set. The attribute value ranges are shown in the following table.
Value range of partial attribute
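The cleaning steps described above (default filling, dropping attributes that are more than half missing, dropping incomplete entries, and range checking) can be sketched as follows. This is only an illustration under stated assumptions: the attribute names (`restricted`, `taxi_speed`, `taxi_length`), the default values and the value range are hypothetical, and missing values are assumed to be represented as `None`.

```python
def clean(records, defaults, ranges):
    """Apply the four cleaning steps to a list of dict records."""
    if not records:
        return []
    # 1. Fill configured default values for missing entries.
    rows = [{k: (defaults[k] if v is None and k in defaults else v)
             for k, v in r.items()} for r in records]
    # 2. Drop attributes for which more than half of the entries are missing.
    keep = [k for k in rows[0]
            if sum(r[k] is None for r in rows) <= len(rows) / 2]
    rows = [{k: r[k] for k in keep} for r in rows]
    # 3. Completeness check: drop entries that still have missing values.
    rows = [r for r in rows if all(v is not None for v in r.values())]
    # 4. Range check: drop entries whose value lies outside its allowed range.
    return [r for r in rows
            if all(lo <= r[k] <= hi for k, (lo, hi) in ranges.items() if k in r)]


# Hypothetical records; None marks a missing value.
raw = [
    {"restricted": None, "taxi_speed": None, "taxi_length": 2000},
    {"restricted": "yes", "taxi_speed": None, "taxi_length": 2500},
    {"restricted": None, "taxi_speed": None, "taxi_length": 90000},
    {"restricted": "no", "taxi_speed": 12, "taxi_length": None},
]
cleaned = clean(raw, defaults={"restricted": "no"},
                ranges={"taxi_length": (0, 10000)})
```

In this toy input, `taxi_speed` is removed as an attribute because three of four values are missing, the fourth record is removed as incomplete, and the third is removed as out of range.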
And the data set acquisition unit is suitable for integrating the original data set to acquire the data set.
Specifically, this step comprises three tasks: redundant attribute identification, data type conversion and logic error checking. For redundant attribute identification, attributes that carry little information are identified by calculating the information entropy of each attribute, and attributes whose information is contained in other attributes are identified by calculating the mutual information between attributes; the redundant attributes "takeoff airport" and "execution date" are deleted. For data type conversion, information in non-numerical attributes that serves only as an identifier is converted into integer values that are easy to process and use later; the information contained in the "restricted content" attribute is difficult to quantify and is deleted after comprehensive consideration. For logic error checking, constraint relations among the features are established from their physical meanings in order to eliminate logic errors: the correspondence between aircraft model and number of engines is checked, the precedence of the time nodes in scene operation is checked, and information items with logic errors are deleted directly.
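Two of the integration tasks, data type conversion and the precedence check on time nodes, can be sketched as below. The helper names, the field names (`pushout`, `takeoff`) and the sample values are assumptions made for illustration only.

```python
from datetime import datetime


def encode_labels(values):
    """Map each distinct identifier to an integer code, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


def times_in_order(record, ordered_keys):
    """Return True if the record's time nodes respect the given precedence."""
    times = [record[k] for k in ordered_keys]
    return all(a <= b for a, b in zip(times, times[1:]))


# Hypothetical aircraft models and one record with a logic error.
encoded, mapping = encode_labels(["A320", "B738", "A320"])
ok = {"pushout": datetime(2019, 10, 30, 8, 5), "takeoff": datetime(2019, 10, 30, 8, 23)}
bad = {"pushout": datetime(2019, 10, 30, 9, 0), "takeoff": datetime(2019, 10, 30, 8, 40)}
valid = [r for r in (ok, bad) if times_in_order(r, ["pushout", "takeoff"])]
```

Records that fail the precedence check are dropped, matching the rule that information items with logic errors are deleted directly.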
And the data set dividing unit is suitable for dividing the data set into a training set and a test set.
Specifically, the data set is divided into a training set and a test set: 90% of the data is used as the training set in the training phase of the model, and 10% is used as the test set to verify the validity and robustness of the model. The training set and the test set thus come from the same source and share the same distribution. The 10% is set aside from the final processed data set before the machine learning model is trained, and the remaining 90% is used to train it.
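A 90/10 split of this kind might be sketched as follows. Shuffling before the split is an assumption (the text only states the proportions and that both sets come from the same data), and the function name and seed are illustrative.

```python
import random


def split_dataset(rows, test_fraction=0.1, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # assumption: random hold-out
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, test set)


# Hypothetical data set of 100 record identifiers.
train, test = split_dataset(range(100))
```

Fixing the seed keeps the held-out test set identical across runs, so that performance figures from different training runs remain comparable.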
In this embodiment, the index defining and quantizing module includes:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
Specifically, a macroscopic space-time network topological model is adopted to model the traffic situation of the airport scene. FIG. 2 visualizes the general network topology of departure and arrival taxiing in an arbitrary space-time domain. In actual operation of an airport scene, the taxi-in and taxi-out processes are coupled and interdependent, so the model also takes into account the influence of arrivals on the departure process. The space-time network topological model is a general framework for describing the macroscopic resource flow of an airport system. As shown in FIG. 2, $d_1, \ldots, d_4$ denote departure flights in the four different relationships with the reference departure flight $d_0$: "before launch, before launch", "before launch, after launch", "after launch, before launch" and "after launch, after launch". Similarly, $a_1, \ldots, a_4$ denote arrival flights in the four different relationships with the reference arrival flight $a_0$: "before landing, before landing", "before landing, after landing", "after landing, before landing" and "after landing, after landing". $t_{on}$ and $t_{in}$ denote the landing time and in-place time of the reference arrival flight $a_0$. $t_{out}$ and $t_{off}$ denote the push-out time and takeoff time of the reference departure flight $d_0$. $\delta$ denotes the time threshold for arrivals and departures.
And the quantization unit is suitable for defining four types of indexes for representing scene traffic based on a macroscopic space-time network topological structure and quantizing the indexes.
Specifically, based on the macroscopic space-time network topological structure, eight indexes in four categories are defined to represent scene traffic. The four categories are Scene Instantaneous Flow Indexes (SIFIs), Scene Cumulative Flow Indexes (SCFIs), Airplane Queuing Length Indexes (AQLIs) and Slot Resource Demand Indexes (SRDIs). Two statistics are computed in each category: the number of departing aircraft (prefixed by D-) and the number of arriving aircraft (prefixed by A-). The following table shows the statistics for the situation in FIG. 2, with $d_0$ as the reference departure flight.
Statistical results of the scene traffic situation indexes for departure flight $d_0$
Taking FIG. 2 as an example, the definition and calculation of the indexes in Table 1 are described in detail below. For any departure flight $d_0$: SIFIs include D-SIFI and A-SIFI, which denote the numbers of departure and arrival flights that are taxiing at the moment $d_0$ is pushed out of the gate. SCFIs include D-SCFI and A-SCFI, which denote the numbers of departure and arrival aircraft whose taxi periods overlap the taxi period of $d_0$. AQLIs include D-AQLI and A-AQLI, which denote the numbers of aircraft taking off and landing on the runway during the entire taxi process of $d_0$. SRDIs include D-SRDI and A-SRDI, which denote the numbers of aircraft taking off and landing during the departure slot $[t_0-\delta, t_0+\delta]$ of $d_0$. In general, $\delta$ may be set between 10 and 30 minutes.
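The eight indexes can be illustrated with a small counting routine. This is a sketch under several assumptions that the text does not fix: times are given in minutes, a departure taxies from push-out to takeoff and an arrival from landing to in-place time, the slot centre $t_0$ defaults to the push-out time of $d_0$, and interval endpoints are treated as inclusive for the runway-event counts. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Taxi:
    start: float  # minutes: push-out (departure) or landing (arrival)
    end: float    # minutes: takeoff (departure) or in-place time (arrival)


def traffic_indexes(d0, departures, arrivals, delta=15.0, t0=None):
    """Count the eight indexes for reference departure d0 (not in the lists)."""
    t0 = d0.start if t0 is None else t0  # assumption: slot centred on push-out

    def taxiing_at_pushout(flights):
        return sum(f.start <= d0.start < f.end for f in flights)

    def overlapping(flights):
        return sum(f.start < d0.end and f.end > d0.start for f in flights)

    def within(times, lo, hi):
        return sum(lo <= t <= hi for t in times)

    takeoffs = [f.end for f in departures]   # runway events of departures
    landings = [f.start for f in arrivals]   # runway events of arrivals
    return {
        "D-SIFI": taxiing_at_pushout(departures),
        "A-SIFI": taxiing_at_pushout(arrivals),
        "D-SCFI": overlapping(departures),
        "A-SCFI": overlapping(arrivals),
        "D-AQLI": within(takeoffs, d0.start, d0.end),
        "A-AQLI": within(landings, d0.start, d0.end),
        "D-SRDI": within(takeoffs, t0 - delta, t0 + delta),
        "A-SRDI": within(landings, t0 - delta, t0 + delta),
    }


# Hypothetical traffic around a reference departure taxiing from minute 100 to 115.
d0 = Taxi(100, 115)
deps = [Taxi(90, 105), Taxi(105, 120), Taxi(60, 80)]
arrs = [Taxi(95, 102), Taxi(110, 118)]
indexes = traffic_indexes(d0, deps, arrs, delta=10.0)
```

Each index is a plain count over the surrounding flights, so the routine can be applied per departure record to produce the D- and A- features used later.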
In this embodiment, the feature set extracting module includes:
and the original feature set extracting unit is suitable for extracting features influencing the scene slide-out time from the data set and the traffic condition indexes and forming an original feature set.
Specifically, the relevant factors influencing the scene slide-out time, which are acquired by the data set establishing module and the index defining and quantifying module, are sorted to form an original feature set. And processing the original feature set, and extracting new features from the original features to replace partial features in the original feature set.
The factors influencing scene slide-out time obtained by the data set establishing module are: flight number, flight attribute, destination airport, planned takeoff time, aircraft model, airline, push-out time, actual takeoff time, departure runway, departure stand, stand type, engine type, corridor entrance, whether limited or not, and gate. The factors related to scene slide-out time obtained by the index definition and quantification module are: D-SIFI, D-SCFI, D-AQLI, D-SRDI and Corridor_NO. The difference between the push-out time and the actual takeoff time is used as the scene slide-out time and replaces those original features. The new features month, day, week, hour and minute are extracted from the planned takeoff time and replace that original feature. The stand and gate features are further divided and analyzed, and the correspondence between the runway and the stand/gate is extracted as a new feature. The final original feature set, i.e., the candidate influencing factors, is shown in the following table:
candidate influencing factors
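The derived features described above (slide-out time as the difference between push-out and actual takeoff, and calendar features from the planned takeoff time) might be computed as in this sketch. The function and key names are hypothetical, and "week" is read here as the day of the week (1 = Monday), which is an assumption.

```python
from datetime import datetime


def derive_features(pushout, takeoff, planned_takeoff):
    """Derive the slide-out time and calendar features for one departure."""
    return {
        # Slide-out time in minutes: actual takeoff minus push-out.
        "slide_out_min": (takeoff - pushout).total_seconds() / 60,
        "month": planned_takeoff.month,
        "day": planned_takeoff.day,
        "week": planned_takeoff.isoweekday(),  # assumption: day of week
        "hour": planned_takeoff.hour,
        "minute": planned_takeoff.minute,
    }


# Hypothetical departure record.
features = derive_features(
    pushout=datetime(2019, 10, 30, 8, 5),
    takeoff=datetime(2019, 10, 30, 8, 23),
    planned_takeoff=datetime(2019, 10, 30, 8, 10),
)
```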
And the feature analysis unit is suitable for performing feature analysis on the features in the original feature set.
And the feature set construction unit is suitable for constructing a feature set according to the feature analysis result.
Specifically, based on the analysis result of the feature analysis unit, important features are selected from the original feature set formed by the original feature set extraction unit to form the feature set used by the integrated machine learning model. Features with little correlation with scene slide-out time are screened out, including "engine type", "stand type", "month", "week", "day" and "minute". The finally obtained feature set, i.e., the influencing factors, is shown in the following table:
finally selected influencing factors
In this embodiment, the feature analysis unit includes:
Feature analysis is performed on the features of the original feature set using one or more of the correlation coefficient, standardized mutual information and factor analysis. FIGS. 3, 4 and 5 respectively show the Pearson correlation coefficients between the candidate influencing factors and the slide-out time, the standardized mutual information between the candidate influencing factors and the slide-out time, and the factor analysis results of the candidate influencing factors.
The correlation coefficient is a statistic reflecting the degree of linear correlation between two variables. Its value lies in [-1, 1]; the larger the absolute value, the stronger the linear correlation, with a positive value indicating positive correlation and a negative value indicating negative correlation. With X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
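As a hedged illustration of the correlation coefficient described above, the sketch below computes the sample Pearson coefficient in plain Python. The function name and the example values are made up for illustration and are not taken from the patent's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (std_x * std_y)


# Hypothetical values: a queue-length index against slide-out time in minutes.
queue_index = [1, 2, 3, 4, 5]
slide_out = [9.0, 11.5, 13.0, 16.5, 18.0]
print(pearson(queue_index, slide_out))
```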
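The normalized mutual information is a common correlation measure with a value range of [0, 1]; the larger the value, the stronger the correlation between the variables. The normalized mutual information $U_{X,Y}$ is obtained by normalizing the mutual information $I_{X,Y}$ by the entropies $H_X$ and $H_Y$, where
$$I_{X,Y} = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad H_X = -\sum_{x} p(x)\log p(x), \qquad H_Y = -\sum_{y} p(y)\log p(y)$$
Here $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the entropies of X and Y respectively, $p(x,y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y respectively;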
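A small sketch of a normalized mutual information computation for discrete features follows. The normalization used, $2I_{X,Y}/(H_X+H_Y)$, is one common choice that yields values in [0, 1]; it is an assumption made for illustration and may differ from the normalization intended in the text. Function names and example labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(values):
    """Empirical entropy H of a discrete variable."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) from joint and marginal frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mutual_information(xs, ys):
    """Assumed normalization: 2 * I / (H_X + H_Y), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    return 2 * mutual_information(xs, ys) / (hx + hy)


# Hypothetical categorical features: departure runway and stand area.
runway = ["A", "A", "B", "B"]
stand_area = ["north", "north", "south", "south"]
print(normalized_mutual_information(runway, stand_area))
```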
Factor analysis assumes that the extracted features x are entirely governed by latent influencing factors z, expressed as $x = Az + \varepsilon$, where A is a coefficient matrix and $\varepsilon$ is an error term. The influencing factors are mutually independent, and the influencing factors are independent of the error. It can then be derived that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, so that A and z can be found.
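The covariance identity stated above can be checked in a few lines. The sketch below assumes, beyond what the text states, that x, z and ε are zero-mean and that the latent factors have identity covariance, which is the usual convention in factor analysis.

```latex
% Assumptions (not stated in the text): E[z] = 0, E[\varepsilon] = 0, E[z z^T] = I.
% Independence of z and \varepsilon gives E[z \varepsilon^T] = 0.
\begin{aligned}
\Sigma_x &= E\left[x x^T\right]
          = E\left[(Az + \varepsilon)(Az + \varepsilon)^T\right] \\
         &= A\,E\left[z z^T\right]A^T + A\,E\left[z \varepsilon^T\right]
            + E\left[\varepsilon z^T\right]A^T + E\left[\varepsilon \varepsilon^T\right] \\
         &= A A^T + \Sigma_\varepsilon
\end{aligned}
```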
In this embodiment, the model building module includes:
the initial model acquisition unit is suitable for acquiring an initial model by taking the feature set as the input of the ensemble learning model GBRT;
and the training unit is suitable for training the initial model and adjusting the value of the hyper-parameter so as to complete the establishment of the scene slide-out time prediction model.
In this embodiment, the training unit operates as follows: "maximum depth" is selected as the way of controlling the decision tree size; "least squares" is selected as the loss function; at the optimal product value of the learning rate and the number of estimators, the largest learning rate that keeps performance stable is selected together with the corresponding smallest number of estimators; the minimum sample split is set to 200 according to the overall distribution of slide-out times in the training set; and training of the initial model is completed, thereby establishing the scene slide-out time prediction model.
Specifically, a Gradient Boosted Regression Trees (GBRT) model, a typical representative of ensemble learning, is adopted to predict the scene slide-out time. The feature set obtained in step S133 is taken as the input of the model, and the GBRT model is trained quickly using the implementation in the scikit-learn library. The hyper-parameters to be set are: the decision tree size control, the loss function type, the number of estimators, the learning rate and the minimum sample split. There are two options for controlling the size of the decision tree: "maximum depth (max_depth)" and "maximum number of leaf nodes (max_leaf_nodes)". There are four alternative loss functions for the regression task: "least squares (ls)", "least absolute deviation (lad)", "Huber loss (huber)" and "quantile loss (quantile)". Since the learning rate and the number of estimators interact strongly, their product roughly reflects the amount of iterative training. Therefore, when setting these parameters, different product values are tried empirically and the product value that achieves the best performance on the training set is selected. The minimum sample split controls the lower limit on the number of samples in a leaf node and improves the robustness of the model. In general, the hyper-parameter values need to be adjusted reasonably according to the data of the application scenario.
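The hyper-parameter choices described above could be expressed with scikit-learn's `GradientBoostingRegressor` roughly as follows. This is a sketch on synthetic data, not the patent's configuration: only `min_samples_split=200` and the use of maximum depth and least squares come from the text, while the depth, learning rate, number of estimators and the data itself are illustrative assumptions. The loss argument is left at its default, which is least squares, because its string name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the feature set and slide-out times (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 4))
y = 10.0 + 8.0 * X[:, 0] + 4.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.5, size=1000)
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

model = GradientBoostingRegressor(
    max_depth=3,            # tree size controlled by maximum depth (value assumed)
    n_estimators=200,       # number of estimators (value assumed)
    learning_rate=0.1,      # learning rate (value assumed); product with n_estimators is 20
    min_samples_split=200,  # minimum sample split, as stated in the text
    random_state=0,
)
model.fit(X_train, y_train)

# Compare against a constant predictor that always outputs the training mean.
test_mse = float(np.mean((model.predict(X_test) - y_test) ** 2))
baseline_mse = float(np.mean((y_train.mean() - y_test) ** 2))
print(test_mse, baseline_mse)
```

Holding the product of `learning_rate` and `n_estimators` fixed while varying the two values is one way to explore the trade-off the paragraph describes.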
Specifically, the GBRT model F(x) is an additive model of the form:
$$F(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$$
where $h_m(x)$ is a basis function, commonly called a weak learner in the boosting context, $\gamma_m$ is the weight of that weak learner, and M is the total number of weak learners. GBRT uses decision trees of fixed size as weak learners. As in other boosting algorithms, GBRT builds the additive model greedily:
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
where $F_m(x)$ is the GBRT model obtained at the m-th iteration, and $h_m(x)$ is obtained from
$$h_m = \arg\min_{h} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big)$$
Here n is the total number of training samples, L is the selected loss function, $y_i$ is the label of the i-th sample, $F_{m-1}(x_i)$ is the prediction on the i-th sample of the GBRT model obtained at iteration m-1, and $h(x_i)$ is the prediction on the i-th sample of the weak learner being sought. The weight $\gamma_m$ is obtained from
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$$
with n, L, $y_i$ and $F_{m-1}(x_i)$ as defined above.
The initial model $F_0$, i.e., the model before any training, is problem-specific; for least squares regression, the mean of the target values is usually chosen.
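To make the greedy additive construction concrete, here is a minimal from-scratch sketch for the least squares case, where fitting the weak learner reduces to fitting the current residuals. It is a simplification and not the scikit-learn implementation: the weak learners are single-feature regression stumps, and all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature; returns a predictor h(x)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values are identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_rounds):
    """Build F_m = F_{m-1} + gamma_m * h_m for squared loss, with F_0 = mean(y)."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(ys)
    stages = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)  # h_m: least-squares fit to the residuals
        hx = [h(x) for x in xs]
        denom = sum(v * v for v in hx)
        # gamma_m: closed-form line search for squared loss
        gamma = sum(r * v for r, v in zip(residuals, hx)) / denom if denom else 0.0
        preds = [p + gamma * v for p, v in zip(preds, hx)]
        stages.append((gamma, h))

    def predict(x):
        return f0 + sum(g * h(x) for g, h in stages)

    return predict


def training_mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 11, 12, 13, 20, 21, 22, 23]
for rounds in (0, 1, 20):
    print(rounds, training_mse(fit_gbrt(xs, ys, rounds), xs, ys))
```

With zero rounds the model is just $F_0$, the mean of the targets; each added stage lowers the training error on this toy data.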
In this embodiment, the model building module further includes:
and the model test unit is suitable for verifying the scene slide-out time prediction model by using a test set and evaluating the performance.
Specifically, the scene slide-out time prediction model is verified using the test set, and the mean square error (MSE) is used for performance evaluation, calculated as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (o_i - p_i)^2$$
where N is the number of test set samples, $o_i$ is the actual slide-out time of the i-th sample, and $p_i$ is the slide-out time predicted by the model for the i-th sample.
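The mean square error described above is straightforward to compute; a minimal sketch follows. The function name and the sample values are illustrative only.

```python
def mean_squared_error(actual, predicted):
    """MSE: average of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(actual, predicted)) / len(actual)


# Hypothetical slide-out times in minutes.
actual = [10.0, 12.0, 15.0]
predicted = [11.0, 12.0, 13.0]
print(mean_squared_error(actual, predicted))
```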
In this example, the MSE is used to monitor how model performance changes during training and testing; the results are shown in Fig. 6. The final MSE is 2.5 on the training set and 5.5 on the test set. Although there is a gap between the training and test MSE, the test result still reflects, to some extent, the generalization capability of the model.
In addition, the following table compares the prediction accuracy on the test set within different error ranges. Across the test set, 85.7% of the samples have a slide-out time prediction error within 3 minutes, over 93% have an error within 4 minutes, and about 96.5% have an error within 5 minutes. According to these verification results, the designed data mining model and algorithm can meet the accuracy requirement of the actual scene dynamic slide-out time prediction task well.
Test set accuracy within different error ranges
Error range (minutes) | [-3,3] | [-4,4] | [-5,5] |
---|---|---|---|
Accuracy | 85.7% | 93.1% | 96.5% |
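The accuracy figures in the table are fractions of test samples whose prediction error falls inside a symmetric range. A minimal sketch of that calculation is shown below; the function name and the sample values are made up and do not reproduce the patent's test set.

```python
def accuracy_within(actual, predicted, tolerance_minutes):
    """Fraction of samples whose prediction error lies in [-tolerance, +tolerance]."""
    errors = [p - o for o, p in zip(actual, predicted)]
    hits = sum(1 for e in errors if abs(e) <= tolerance_minutes)
    return hits / len(errors)


# Hypothetical slide-out times in minutes; errors are 1, 4.5, -2, 0 and 6.
actual = [12.0, 15.0, 20.0, 9.0, 30.0]
predicted = [13.0, 19.5, 18.0, 9.0, 36.0]
for tolerance in (3, 4, 5):
    print(tolerance, accuracy_within(actual, predicted, tolerance))
```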
In summary, the invention provides a scene variable slide-out time prediction system based on big data deep learning, comprising: a data set establishing module, suitable for acquiring historical operating data and performing data cleaning to obtain a data set; an index definition and quantification module, suitable for defining and quantifying the traffic condition indexes of the scene traffic characteristics; a feature set extraction module, suitable for analyzing and extracting a feature set influencing the scene slide-out time based on the data set and the traffic condition indexes; a model building module, suitable for building a scene slide-out time prediction model from the feature set through an ensemble machine learning method; and a prediction module, suitable for predicting the airport scene slide-out time through the scene slide-out time prediction model. The system processes the original recorded data of the airport, models the traffic conditions of the airport scene, analyzes and extracts the factors influencing the slide-out time, and trains a GBRT ensemble learning model to obtain a slide-out time prediction model, providing a data basis for the management and optimization of airport operations.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
Claims (9)
1. A scene variable slide-out time prediction system based on big data deep learning, comprising:
the data set establishing module is suitable for acquiring historical operating data and performing data cleaning to obtain a data set;
the index definition and quantification module is suitable for defining and quantifying the traffic condition index of the scene traffic characteristic;
the characteristic set extraction module is suitable for analyzing and extracting a characteristic set influencing the slide-out time of the scene based on the data set and the traffic condition indexes;
the model establishing module is suitable for establishing a scene slide-out time prediction model through an integrated machine learning method according to the feature set;
and the prediction module is suitable for completing prediction of the airport scene slide-out time through the scene slide-out time prediction model.
2. The big-data deep learning-based scene variable slide-out time prediction system of claim 1,
the data set building module comprises:
the original data set acquisition unit is suitable for acquiring historical operating data to construct an original data set;
the data cleaning unit is suitable for cleaning the original data set;
the data set acquisition unit is suitable for integrating the original data set to acquire a data set;
and the data set dividing unit is suitable for dividing the data set into a training set and a test set.
3. The big-data deep learning-based scene variable slide-out time prediction system of claim 2,
the index definition and quantization module comprises:
the network topological structure acquisition unit is suitable for modeling the traffic situation of the airport scene by adopting a macroscopic space-time network topological model to acquire a macroscopic space-time network topological structure;
and the quantization unit is suitable for defining four types of indexes for representing scene traffic based on a macroscopic space-time network topological structure and quantizing the indexes.
4. The big-data deep learning-based scene variable slide-out time prediction system of claim 3,
the feature set extraction module comprises:
the original feature set extracting unit is suitable for extracting features influencing scene slide-out time from the data set and the traffic condition indexes and forming an original feature set;
a feature analysis unit adapted to perform feature analysis on the features in the original feature set;
and the feature set construction unit is suitable for constructing a feature set according to the feature analysis result.
5. The big-data deep learning-based scene variable slide-out time prediction system of claim 4,
the feature analysis unit, namely:
performing feature analysis on the features of the original feature set by adopting one or more of correlation coefficient, standardized mutual information and factor analysis;
the correlation coefficient is a statistic reflecting the degree of linear correlation between two variables; its value lies in [-1, 1], the larger the absolute value, the stronger the linear correlation, with a positive value indicating positive correlation and a negative value indicating negative correlation; with X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
where $\mathrm{Cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the means of X and Y;
the normalized mutual information is a common correlation measure with a value range of [0, 1], the larger the value, the stronger the correlation between the variables; the normalized mutual information $U_{X,Y}$ is obtained by normalizing the mutual information $I_{X,Y}$ by the entropies $H_X$ and $H_Y$, where
$$I_{X,Y} = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad H_X = -\sum_{x} p(x)\log p(x), \qquad H_Y = -\sum_{y} p(y)\log p(y)$$
here $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the entropies of X and Y respectively, $p(x,y)$ is the joint probability distribution of X and Y, and $p(x)$ and $p(y)$ are the marginal probability distributions of X and Y respectively;
factor analysis assumes that the extracted features x are entirely governed by latent influencing factors z, expressed as $x = Az + \varepsilon$, where A is a coefficient matrix and $\varepsilon$ is an error term; the influencing factors are mutually independent, and the influencing factors are independent of the error; it can then be derived that $\Sigma_x = AA^T + \Sigma_\varepsilon$, where $\Sigma$ denotes a covariance matrix, so that A and z can be found.
6. The big-data deep learning-based scene variable slide-out time prediction system of claim 5,
the model building module comprises:
the initial model acquisition unit is suitable for acquiring an initial model by taking the feature set as the input of the ensemble learning model GBRT;
and the training unit is suitable for training the initial model and adjusting the value of the hyper-parameter so as to complete the establishment of the scene slide-out time prediction model.
7. The big-data deep learning-based scene variable slide-out time prediction system of claim 6,
the training unit, namely:
selecting the maximum depth as a control mode for controlling the decision tree;
selecting least squares as a loss function;
under the optimal product value of the learning rate and the number of estimators, selecting the maximum learning rate that keeps the performance stable and the corresponding minimum number of estimators;
setting the minimum sample split to 200 according to the overall distribution of the slide-out time in the training set;
and finishing the training of the initial model so as to establish a scene slide-out time prediction model.
8. The big-data deep learning-based scene variable slide-out time prediction system of claim 7,
the model building module also comprises:
the model test unit, suitable for verifying the scene slide-out time prediction model by using a test set and evaluating the performance.
9. The big-data deep learning-based scene variable slide-out time prediction system of claim 8,
the performance evaluation in the model test unit adopts a mean square error, and the calculation formula is as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (o_i - p_i)^2$$
where N is the number of test set samples, $o_i$ is the actual slide-out time of the i-th sample, and $p_i$ is the slide-out time predicted by the model for the i-th sample.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911044358.3A CN110852497A (en) | 2019-10-30 | 2019-10-30 | Scene variable slide-out time prediction system based on big data deep learning |
PCT/CN2020/089916 WO2021082394A1 (en) | 2019-10-30 | 2020-05-13 | Layout-variable taxiing-out time prediction system based on big data deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911044358.3A CN110852497A (en) | 2019-10-30 | 2019-10-30 | Scene variable slide-out time prediction system based on big data deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110852497A (en) | 2020-02-28 |
Family
ID=69599051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911044358.3A Pending CN110852497A (en) | 2019-10-30 | 2019-10-30 | Scene variable slide-out time prediction system based on big data deep learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110852497A (en) |
WO (1) | WO2021082394A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021082393A1 (en) * | 2019-10-30 | 2021-05-06 | 南京智慧航空研究院有限公司 | Airport surface variable slide-out time prediction method based on big data deep learning |
WO2021082394A1 (en) * | 2019-10-30 | 2021-05-06 | 南京智慧航空研究院有限公司 | Layout-variable taxiing-out time prediction system based on big data deep learning |
CN114783212A (en) * | 2022-03-29 | 2022-07-22 | 南京航空航天大学 | Method for constructing model feature set for prediction of departure taxi time of aircraft in busy airport |
CN117253584A (en) * | 2023-02-14 | 2023-12-19 | 南雄市民望医疗有限公司 | Hemodialysis component detection-based dialysis time prediction system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743498A (en) * | 2021-09-02 | 2021-12-03 | 美视(杭州)人工智能科技有限公司 | Solution method for fitting OKAI by using orthokeratology mirror |
CN117668497B (en) * | 2024-01-31 | 2024-05-07 | 山西卓昇环保科技有限公司 | Carbon emission analysis method and system based on deep learning under environment protection |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463993A (en) * | 2017-08-04 | 2017-12-12 | 贺志尧 | Medium-and Long-Term Runoff Forecasting method based on mutual information core principle component analysis Elman networks |
CN108846523A (en) * | 2018-07-31 | 2018-11-20 | 中国民航大学 | A kind of flight for putting forth coasting time dynamic prediction method based on Bayesian network |
US20190108758A1 (en) * | 2017-10-06 | 2019-04-11 | Tata Consultancy Services Limited | System and method for flight delay prediction |
US20190316909A1 (en) * | 2018-04-13 | 2019-10-17 | Passur Aerospace, Inc. | Estimating Aircraft Taxi Times |
CN110363333A (en) * | 2019-06-21 | 2019-10-22 | 南京航空航天大学 | The prediction technique of air transit ability under the influence of a kind of weather based on progressive gradient regression tree |
CN110363361A (en) * | 2019-07-25 | 2019-10-22 | 四川青霄信息科技有限公司 | A kind of method and system for predicting variable sliding time based on big data |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100185426A1 (en) * | 2009-01-16 | 2010-07-22 | Rajesh Ganesan | Predicting Aircraft Taxi-Out Times |
CN106339358B (en) * | 2016-08-16 | 2018-11-09 | 南京航空航天大学 | Aircraft scene coasting time prediction technique based on multiple regression analysis |
CN106529734A (en) * | 2016-11-18 | 2017-03-22 | 中国民航大学 | Flight taxiing time prediction time based on a k-nearest neighbor (KNN) and support vector regression (SVR) |
CN110852497A (en) * | 2019-10-30 | 2020-02-28 | 南京智慧航空研究院有限公司 | Scene variable slide-out time prediction system based on big data deep learning |
CN110826788A (en) * | 2019-10-30 | 2020-02-21 | 南京智慧航空研究院有限公司 | Airport scene variable slide-out time prediction method based on big data deep learning |
- 2019-10-30: CN application CN201911044358.3A, published as CN110852497A, status: active, Pending
- 2020-05-13: WO application PCT/CN2020/089916, published as WO2021082394A1, status: active, Application Filing
Non-Patent Citations (1)
Title |
---|
刘桂荣等: "《统计学原理 第2版》", 30 June 2019, 华东理工大学出版社 * |
Also Published As
Publication number | Publication date |
---|---|
WO2021082394A1 (en) | 2021-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110826788A (en) | Airport scene variable slide-out time prediction method based on big data deep learning | |
CN110852497A (en) | Scene variable slide-out time prediction system based on big data deep learning | |
CN107086935B (en) | People flow distribution prediction method based on WIFI AP | |
CN111401601B (en) | Delay propagation-oriented flight take-off and landing time prediction method | |
CN110503245B (en) | Prediction method for large-area delay risk of airport flight | |
CN104156594B (en) | Dynamic flight station-crossing time estimation method based on Bayes network | |
CN110570693B (en) | Flight operation time prediction method based on reliability | |
Choi et al. | Artificial neural network models for airport capacity prediction | |
CN111652427A (en) | Flight arrival time prediction method and system based on data mining analysis | |
CN113706931B (en) | Airspace flow control strategy recommendation method and device, electronic equipment and storage medium | |
CN111160612A (en) | Off-site flight delay analysis and prediction method based on weather influence | |
CN110796315B (en) | Departure flight delay prediction method based on aging information and deep learning | |
Provan et al. | A probabilistic airport capacity model for improved ground delay program planning | |
Ramanujam et al. | Estimation of arrival-departure capacity tradeoffs in multi-airport systems | |
CN112419131A (en) | Method for estimating traffic origin-destination demand | |
CN116956757A (en) | Departure flight taxi time prediction method, electronic device, and storage medium | |
CN113610282A (en) | Flight taxi time prediction method | |
CN115752708A (en) | Airport single-point noise prediction method based on deep time convolution network | |
CN116911434A (en) | Airport operation situation prediction method, device and system and storage medium | |
CN110009939B (en) | Flight delay prediction and sweep analysis method based on ASM | |
CN115966107A (en) | Airport traffic flow prediction method based on graph neural network | |
CN116109212B (en) | Airport operation efficiency evaluation index design and monitoring method | |
CN115759386B (en) | Method and device for predicting flight execution result of civil aviation flight and electronic equipment | |
CN112365037A (en) | Airport airspace flow prediction method based on long-term and short-term data prediction model | |
Meijers | Data-driven predictive analytics of runway occupancy time for improved capacity at airports |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200228 |