WO2021082394A1

WO2021082394A1 - Layout-variable taxiing-out time prediction system based on big data deep learning

Info

Publication number: WO2021082394A1
Application number: PCT/CN2020/089916
Authority: WO
Inventors: 周龙
Original assignee: 南京智慧航空研究院有限公司
Priority date: 2019-10-30
Filing date: 2020-05-13
Publication date: 2021-05-06
Also published as: CN110852497A

Abstract

Disclosed is a layout-variable taxiing-out time prediction system based on big data deep learning. The system comprises: a data set establishment module adapted for acquiring historical operation data and performing data cleaning to obtain a data set; an indicator defining and quantifying module adapted for defining and quantifying a traffic condition indicator of a layout traffic characteristic; a feature set extraction module adapted for analyzing and extracting a feature set, which influences layout taxiing-out time, on the basis of the data set and the traffic condition indicator; a model establishment module adapted for establishing a layout taxiing-out time prediction model by an integrated machine learning method according to the feature set; and a prediction module adapted for completing prediction of airport layout taxiing-out time by means of the layout taxiing-out time prediction model. Airport original record data is processed, modeling is performed on airport layout traffic conditions, taxiing time influence factors are analyzed and extracted, and a GBRT integrated learning model is trained, such that the taxiing-out time prediction model is obtained, and a data basis is provided for management and optimization of airport operations.

Description

An airport surface variable taxi-out time prediction system based on big data deep learning

Technical field

The invention relates to the field of airport traffic control, and in particular to an airport surface variable taxi-out time prediction system based on big data deep learning.

Background art

In the prior art, most aircraft taxi-out time prediction models fall into two categories: simulation models and analytical models. A simulation model takes an existing airport topology model together with conflict detection and resolution as inputs, and obtains the taxi-out time by simulating the ground movement of all arriving and departing aircraft. Simulation models are tailored to a specific airport and do not transfer well to other airports. Earlier research on analytical models focused mainly on models such as linear regression, with some studies attempting machine learning techniques. For analytical models, identifying the main factors that affect taxi time is a central research question. Existing analytical models usually consider an incomplete set of influencing factors, so their practical reference value is limited and they cannot meet the requirements of real applications.

A solution to the above problems is therefore urgently needed.

Summary of the invention

The purpose of the present invention is to provide an airport surface variable taxi-out time prediction system based on big data deep learning, so as to make the set of influencing factors considered by the analytical model more comprehensive.

In order to solve the above technical problems, the present invention provides an airport surface variable taxi-out time prediction system based on big data deep learning, including:

A data set establishment module, adapted to obtain historical operating data and perform data cleaning to obtain a data set;

An indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize airport surface traffic;

A feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the airport surface taxi-out time;

A model establishment module, adapted to establish an airport surface taxi-out time prediction model from the feature set by an ensemble machine learning method;

A prediction module, adapted to predict the airport surface taxi-out time by means of the airport surface taxi-out time prediction model.

Further, the data set establishment module includes:

The original data set acquisition unit is suitable for acquiring historical operating data to construct the original data set;

The data cleaning unit is suitable for data cleaning of the original data set;

The data set acquisition unit is suitable for data integration of the original data set to acquire the data set;

The data set dividing unit is suitable for dividing the data set into a training set and a test set.

Further, the indicator definition and quantification module includes:

A network topology acquisition unit, adapted to model the traffic situation of the airport surface using a macroscopic spatio-temporal network topology model to obtain the macroscopic spatio-temporal network topology;

A quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four categories of indicators reflecting airport surface traffic volume.

Further, the feature set extraction module includes:

An original feature set extraction unit, adapted to extract features that affect the airport surface taxi-out time from the data set and the traffic condition indicators to form an original feature set;

A feature analysis unit, adapted to perform feature analysis on the features in the original feature set;

A feature set building unit, adapted to build the feature set based on the result of the feature analysis.

Further, the feature analysis unit is adapted to perform feature analysis on the features of the original feature set using one or more of the following three methods: the correlation coefficient, normalized mutual information, and factor analysis;

The correlation coefficient is a statistic that reflects the degree of linear correlation between two variables. Its value lies in [-1, 1]. The larger the absolute value, the stronger the linear correlation. A positive value indicates a positive correlation, and a negative value indicates a negative correlation. With X and Y denoting any two variables, the correlation coefficient ρ_{X,Y} is defined as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

Where Cov(X, Y) is the covariance of X and Y, σ_X and σ_Y are the standard deviations of X and Y, and μ_X and μ_Y are the mean values of X and Y;

Normalized mutual information is a commonly used correlation measure with a value range of [0, 1]. The larger the value, the stronger the dependence between the variables. The normalized mutual information U_{X,Y} is defined as:

$$U_{X,Y} = \frac{2\, I_{X,Y}}{H_X + H_Y}, \qquad I_{X,Y} = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad H_X = -\sum_{x} p(x)\log p(x)$$

Where I_{X,Y} is the mutual information of X and Y, H_X and H_Y are the entropies of X and Y (H_Y being defined analogously to H_X), p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y;

Factor analysis assumes that the extracted feature vector x is fully determined by latent influence factors z, expressed as x = Az + ε, where A is the coefficient matrix and ε is the error. Assuming further that the influence factors are mutually independent with unit variance, and that the influence factors and the errors are independent of each other, it follows that Σ_x = AA^T + Σ_ε, where Σ denotes a covariance matrix, from which A and z can be estimated.
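As an illustration only, the first two measures can be computed for small samples as sketched below. The function names are hypothetical, the variables are assumed to be discrete for the mutual information, and the normalization 2I/(H_X + H_Y) is an assumption about the form of U_{X,Y}, since other normalizations are also in common use.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def entropy(values):
    """Shannon entropy of a discrete sample (natural logarithm)."""
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())

def normalized_mutual_info(xs, ys):
    """Assumed normalization: 2 * I(X;Y) / (H(X) + H(Y))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = sum(
        c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
    return 2 * mi / (entropy(xs) + entropy(ys))
```

Both functions fail with a division by zero when a variable is constant, which is one reason constant attributes are better removed beforehand.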

Further, the model establishment module includes:

An initial model acquisition unit, adapted to obtain an initial model by using the feature set as the input of the ensemble learning model GBRT;

A training unit, adapted to train the initial model and tune the hyperparameter values to complete the establishment of the airport surface taxi-out time prediction model.

Further, the training unit is configured to:

Select the maximum depth as the method for controlling the size of the decision trees;

Select least squares as the loss function;

For the best-performing value of the product of learning rate and number of estimators, select the largest learning rate that still maintains stable performance and the corresponding smallest number of estimators;

According to the overall distribution of the taxi-out time in the training set, set the minimum number of samples for a split to 200;

Complete the training of the initial model to establish the airport surface taxi-out time prediction model.

Further, the model establishment module also includes:

A model testing unit, adapted to use the test set to verify and evaluate the performance of the airport surface taxi-out time prediction model.

Further, the performance evaluation in the model testing unit adopts the mean square error (MSE), calculated as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(o_i - p_i)^2$$

Where N is the number of samples in the test set, o_i is the actual taxi time of the i-th sample, and p_i is the taxi time predicted by the model for the i-th sample.

The beneficial effect of the present invention is as follows. The present invention provides an airport surface variable taxi-out time prediction system based on big data deep learning, which includes: a data set establishment module, adapted to obtain historical operating data and perform data cleaning to obtain a data set; an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize airport surface traffic; a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the airport surface taxi-out time; a model establishment module, adapted to establish an airport surface taxi-out time prediction model from the feature set by an ensemble machine learning method; and a prediction module, adapted to predict the airport surface taxi-out time by means of the prediction model. The system processes the original recorded data of the airport, models the airport surface traffic conditions, analyzes and extracts the factors affecting taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.

Description of the drawings

The present invention will be further described below in conjunction with the drawings and embodiments.

Fig. 1 is a schematic block diagram of the airport surface variable taxi-out time prediction system based on big data deep learning provided by the present invention.

Fig. 2 shows the macroscopic spatio-temporal network topology of the taxiing process provided by the present invention.

Fig. 3 shows the correlation coefficients between the candidate influencing factors and the taxi-out time provided by the present invention.

Fig. 4 shows the normalized mutual information between the candidate influencing factors and the taxi-out time provided by the present invention.

Fig. 5 is a factor analysis result diagram of the candidate influencing factors provided by the present invention.

Fig. 6 is a diagram of the model performance over the training and testing phases provided by the present invention.

Detailed description

The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are all simplified schematic diagrams, which merely illustrate the basic structure of the present invention in a schematic manner, so they only show the constitutions related to the present invention.

Example 1

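As shown in Fig. 1, this Embodiment 1 provides an airport surface variable taxi-out time prediction system based on big data deep learning. The system processes the original recorded data of the airport, models the airport surface traffic conditions, analyzes and extracts the factors affecting taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations. Specifically, the system includes:

A data set establishment module, adapted to obtain historical operating data and perform data cleaning to obtain a data set;

An indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize airport surface traffic;

A feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the airport surface taxi-out time;

A model establishment module, adapted to establish an airport surface taxi-out time prediction model from the feature set by an ensemble machine learning method;

A prediction module, adapted to predict the airport surface taxi-out time by means of the airport surface taxi-out time prediction model.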

In this embodiment, the data set establishment module includes:

The original data set obtaining unit is suitable for obtaining historical operating data to construct an original data set.

Specifically, as much data as possible is extracted from the airport surface operation database to form the original data set of airport departure operations. The collected data include: taxi trajectory information, including departure runway, departure stand, corridor entrance number, taxi length, etc.; flight attribute information, including flight number, flight type, aircraft type, airline, engine type, etc.; traffic control information, including whether the flight is restricted, controller information, call information, delays, local weather, airport broadcasts, etc.; flight plan information, including departure airport, destination airport, planned departure time, planned off-block time, waypoint information, etc.; and actual records of the taxiing process, including off-block time, pushback time, requested/approved engine start time, actual take-off time, taxi speed, waiting time at the runway end, etc.

The data cleaning unit is suitable for data cleaning of the original data set.

Specifically, a processing plan is formulated for the actual data set obtained from the airport. Missing values are handled in two ways: filling in default values and direct deletion. The default value "No" is set for "restricted or not", and the default value "None" is set for "restricted content". After the default values are filled in, attributes with more than half of their values missing are deleted directly, including "engine start request", "engine start clearance", "off-block time", "wake", "taxi speed", and "departure queue count". The completeness of the data set is then checked, and data entries that still have missing information are deleted. For outlier handling, the data type of every attribute is first verified and a basic out-of-bounds check is performed; a range-based detection method is then applied to some attributes to check further for outliers. Based on the actual operation of the airport surface, a value range is defined for each such attribute, and values outside the corresponding range are regarded as outliers. Finally, data entries containing outliers are deleted from the data set. The attribute value ranges are shown in the following table.

Value range of some attributes
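As an illustration only, the cleaning procedure described above (default filling, removal of attributes that are mostly missing, removal of incomplete entries, and range-based outlier removal) might be sketched as follows. The attribute names, the sample records and the value range are hypothetical and are not taken from the airport data.

```python
def clean(records, defaults, ranges, max_missing_ratio=0.5):
    """Clean a list of dict records; None marks a missing value."""
    # 1. Fill in default values where one is defined for the attribute.
    filled = [
        {k: (defaults[k] if v is None and k in defaults else v) for k, v in r.items()}
        for r in records
    ]
    # 2. Drop attributes for which more than half of the values are missing.
    attrs = list(filled[0].keys())
    keep = [
        a for a in attrs
        if sum(r[a] is None for r in filled) <= max_missing_ratio * len(filled)
    ]
    # 3. Delete entries that still contain missing information.
    complete = [
        {a: r[a] for a in keep}
        for r in filled
        if all(r[a] is not None for a in keep)
    ]
    # 4. Delete entries whose values fall outside the allowed range.
    def in_range(r):
        return all(lo <= r[a] <= hi for a, (lo, hi) in ranges.items() if a in r)
    return [r for r in complete if in_range(r)]

# Hypothetical sample data for illustration.
records = [
    {"restricted": None, "taxi_speed": None, "taxi_length_m": 2500},
    {"restricted": "Yes", "taxi_speed": None, "taxi_length_m": 3100},
    {"restricted": None, "taxi_speed": 12.0, "taxi_length_m": 90000},
    {"restricted": None, "taxi_speed": None, "taxi_length_m": None},
]
cleaned = clean(
    records,
    defaults={"restricted": "No"},
    ranges={"taxi_length_m": (0, 10000)},
)
```

In this toy run "taxi_speed" is dropped as an attribute, the last record is dropped as incomplete, and the third record is dropped as an outlier.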

The data set acquisition unit is suitable for data integration of the original data set to acquire the data set.

Specifically, this step includes redundant attribute identification, data type conversion, and logic error checking. For redundant attribute identification, attributes that carry little information are identified by calculating the information entropy of each attribute, and attributes whose information is already contained in other attributes are identified by calculating the mutual information between attributes; the redundant attributes "departure airport" and "execution date" are deleted. For data type conversion, non-numeric attributes whose values serve only as identifiers are converted into integer values that are easier to process and use later. The information contained in the "restricted content" attribute is difficult to quantify, so after comprehensive consideration this attribute is deleted. For logic error checking, the physical meaning of each feature is considered and constraint relationships between features are established in order to eliminate logic errors: the correspondence between the aircraft type and the number of engines is checked, the order of the time nodes in airport surface operation is checked, and data entries with logic errors are deleted directly.
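As an illustration only, the entropy-based identification of low-information attributes and the conversion of identifier-like values to integers might be sketched as follows. The threshold, the attribute names and the sample records are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the values of one attribute."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def low_information_attributes(records, threshold=0.0):
    """Attributes whose entropy does not exceed the threshold."""
    return [
        a for a in records[0].keys()
        if entropy([r[a] for r in records]) <= threshold
    ]

def encode_categories(values):
    """Map identifier-like values to integer codes (sorted for determinism)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

# Hypothetical records: every flight departs from the same airport,
# so "departure_airport" carries no information.
records = [
    {"departure_airport": "NKG", "aircraft_type": "B738"},
    {"departure_airport": "NKG", "aircraft_type": "A320"},
    {"departure_airport": "NKG", "aircraft_type": "B738"},
]
redundant = low_information_attributes(records)
type_codes = encode_categories([r["aircraft_type"] for r in records])
```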

The data set dividing unit is adapted to divide the data set into a training set and a test set.

Specifically, the data set is divided into two parts: 90% of the data forms the training set used in the training phase of the model, and 10% of the data forms the test set used to verify the effectiveness and robustness of the model. That is, the training set and the test set come from the same source: after the final processed data set is obtained, 10% of it is set aside for testing before the machine learning model is trained, and the remaining 90% is used as the training set to train the machine learning model.
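As an illustration only, a 90/10 division can be sketched as follows. Shuffling before the split and the fixed seed are assumptions; the text does not state how the 10% is chosen.

```python
import random

def split_dataset(records, test_ratio=0.1, seed=0):
    """Return (training set, test set) after a seeded shuffle."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# 1000 placeholder entries standing in for cleaned flight records.
train_set, test_set = split_dataset(list(range(1000)))
```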

In this embodiment, the indicator definition and quantification module includes:

The network topology acquisition unit is adapted to model the traffic situation of the airport surface using the macroscopic spatio-temporal network topology model to obtain the macroscopic spatio-temporal network topology.

Specifically, the macroscopic spatio-temporal network topology model is used to model the traffic situation of the airport surface. Fig. 2 visualizes the general network topology of the departure and arrival taxiing processes over an arbitrary time and space. In actual airport surface operation, the taxi-in and taxi-out processes are coupled and interdependent, so the influence of arrivals on the departure process is also considered in the model. The spatio-temporal network topology model is a general framework for describing the macroscopic resource flow of the airport system. As shown in Fig. 2, departures d₁, ..., d₄ represent the four possible relationships with the reference departure flight d₀, namely "pushback before, take-off before", "pushback before, take-off after", "pushback after, take-off before" and "pushback after, take-off after". Similarly, arrivals a₁, ..., a₄ represent the four possible relationships with the reference arrival flight a₀, namely "landing before, in-block before", "landing before, in-block after", "landing after, in-block before" and "landing after, in-block after". t_on and t_in denote the landing time and the in-block time of the reference arrival flight a₀. t_out and t_off denote the pushback time and the take-off time of the reference departure flight d₀. δ denotes the time threshold for arrivals and departures.

The quantification unit is adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four categories of indicators reflecting airport surface traffic volume.

Specifically, based on the macroscopic spatio-temporal network topology, a total of eight indicators in four categories that reflect airport surface traffic are defined. The four categories are the surface instantaneous flow indices (SIFIs), the surface cumulative flow indices (SCFIs), the aircraft queue length indices (AQLIs) and the slot resource demand indices (SRDIs). Two statistics are calculated for each category, namely the number of departing aircraft (prefixed with D-) and the number of arriving aircraft (prefixed with A-). The following table shows the statistics for the departure flight d₀ taken as the reference in Fig. 2.

Statistic results of traffic situation indicators for departure flight d ₀

Taking Fig. 2 as an example, the definition and calculation of the indicators in Table 1 are described in detail below. For any departure flight d₀, the SIFIs include D-SIFI and A-SIFI, which respectively represent the number of departing and arriving flights that are taxiing at the moment d₀ pushes back from the gate. The SCFIs include D-SCFI and A-SCFI, which respectively represent the amount of overlap between the taxi period of d₀ and the taxi periods of departing and arriving aircraft. The AQLIs include D-AQLI and A-AQLI, which respectively represent the number of aircraft taking off from and landing on the runway during the entire taxiing process of d₀. The SRDIs include D-SRDI and A-SRDI, which respectively represent the number of aircraft pushed back and landed during the departure slot [t₀ - δ, t₀ + δ] of aircraft d₀. Generally, δ can be set between 10 and 30 minutes.
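As an illustration only, the eight indicators can be computed from the start and end times of each movement roughly as sketched below. The `Movement` class and its field names are hypothetical. Two readings are assumed here: the SCFI is taken as the number of aircraft whose taxi period overlaps that of d₀, and t₀ is taken as the pushback time of d₀.

```python
from dataclasses import dataclass

@dataclass
class Movement:
    kind: str     # "dep" or "arr"
    start: float  # pushback time (dep) or landing time (arr), in minutes
    end: float    # take-off time (dep) or in-block time (arr), in minutes

def indicators(ref, others, delta=15.0):
    """Count-based traffic indicators for the reference departure `ref`."""
    out = {}
    for prefix, kind in (("D", "dep"), ("A", "arr")):
        ms = [m for m in others if m.kind == kind]
        # SIFI: aircraft already taxiing at the moment ref pushes back.
        out[f"{prefix}-SIFI"] = sum(m.start <= ref.start < m.end for m in ms)
        # SCFI (assumed reading): aircraft whose taxi period overlaps ref's.
        out[f"{prefix}-SCFI"] = sum(
            m.start < ref.end and m.end > ref.start for m in ms
        )
        # AQLI: take-offs (dep) or landings (arr) during ref's taxi-out.
        runway_time = (lambda m: m.end) if kind == "dep" else (lambda m: m.start)
        out[f"{prefix}-AQLI"] = sum(
            ref.start <= runway_time(m) <= ref.end for m in ms
        )
        # SRDI: pushbacks (dep) or landings (arr) within [t0 - delta, t0 + delta].
        out[f"{prefix}-SRDI"] = sum(abs(m.start - ref.start) <= delta for m in ms)
    return out

# Hypothetical traffic around a reference departure d0.
d0 = Movement("dep", 100, 115)
traffic = [
    Movement("dep", 95, 105),
    Movement("dep", 110, 125),
    Movement("arr", 98, 104),
    Movement("arr", 120, 128),
]
result = indicators(d0, traffic)
```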

In this embodiment, the feature set extraction module includes:

The original feature set extraction unit is suitable for extracting features that affect the sliding time of the scene from the data set and the traffic condition indicators to form the original feature set.

Specifically, the factors related to the airport surface taxi-out time that are obtained by the data set establishment module and by the indicator definition and quantification module are collected to form the original feature set. The original feature set is then processed, and new features are derived from it to replace some of the original features.

The factors affecting the airport surface taxi-out time obtained by the data set establishment module are: flight number, flight attributes, destination airport, planned departure time, aircraft type, airline, pushback time, actual departure time, departure runway, departure stand, stand type, engine type, corridor entrance, restricted or not, and boarding gate. The factors affecting the airport surface taxi-out time obtained in S120 are: D-SIFI, D-SCFI, D-AQLI, D-SRDI and Corridor_NO. The difference between the pushback time and the actual take-off time is used as the airport surface taxi-out time and replaces the original features. New features of month, day, week, hour and minute are extracted from the planned departure time to replace the original feature. The stand and gate features are further subdivided and analyzed, and the correspondence between the runway and the stand/gate is extracted as a new feature. The resulting original feature set, that is, the candidate influencing factors, is shown in the following table:

Candidate influence factors
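As an illustration only, the derived features described above (taxi-out time as a time difference, and calendar features from the planned departure time) might be computed as follows. The function and key names are hypothetical, and "week" is assumed here to mean the day of the week.

```python
from datetime import datetime

def derive_features(pushback, takeoff, planned_departure):
    """Derive the taxi-out time and calendar features for one flight."""
    taxi_out_min = (takeoff - pushback).total_seconds() / 60.0
    return {
        "taxi_out_min": taxi_out_min,
        "month": planned_departure.month,
        "day": planned_departure.day,
        "weekday": planned_departure.isoweekday(),  # 1 = Monday, 7 = Sunday
        "hour": planned_departure.hour,
        "minute": planned_departure.minute,
    }

# Hypothetical flight for illustration.
features = derive_features(
    pushback=datetime(2019, 10, 30, 8, 5),
    takeoff=datetime(2019, 10, 30, 8, 23),
    planned_departure=datetime(2019, 10, 30, 8, 0),
)
```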

The feature set building unit is adapted to build the feature set based on the result of the feature analysis.

Specifically, based on the analysis result of the feature analysis unit, important features are selected from the original feature set formed by the original feature set extraction unit to form the feature set for the ensemble machine learning model. Features that are only weakly correlated with the airport surface taxi-out time are screened out, including "engine type", "stand type", "month", "week", "day" and "minute". The final feature set, that is, the influencing factors, is shown in the following table:

The final selection of influencing factors

In this embodiment, the feature analysis unit includes:

One or more of the correlation coefficient, normalized mutual information, and factor analysis are used to perform feature analysis on the features of the original feature set. Fig. 3, Fig. 4 and Fig. 5 respectively show the Pearson correlation coefficients between the candidate influencing factors and the taxi-out time, the normalized mutual information between the candidate influencing factors and the taxi-out time, and the factor analysis results of the candidate influencing factors.

In this embodiment, the model establishment module includes:

The training unit is adapted to train the initial model and tune the hyperparameter values to complete the establishment of the airport surface taxi-out time prediction model.

In this embodiment, the training unit is configured to: select "maximum depth" as the method for controlling the size of the decision trees; select "least squares" as the loss function; for the best-performing value of the product of learning rate and number of estimators, select the largest learning rate that still maintains stable performance and the corresponding smallest number of estimators; according to the overall distribution of the taxi-out time in the training set, set the minimum number of samples for a split to 200; and complete the training of the initial model to establish the airport surface taxi-out time prediction model.
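As an illustration only, the selection of the learning rate and the number of estimators for a fixed product value might be sketched as follows. "Stable performance" is interpreted here, as an assumption, as an MSE within a relative tolerance of the best candidate. The scores are made-up placeholder numbers, not results from the embodiment.

```python
def candidate_pairs(product, learning_rates):
    """(learning_rate, n_estimators) pairs with a roughly constant product."""
    return [(lr, max(1, round(product / lr))) for lr in learning_rates]

def pick_pair(scores, tolerance=0.05):
    """Pick the largest learning rate whose MSE stays close to the best one.

    `scores` maps (learning_rate, n_estimators) to an MSE value.
    """
    best = min(scores.values())
    stable = [pair for pair, mse in scores.items() if mse <= best * (1 + tolerance)]
    return max(stable)  # tuples compare by learning rate first

pairs = candidate_pairs(50, [0.01, 0.05, 0.1, 0.5])

# Placeholder MSE values for illustration only.
scores = dict(zip(pairs, [5.50, 5.52, 5.60, 7.90]))
chosen = pick_pair(scores)
```

Because the product is fixed, the largest stable learning rate automatically comes with the smallest number of estimators, which shortens training.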

Specifically, the Gradient Boosted Regression Trees (GBRT) model, a typical representative of ensemble learning, is used to predict the airport surface taxi-out time. The feature set obtained in step S133 is used as the input of the model, and the GBRT model is trained quickly using the implementation in the scikit-learn library. The hyperparameters that need to be set are: decision tree size control, loss function type, number of estimators and learning rate, and minimum samples split. There are two options for controlling the size of the decision tree, "max_depth" and "max_leaf_nodes". There are four optional loss functions for the regression task, namely "least squares (ls)", "least absolute deviation (lad)", "Huber loss (huber)" and "quantile loss (quantile)". Since the learning rate and the number of estimators interact strongly, the product of the two roughly reflects the amount of iterative training. Therefore, when setting these parameters, different product values are tried based on experience, and the product value that gives the best performance on the training set is chosen. The minimum samples split is used to control the lower limit of the number of samples in a leaf node and to improve the robustness of the model. In general, the hyperparameter values need to be adjusted according to the data of the application scenario.

Specifically, the GBRT model F(x) is an additive model of the following form:

$$F(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$$

Where h_m(x) is a basis function, usually called a weak learner in the context of boosting, γ_m is the weight of the corresponding weak learner, and M is the total number of weak learners. GBRT uses decision trees of fixed size as weak learners. Like other boosting algorithms, GBRT builds the additive model greedily:

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$

Where F_m(x) is the GBRT model obtained in the m-th iteration, and h_m(x) is given by

$$h_m = \arg\min_{h} \sum_{i=1}^{n} L\bigl(y_i,\, F_{m-1}(x_i) + h(x_i)\bigr)$$

Here n is the total number of training samples, L is the selected loss function, y_i is the label of the i-th sample, F_{m-1}(x_i) is the prediction for the i-th sample of the GBRT model obtained in the (m-1)-th iteration, and h(x_i) is the prediction for the i-th sample of the weak learner to be found. The weight γ_m is determined by

$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\bigl(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\bigr)$$

with n, L, y_i and F_{m-1}(x_i) as defined above.

The initial model F₀ depends on the problem. For least squares regression, the mean of the target values is usually selected.

That is, the untrained state of the model is

$$F_0(x) = \frac{1}{n}\sum_{i=1}^{n} y_i$$
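As an illustration only, the greedy additive construction can be written out for least squares loss with one-feature regression stumps as weak learners. This is a simplified sketch and not the scikit-learn implementation: the step γ_m is fixed to a constant learning rate instead of being chosen per iteration, and each stump is fitted to the current residuals, which for squared loss are the negative gradient.

```python
def fit_stump(xs, residuals):
    """Least squares regression stump on one feature; returns h(x)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Boosted stumps: F_m = F_{m-1} + learning_rate * h_m."""
    f0 = sum(ys) / len(ys)  # F_0 is the mean of the targets
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(n_estimators):
        # Residuals of the current model (negative gradient of squared loss).
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in stumps)

# Toy data: a step from 5 to 9 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [5, 5, 5, 9, 9, 9]
toy_model = fit_gbrt(xs, ys)
```

With a small constant step, more estimators are needed to reach the same fit, which is the interaction between learning rate and number of estimators described earlier.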

In this embodiment, the model establishment module further includes:

The model testing unit is adapted to use the test set to verify and evaluate the performance of the airport surface taxi-out time prediction model.

Specifically, the test set is used to verify the airport surface taxi-out time prediction model, and the performance evaluation adopts the mean square error (MSE), calculated as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(o_i - p_i)^2$$

Where N is the number of samples in the test set, o_i is the actual taxi time of the i-th sample, and p_i is the taxi time predicted by the model for the i-th sample.

In this embodiment, the MSE is used to monitor the performance of the model during training and testing, and the result is shown in Fig. 6. In the end, the MSE reached 2.5 on the training set and 5.5 on the test set. Although there is a gap between the MSE on the training set and on the test set, the result reflects the generalization ability of the model to a certain extent.

In addition, the table below gives the prediction accuracy on the test set within different error ranges. Over the whole test set, 85.7% of the samples have a taxi time prediction error within 3 minutes, more than 93% have an error within 4 minutes, and about 96.5% have an error within 5 minutes. These verification results on the test set show that the designed data mining model and algorithm meet the accuracy requirements of the actual airport surface dynamic taxi-out time prediction task well.

Test set accuracy within different error ranges

Error range (minutes)	[-3, 3]	[-4, 4]	[-5, 5]
Accuracy	85.7%	93.1%	96.5%
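As an illustration only, the two evaluation measures used above (the MSE and the share of predictions within a given error range) can be computed as follows. The sample values are toy numbers, not data from the embodiment.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/N) * sum((o_i - p_i)^2)."""
    return sum((o - p) ** 2 for o, p in zip(actual, predicted)) / len(actual)

def accuracy_within(actual, predicted, tolerance_min):
    """Share of samples whose absolute error is within the tolerance."""
    hits = sum(abs(o - p) <= tolerance_min for o, p in zip(actual, predicted))
    return hits / len(actual)

# Toy taxi-out times in minutes.
actual = [12.0, 18.0, 25.0, 9.0]
predicted = [13.0, 14.5, 24.0, 15.0]

mse = mean_squared_error(actual, predicted)
within_3 = accuracy_within(actual, predicted, 3)
within_4 = accuracy_within(actual, predicted, 4)
```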

In summary, the present invention provides an airport surface variable taxi-out time prediction system based on big data deep learning, which includes: a data set establishment module, adapted to obtain historical operating data and perform data cleaning to obtain a data set; an indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize airport surface traffic; a feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the airport surface taxi-out time; a model establishment module, adapted to establish an airport surface taxi-out time prediction model from the feature set by an ensemble machine learning method; and a prediction module, adapted to predict the airport surface taxi-out time by means of the prediction model. The system processes the original recorded data of the airport, models the airport surface traffic conditions, analyzes and extracts the factors affecting taxi time, and trains the GBRT ensemble learning model to obtain the taxi-out time prediction model, which provides a data basis for the management and optimization of airport operations.

Inspired by the above preferred embodiment of the present invention, and on the basis of the above description, those skilled in the art can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the content of the description and must be determined according to the scope of the claims.

Claims

1. An airport surface variable taxi-out time prediction system based on big data deep learning, characterized in that it includes:

A data set establishment module, adapted to obtain historical operating data and perform data cleaning to obtain a data set;

An indicator definition and quantification module, adapted to define and quantify traffic condition indicators that characterize airport surface traffic;

A feature set extraction module, adapted to analyze and extract, based on the data set and the traffic condition indicators, the feature set that affects the airport surface taxi-out time;

A model establishment module, adapted to establish an airport surface taxi-out time prediction model from the feature set by an ensemble machine learning method;

A prediction module, adapted to predict the airport surface taxi-out time by means of the airport surface taxi-out time prediction model.
2. The airport surface variable taxi-out time prediction system based on big data deep learning according to claim 1, characterized in that:

The data set establishment module includes:

The original data set acquisition unit is suitable for acquiring historical operating data to construct the original data set;

The data cleaning unit is suitable for data cleaning of the original data set;

The data set acquisition unit is suitable for data integration of the original data set to acquire the data set;

The data set dividing unit is suitable for dividing the data set into a training set and a test set.
3. The airport surface variable taxi-out time prediction system based on big data deep learning according to claim 2, characterized in that:

The indicator definition and quantification module includes:

A network topology acquisition unit, adapted to model the traffic situation of the airport surface using a macroscopic spatio-temporal network topology model to obtain the macroscopic spatio-temporal network topology;

A quantification unit, adapted to define and quantify, based on the macroscopic spatio-temporal network topology, four categories of indicators reflecting airport surface traffic volume.
The scene variable sliding-out time prediction system based on big data deep learning according to claim 3, characterized in that:

The feature set extraction module includes:

The original feature set extraction unit is adapted to extract, from the data set and the traffic condition indicators, the features that influence the surface taxi-out time, and to form the original feature set from them;

The feature analysis unit is adapted to perform feature analysis on the features in the original feature set;

The feature set building unit is adapted to build the feature set based on the result of the feature analysis.
The scene variable sliding-out time prediction system based on big data deep learning according to claim 4, characterized in that:

The feature analysis unit is adapted to:

Perform feature analysis on the features of the original feature set using one or more of the correlation coefficient, standardized mutual information, and factor analysis;

Correlation coefficient: the correlation coefficient is a statistic reflecting the degree of linear correlation between two variables. Its value lies in [-1, 1]; the larger the absolute value, the stronger the linear correlation. A positive value indicates a positive correlation, and a negative value indicates a negative correlation. With X and Y denoting any two variables, the correlation coefficient $\rho_{X,Y}$ is defined as:

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y}$$

where Cov(X, Y) is the covariance of X and Y, $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, and $\mu_X$ and $\mu_Y$ are the mean values of X and Y;

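Standardized mutual information (also called normalized mutual information) is a commonly used measure of correlation whose value lies in [0, 1]; the larger the value, the stronger the dependence between the variables. The standardized mutual information $U_{X,Y}$ is defined as:

$$U_{X,Y} = \frac{2\, I_{X,Y}}{H_X + H_Y}$$

$$I_{X,Y} = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad H_X = -\sum_{x} p(x)\log p(x), \qquad H_Y = -\sum_{y} p(y)\log p(y)$$

where $I_{X,Y}$ is the mutual information of X and Y, $H_X$ and $H_Y$ are the entropies of X and Y respectively, p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y;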

Factor analysis: the extracted feature vector x is assumed to be fully determined by latent influence factors z, expressed as x = Az + ε, where A is the coefficient matrix and ε is the error. With the errors mutually independent, and under the usual convention that the factors z are uncorrelated with unit variance and independent of ε, it follows that $\Sigma_x = A A^{T} + \Sigma_{\varepsilon}$, where Σ denotes a covariance matrix, from which A and z can be calculated.
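The first two measures above can be sketched in Python using only the standard library. Two assumptions are made here: the standardized mutual information is normalized as 2I/(H_X + H_Y), which yields values in [0, 1], and the variables passed to it are already discrete (a continuous feature such as taxi-out time would first need to be binned). Function names are illustrative.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Correlation coefficient: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def entropy(labels):
    """Shannon entropy H of a discrete variable, estimated from counts."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """Mutual information I of two discrete variables, estimated from counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def standardized_mi(xs, ys):
    """Assumed normalization: U = 2 * I / (H_X + H_Y)."""
    return 2.0 * mutual_information(xs, ys) / (entropy(xs) + entropy(ys))

r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])              # perfectly linear, increasing
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])              # perfectly linear, decreasing
u_same = standardized_mi([0, 0, 1, 1], [0, 0, 1, 1])     # identical variables
u_indep = standardized_mi([0, 0, 1, 1], [0, 1, 0, 1])    # independent variables
```

The correlation coefficient only captures linear dependence, whereas mutual information also responds to nonlinear dependence, which is one reason to consider both when screening features.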
The scene variable sliding-out time prediction system based on big data deep learning according to claim 5, characterized in that:

The model building module includes:

The initial model acquisition unit is adapted to obtain the initial model by using the feature set as the input of the ensemble learning model GBRT (gradient boosted regression trees);

The training unit is adapted to train the initial model and tune the hyperparameter values, thereby completing the establishment of the surface taxi-out time prediction model.
The scene variable sliding-out time prediction system based on big data deep learning according to claim 6, characterized in that:

The training unit is adapted to:

Select the maximum depth as the method for controlling the size of each decision tree;

Select least squares as the loss function;

With the product of the learning rate and the number of estimators held at its optimal value, select the largest learning rate, and the correspondingly smallest number of estimators, at which performance remains stable;

Set the minimum number of samples required to split a node to 200, according to the overall distribution of the taxi-out times in the training set;

Complete the training of the initial model to establish the surface taxi-out time prediction model.
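To make the roles of these hyperparameters concrete, the following is a minimal pure-Python sketch of gradient boosted regression trees with a least-squares loss. It is not the implementation of the claimed system: the function names, the toy data, and the toy hyperparameter values are assumptions, and the minimum split size is set to 2 here only because the toy data set has 10 samples (the claim sets it to 200).

```python
def fit_tree(X, r, depth, max_depth, min_samples_split):
    """Fit a regression tree to targets r by least squares.
    Returns a float (leaf value) or a tuple (feature, threshold, left, right)."""
    mean = sum(r) / len(r)
    if depth >= max_depth or len(r) < min_samples_split:
        return mean
    best = None
    for j in range(len(X[0])):
        # Excluding the largest value guarantees both children are non-empty.
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            sse = 0.0
            for idx in (left, right):
                m = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    if best is None:
        return mean  # no feature has more than one distinct value

    _, j, thr, left, right = best

    def grow(idx):
        return fit_tree([X[i] for i in idx], [r[i] for i in idx],
                        depth + 1, max_depth, min_samples_split)

    return (j, thr, grow(left), grow(right))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def fit_gbrt(X, y, n_estimators, learning_rate, max_depth, min_samples_split):
    base = sum(y) / len(y)  # initial constant prediction
    pred = [base] * len(y)
    trees = []
    for _ in range(n_estimators):
        # For the least-squares loss, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, residuals, 0, max_depth, min_samples_split)
        trees.append(tree)
        pred = [pi + learning_rate * predict_tree(tree, row)
                for pi, row in zip(pred, X)]
    return base, learning_rate, trees

def predict_gbrt(model, row):
    base, learning_rate, trees = model
    return base + learning_rate * sum(predict_tree(t, row) for t in trees)

# Toy data: one feature (e.g. a queue length) and a taxi-out time in minutes.
X = [[q] for q in range(10)]
y = [5.0 + 2.0 * q for q in range(10)]
model = fit_gbrt(X, y, n_estimators=20, learning_rate=0.5,
                 max_depth=2, min_samples_split=2)
train_mse = sum((yi - predict_gbrt(model, row)) ** 2
                for yi, row in zip(y, X)) / len(y)
```

In this sketch a larger learning rate lets each tree correct more of the remaining residual, so fewer estimators are needed for a comparable fit, which is the trade-off the learning-rate and estimator-count selection step addresses. A large minimum split size limits how finely each tree can partition the data.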
The scene variable sliding-out time prediction system based on big data deep learning according to claim 7, characterized in that:

The model building module also includes:

The model testing unit is adapted to verify and evaluate, using the test set, the performance of the surface taxi-out time prediction model.
The scene variable sliding-out time prediction system based on big data deep learning according to claim 8, characterized in that:

The performance evaluation in the model testing unit uses the mean squared error (MSE), calculated as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(o_i - p_i\right)^2$$

where N is the number of samples in the test set, $o_i$ is the actual taxi-out time of the i-th sample, and $p_i$ is the taxi-out time predicted by the model for the i-th sample.
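The evaluation metric can be computed as in the short Python sketch below. The function name and the sample values are illustrative assumptions, not data from the system.

```python
def mean_squared_error(observed, predicted):
    """Mean of the squared differences between actual and predicted taxi-out times."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

observed = [12.0, 18.0, 25.0]   # actual taxi-out times, minutes (toy values)
predicted = [14.0, 17.0, 22.0]  # model predictions, minutes (toy values)
mse = mean_squared_error(observed, predicted)
```

Because the errors are squared, the result is in squared minutes and large individual errors dominate the score.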