CN109920248A

CN109920248A - A bus arrival time prediction method based on a GRU neural network

Info

Publication number: CN109920248A
Application number: CN201910162263.5A
Authority: CN
Inventors: 孙玲; 陆俊天; 施佺; 曹阳; 沈琴琴; 朱森来
Original assignee: Nantong University
Current assignee: Nantong University
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2019-06-21
Anticipated expiration: 2039-03-05
Also published as: CN109920248B

Abstract

The invention discloses a bus arrival time prediction method based on a GRU neural network. The method comprises: exporting historical data from a database to a CSV file to obtain raw data, and analyzing and processing the raw data with the HBase distributed database and Spark in-memory processing to remove the heterogeneity, complexity and coefficient property of the raw data; processing and analyzing the processed raw data from single-attribute and multi-factor perspectives with a feature correlation analysis method to obtain standard time-series-type data; performing variable selection on the standard time-series-type data with the Lasso method to remove the weakly correlated feature vectors; and building a bus arrival prediction model on a GRU neural network and inputting the standard time-series-type data, with the weakly correlated feature vectors removed, into the model to predict the bus arrival time. The invention can effectively improve the accuracy of bus arrival time prediction.

Description

A bus arrival time prediction method based on a GRU neural network

Technical field

The present invention relates to urban bus monitoring and arrival time prediction, and in particular to a bus arrival time prediction method based on a GRU neural network.

Background art

Public transport is important infrastructure that bears on the national economy and people's livelihood. Developing information-based, intelligent advanced public transportation systems helps improve the management and service level of urban public transport. Bus scheduling management is the core of an advanced public transportation system, and bus arrival time is the key parameter for dynamic bus scheduling. Traditional bus scheduling estimates arrival times from empirically judged running time intervals between fixed stations. In general, timetables estimated this way have large errors and a poor fit, and cannot reflect actual conditions.

Arrival time prediction is valuable for improving bus punctuality, reducing passenger waiting time and helping passengers plan their travel time. Scholars in China and abroad have done a great deal of research on bus arrival time prediction. The main prediction models proposed are time series (TS) models, artificial neural network (ANN) models, support vector machine (SVM) models and Kalman filter models. Yang et al. differenced and tested the non-stationary data in a time series, established an autoregressive moving average time series model, and fitted it through residual analysis and data fitting to predict arrival time; however, white noise in the model's series has a serious effect, so the final prediction accuracy is not high. Xiong Wenhua et al. used a BP network with floating-car and loop-detector records as network input and vehicle travel time as output; this model needs a large amount of data to fit, and its parameter tuning is complex.

Summary of the invention

To address the problems in the prior art that bus arrival time predictions have large errors and that their fit cannot reflect actual arrival conditions, the present invention proposes a bus arrival time prediction method based on a GRU neural network. The specific technical solution is as follows:

A bus arrival time prediction method based on a GRU neural network, the method comprising:

S1, exporting historical data from a database to a CSV file to obtain raw data, and analyzing and processing the raw data with the HBase distributed database and Spark in-memory processing to remove the heterogeneity, complexity and coefficient property of the raw data;

S2, processing and analyzing the processed raw data from single-attribute and multi-factor perspectives with a feature correlation analysis method to obtain standard time-series-type data;

S3, performing variable selection on the standard time-series-type data with the Lasso method to remove the weakly correlated feature vectors from the standard time-series-type data;

S4, building a bus arrival prediction model on a GRU neural network, and inputting the standard time-series-type data, with the weakly correlated feature vectors removed, into the arrival prediction model to predict the bus arrival time.

Further, step S1 comprises:

S11, reading the CSV file from HDFS with SparkSQL to form data in the Spark DataFrame structure;

S12, extracting the historical GPS trajectory data of a specified bus with SparkSQL, and matching the historical GPS trajectory data against bus station distances using the HBase distributed database.

Further, matching the historical GPS trajectory data against bus station distances using the HBase distributed database comprises:

S121, setting a threshold used to judge whether a matched distance falls within the specified arrival location of the bus; if the matching result is less than the threshold, marking the bus arrival location corresponding to the match;

S122, in chronological order, taking any two matched GPS positioning points whose time interval is greater than t seconds, and judging the uplink or downlink running direction of the bus from the slope of the line connecting the two points;

S123, selecting the positioning time nearest to the station among the matches, and recording the arrival time based on the running speed and acceleration of the bus;

S124, sorting the raw data by arrival time and by the vehicle number of the bus, and using Spark to output and store the result in HDFS.

Further, the distance between the bus arrival location and the actual positioning point is computed with the great-circle distance formula, in which $R$ is the Earth's radius, $A_j$ and $A_w$ are the longitude and latitude of the actual positioning point, and $B_j$ and $B_w$ are the longitude and latitude of the bus arrival location.
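The distance check in S121 can be illustrated with a short sketch. The source does not reproduce its great-circle formula, so the haversine form below is an assumption (one of several equivalent great-circle expressions), as are the Earth radius value, the 50 m threshold and the function names.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius R, in metres


def great_circle_distance(a_lon, a_lat, b_lon, b_lat, radius=EARTH_RADIUS_M):
    """Great-circle distance between point A (GPS fix) and point B (station).

    Coordinates are in degrees. The haversine form is used here as an
    assumption; the patent's own formula is not shown in the source text.
    """
    phi_a, phi_b = math.radians(a_lat), math.radians(b_lat)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(b_lon - a_lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


def is_at_station(fix, station, threshold_m=50.0):
    """Return True if a (lon, lat) fix lies within threshold_m of a station.

    threshold_m stands in for the 'particular value' of step S121; 50 m is
    a hypothetical choice, not a value given in the source.
    """
    return great_circle_distance(*fix, *station) < threshold_m
```

A fix would then be marked as an arrival candidate whenever `is_at_station(fix, station)` is true.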

Further, the slope $K$ is calculated by a formula in which $D_{lon}$ and $D_{lat}$ are the longitude and latitude of the uplink terminal station of the route, $S_{lon}$ and $S_{lat}$ are the longitude and latitude of the uplink starting station of the route, $A_{lon}$ and $A_{lat}$ are the longitude and latitude of the later station on the vehicle's trajectory, and $B_{lon}$ and $B_{lat}$ are the longitude and latitude of the earlier station. If $K > 0$, the bus is travelling in the same direction as the uplink, i.e. it is an uplink run; otherwise it is a downlink run.
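The direction test can be sketched as follows. The patent's exact expression for $K$ is not reproduced in the source, so this sketch substitutes a plainly different but related quantity: the dot product between the route vector (start to terminal) and the movement vector (earlier to later point). Its sign plays the same role as the sign of $K$. All names are hypothetical.

```python
def running_direction(start, terminus, earlier, later):
    """Classify a run as 'uplink' or 'downlink'.

    Each argument is a (lon, lat) pair. This is a stand-in for the patent's
    slope formula: k is the dot product of the uplink route vector
    (start -> terminus) and the vehicle's movement vector (earlier -> later).
    k > 0 means the bus moves roughly the same way as the uplink route.
    """
    route = (terminus[0] - start[0], terminus[1] - start[1])
    move = (later[0] - earlier[0], later[1] - earlier[1])
    k = route[0] * move[0] + route[1] * move[1]
    return "uplink" if k > 0 else "downlink"
```

A dot product avoids the division by zero that a literal slope would hit on a due-north or due-south segment, which is why it is used in this sketch.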

Further, step S123 computes the arrival time by a formula in which $s$ is the distance from the most recent positioning point to the station, $v_0$ is the running speed of the bus at the bus arrival location, $v_t$ is the speed on arrival, which is 0 by default, and $t$ is the time taken from the most recent positioning point to the bus station.
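As a rough illustration of how $s$, $v_0$, $v_t$ and $t$ could be related, the sketch below assumes uniform deceleration between the last positioning point and the station, so that $s = \frac{v_0 + v_t}{2}\,t$. This kinematic relation is an assumption; the source does not show the patent's formula. Function names are hypothetical.

```python
from datetime import datetime, timedelta


def time_to_station(s, v0, v_t=0.0):
    """Seconds needed to cover distance s (m) under uniform deceleration.

    Assumes s = (v0 + v_t) / 2 * t, hence t = 2 * s / (v0 + v_t).
    v0 is the speed at the last fix (m/s); v_t is the arrival speed,
    0 by default as stated in the text.
    """
    if v0 + v_t <= 0:
        raise ValueError("average speed must be positive")
    return 2.0 * s / (v0 + v_t)


def arrival_time(last_fix_time, s, v0, v_t=0.0):
    """Estimated arrival timestamp: last fix time plus the travel time."""
    return last_fix_time + timedelta(seconds=time_to_station(s, v0, v_t))
```

For example, a bus 50 m from the station at 10 m/s would be recorded as arriving 10 seconds after its last fix under this assumption.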

Further, the Lasso method is defined by a formula in which $x_{ij}$ is the $j$-th variable of the $i$-th sample, the vector $\beta$ holds the regression coefficients, and $y$ is the training label.

The bus arrival time prediction method based on a GRU neural network of the invention first processes the raw data with Spark to obtain standard time-series-type data, thereby extracting the bus arrival data; it then uses the Lasso method to remove the weakly correlated feature vectors, thereby performing variable selection; finally, it builds a bus arrival prediction model on a GRU neural network to predict the specific time at which the bus arrives. Compared with the prior art, the GRU neural network of the invention includes a process of screening and selecting data, and by screening and selecting the bus arrival data the method can effectively improve the accuracy of bus arrival time prediction.

Brief description of the drawings

Fig. 1 is a flow chart of the bus arrival time prediction method based on a GRU neural network according to an embodiment of the invention;

Fig. 2 is a schematic diagram of the GPS data processing flow in an embodiment of the invention;

Fig. 3 is a schematic flow diagram of the correlation processing applied to the collected source data in an embodiment of the invention;

Fig. 4 is a schematic diagram of the GRU network model according to an embodiment of the invention;

Fig. 5 is a schematic flow chart of the algorithm of the GRU network model according to an embodiment of the invention;

Fig. 6 is a schematic comparison of the loss-function curves of the method of the invention and the LSTM method when predicting bus arrival time;

Fig. 7 is a schematic comparison of the bus arrival times predicted by the method of the invention with the actual bus arrival times;

Fig. 8 and Fig. 9 are schematic comparisons of training for bus arrival prediction with the LSTM network and the GRU network respectively in an embodiment of the invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solution of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings of the embodiments.

Referring to Fig. 1, an embodiment of the invention provides a bus arrival time prediction method based on a GRU neural network, which comprises the following steps:

Step 1: Export historical data from the database to a CSV file to obtain raw data, and analyze and process the raw data with the HBase distributed database and Spark in-memory processing to remove the heterogeneity, complexity and coefficient property of the raw data. With reference to Fig. 2, the database stores the historical records of real-time bus operation, and these historical records (i.e. the historical data) are all recorded by GPS. Because the raw data are recorded by GPS devices mounted on the buses, reception accuracy and delay are an issue, and during actual bus operation the directly received data may be affected by GPS positioning accuracy and by the network, so that records may not conform to the standard format, may be obviously wrong, or may be duplicated. For this reason, the method first uses SparkSQL to read the original CSV file from HDFS into data in Spark DataFrame format, extracts redundant and abnormal data by matching on time series and vehicle number, deletes redundant columns, sorts the data by time and vehicle order, and stores the cleaned data in the database through the HBase Phoenix interface. Then, for the historical GPS trajectory data of the specified bus in HBase, the historical GPS trajectory data are matched against bus station distances using Spark resilient distributed datasets. The matching process comprises the following steps:

First, a threshold is set to judge whether a matched distance falls within the arrival location set for the specified bus; if the matching result is less than the threshold, the bus arrival location corresponding to the match is marked. The distance between the bus arrival location and the actual positioning point is computed with the great-circle distance formula, in which $R$ is the Earth's radius, $A_j$ and $A_w$ are the longitude and latitude of the actual positioning point, and $B_j$ and $B_w$ are the longitude and latitude of the bus arrival location.

Next, in chronological order, the positioning points of any two matched GPS trajectory records whose time interval is greater than $t$ seconds are taken, and the uplink or downlink running direction of the bus is judged from the slope $K$ of the line connecting the two points. In the slope formula, $D_{lon}$ and $D_{lat}$ are the longitude and latitude of the uplink terminal station of the route, $S_{lon}$ and $S_{lat}$ are the longitude and latitude of the uplink starting station of the route, $A_{lon}$ and $A_{lat}$ are the longitude and latitude of the later station on the vehicle's trajectory, and $B_{lon}$ and $B_{lat}$ are the longitude and latitude of the earlier station. In the result, if $K > 0$ the bus is travelling in the same direction as the uplink, i.e. it is an uplink run; otherwise it is a downlink run.

Then, the positioning time nearest to the station among the matches is selected, and the arrival time is recorded based on the running speed and acceleration of the bus. The arrival time is computed by a formula in which $s$ is the distance from the most recent positioning point to the station, $v_0$ is the running speed of the bus at the bus arrival location, $v_t$ is the speed on arrival, which is 0 by default, and $t$ is the time taken from the most recent positioning point to the bus station.

Finally, the raw data are sorted by arrival time and by the vehicle number of the bus, and the result is output and stored in HDFS using Spark in-memory processing. Meanwhile, for each station name and position in the station information table, a map is used to find the coordinates corresponding to the GPS trajectory data, and the station spacing is derived along the operating route, which forms the station information table.

In a specific embodiment, if the positioning points of several historical GPS trajectory records all match a bus station by distance, they are screened with "nearest" and "earliest" as the primary conditions to select the best match. The format of the raw data used by the invention is shown in Table one, the bus arrival timetable in Table two, and the bus station information table in Table three.
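The "nearest, earliest" screening can be sketched in a few lines. The record layout (a timestamp paired with a distance in metres) and the tie-breaking order, distance first and then time, are assumptions made for illustration; the source only names the two conditions.

```python
from datetime import datetime


def best_match(candidates):
    """Pick the best station match from (timestamp, distance_m) records.

    Assumed rule: smallest distance wins; among equal distances, the
    earliest timestamp wins. Returns None when there are no candidates.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda rec: (rec[1], rec[0]))


# Hypothetical candidate fixes near one station.
candidates = [
    (datetime(2019, 3, 1, 8, 0, 10), 12.0),
    (datetime(2019, 3, 1, 8, 0, 5), 12.0),
    (datetime(2019, 3, 1, 8, 0, 0), 30.0),
]
chosen = best_match(candidates)
```

Here the two 12 m fixes tie on distance, so the earlier one at 08:00:05 is kept.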

Table one

Table two

Table three

Step 2: Process and analyze the processed raw data from single-attribute and multi-factor perspectives with the feature correlation analysis method to obtain standard time-series-type data. In a specific embodiment, from the single-attribute perspective, the service time between stations on each run necessarily affects the arrival time at the next station, and in actual driving different vehicles also show certain patterns of variation because of differences between drivers. From the multi-factor perspective, considering the links between factors, operating features such as station spacing, departure time and whether the day is a working day necessarily affect the operating efficiency of the whole route and therefore change arrival times. Combined with the time-series relationship already present in the data, the feature correlation analysis method processes the data into the standard time-series type. In the invention, the method analyzes, along the two dimensions of time and space, how different periods and different weather conditions affect bus arrival time; see Table four for details.

Table four

Step 3: Perform variable selection on the standard time-series-type data with the Lasso method to remove the weakly correlated feature vectors from the standard time-series-type data. In practice, predicting bus arrival is a regression problem. If the regression analysis has too many predictor vectors, subset selection becomes computationally impractical, and because subset selection is inherently discontinuous its results are highly variable. To avoid this, the invention uses the Lasso method for variable selection and removes the weakly correlated feature vectors. The Lasso method is defined by a formula in which $x_{ij}$ is the $j$-th variable of the $i$-th sample, the vector $\beta$ holds the regression coefficients, and $y$ is the training label. With reference to Fig. 3, the weakly correlated feature vectors are removed with the Lasso method as follows:

First, variable selection is performed with the Lasso method in a specified programming language to obtain the coefficient value of each attribute. The variable selection coefficients of the Lasso method are shown in Table five; the embodiment implements the Lasso method in program code.

Table five

Then, according to the correlation coefficients, specified attributes are selected as the inputs and output of the prediction model. Preferably, this embodiment selects the attributes BUSNO, STOP, WEEKDAY, DISTANCE, STARTTIME and WEATHER as model inputs and the arrival time (STOPTIME) as the model output. This is only a preferred embodiment of the method; other embodiments may choose differently according to the actual situation, and the invention is not limited in this respect.
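The embodiment's own Lasso code is not included in the source. As a hedged illustration of Lasso-based variable selection, the sketch below minimizes the commonly used objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \alpha\lVert\beta\rVert_1$ by coordinate descent. The objective form, the penalty weight `alpha`, the absence of an intercept (features and labels are assumed centred) and the toy data are all assumptions.

```python
def soft_threshold(rho, alpha):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Fit Lasso coefficients by cyclic coordinate descent (no intercept).

    X is a list of rows, y a list of labels. Coefficients that end up at
    exactly 0.0 mark features the penalty has removed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta


# Toy data: y depends only on the first (hypothetical) feature.
names = ["feature_a", "feature_b"]
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [name for name, b in zip(names, beta) if abs(b) > 1e-8]
```

In this toy case the second coefficient is driven to zero, so only `feature_a` would be kept as a model input; the same idea underlies keeping attributes whose coefficients are non-negligible.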

In practice, after the regression-analysis preprocessing, it must be considered that the data may have inconsistent dimensions (units) when the data volume grows in the future. The data therefore need to be standardized, converting dimensional expressions into dimensionless ones. For this purpose the invention uses class labels; for example, 10 vehicle numbers are represented by 0 to 9. Zero-mean standardization is defined as $x^* = \frac{x - \mu}{\sigma}$, where $x$ is the original fixed-type data, $x^*$ is the new data, $\mu$ is the sample mean and $\sigma$ is the sample standard deviation. For the other data, min-max (deviation) standardization is used, defined as $y = \frac{x - \min}{\max - \min}$, where $y$ is the standardized value, $x$ is the original feature value, and $\min$ and $\max$ are the smallest and largest values of that feature. Normalization makes the data dimensionless, which effectively reduces the time needed to find the optimal solution, improves the convergence speed and prediction accuracy of the model, and lets each feature contribute equally to the result. It resolves the problem of new data having different dimensions and improves the prediction efficiency and prediction accuracy of the method.
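The three transformations described above can be sketched with the standard library alone. The function names are hypothetical, and the use of the population standard deviation for $\sigma$ is an assumption, since the text does not say which estimator is meant.

```python
import statistics


def label_encode(values):
    """Map category values (e.g. vehicle numbers) to integers 0..k-1."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def z_score(values):
    """Zero-mean standardization: x* = (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumed: population standard deviation
    if sigma == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(x - mu) / sigma for x in values]


def min_max(values):
    """Min-max standardization: y = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature cannot be scaled")
    return [(x - lo) / (hi - lo) for x in values]
```

Min-max scaling maps every feature into the range 0 to 1, which is what makes features with different units comparable before they enter the network.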

Step 4: Build the bus arrival prediction model on a GRU neural network, and input the standard time-series-type data, with the weakly correlated feature vectors removed, into the arrival prediction model to predict the bus arrival time. As Fig. 4 shows, a GRU neural network has two gates, a reset gate and an update gate, and it does not maintain a separately controlled internal memory $C_t$. The GRU works as follows. First, at time step $t$ the update gate is computed as $z_t = \sigma(W^{(z)} x_t + U^{(z)} h_{t-1})$, where $x_t$ is the $t$-th component of the input sequence $x$, which is linearly transformed by multiplication with the weight matrix $W^{(z)}$, and $h_{t-1}$ holds the information of the previous time step, which is linearly transformed by the weight matrix $U^{(z)}$. The update gate adds these two parts and applies the sigmoid activation function, which compresses the result to between 0 and 1. The update gate determines how much historical information is passed on to the future, which reduces the risk of vanishing gradients. The reset gate determines how data are forgotten and is given by $r_t = \sigma(W^{(r)} x_t + U^{(r)} h_{t-1})$. As with the update gate, the input component and the information saved from the previous step are linearly transformed and then passed through the sigmoid activation function.

Then, when the reset gate is used, the new memory content draws on the historical information admitted by the reset gate. It is computed as $h'_t = \tanh(W x_t + r_t \odot U h_{t-1})$, where the input $x_t$ and the previous-step information $h_{t-1}$ are first linearly transformed, i.e. multiplied by the matrices $W$ and $U$ respectively. The reset gate is a vector of values between 0 and 1 that measure how far the gate is open. When the gate value for an element is 0, that element is forgotten by the network at this step. The Hadamard product of the reset gate $r_t$ and $U h_{t-1}$ therefore determines how much information is retained or forgotten. Finally, the two parts are added and passed through the hyperbolic tangent activation function tanh.

Finally, the final memory $h_t$ of the GRU neural network at the current time step is computed as $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$. $h_t$ retains the information needed by the current unit and passes it to the next unit. The activation result $z_t$ of the update gate decides which information to collect from the current memory content $h'_t$ and from the previous-step information $h_{t-1}$. The Hadamard product of $z_t$ and $h_{t-1}$ is the information from the previous time step that is kept in the final memory; adding the information from the current memory content that is kept in the final memory gives the output of the gated recurrent unit.
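The three equations can be combined into a single forward step. The sketch below uses plain Python lists rather than a deep-learning library, follows the equations as written in the text (no bias terms), and uses hypothetical parameter names; it illustrates the gate arithmetic only and is not the embodiment's implementation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, p):
    """One GRU time step.

    p is a dict of weight matrices: 'Wz', 'Uz' (update gate),
    'Wr', 'Ur' (reset gate), 'W', 'U' (candidate memory).
    """
    # Update gate: z_t = sigmoid(Wz x_t + Uz h_{t-1})
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev))]
    # Reset gate: r_t = sigmoid(Wr x_t + Ur h_{t-1})
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev))]
    # Candidate memory: h'_t = tanh(W x_t + r_t (Hadamard) U h_{t-1})
    uh = matvec(p["U"], h_prev)
    h_cand = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(p["W"], x_t), r, uh)]
    # Final memory: h_t = z_t (Hadamard) h_{t-1} + (1 - z_t) (Hadamard) h'_t
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights set to zero both gates equal 0.5 and the candidate memory is 0, so the step simply halves the previous state, which is a convenient sanity check of the blending in the last equation.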

In a specific embodiment, each layer of the GRU neural network has built-in gates that protect and control the update of its state, which realizes parameter sharing and recurrent memory. A function implementing an exponentially decaying learning rate is added, and the Adam gradient descent method is used. Specifically, Adam gradient descent keeps exponential moving averages of the momentum, where $m_t$ is the first-order momentum, $v_t$ is the second-order momentum, and $\beta_1$ and $\beta_2$ are the associated coefficients. In the initial iterations both momenta are biased toward their initial values, $m_t = 0$ and $v_t = 0$, so they are bias-corrected before being used to update the gradient step.

Compared with a prediction model built on LSTM, the arrival prediction model of the method, built on a GRU neural network, has a simpler overall structure. When successive gradient directions agree, learning is accelerated; when they disagree, oscillation is suppressed. A cost module computes the loss difference between the predicted value and the true value; the next optimization step of the GRU neural network and the direction of gradient optimization are determined from this loss difference. A save module stores the model parameters and safeguards the model: after a model has been trained, it can be saved in full. On the one hand this gives persistent storage of the data; on the other hand, the saved model can be reused in the next prediction, which streamlines the whole prediction process and improves the prediction efficiency of the method.
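The source does not reproduce its Adam formulas, so the sketch below follows the standard published form of Adam (moving averages, bias correction, parameter update) together with a staircase-free exponential decay schedule. The default values $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$ are the conventional Adam defaults and are assumptions here; the schedule defaults (decay rate 0.9, 1000 decay steps) match the values the embodiment reports for its hyperparameters.

```python
import math


def exponential_decay(lr0, step, decay_rate=0.9, decay_steps=1000):
    """Learning rate after `step` updates: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count.

    Returns the new parameter and the updated first and second momenta.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-order momentum
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-order momentum
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

Because of the bias correction, the very first update moves the parameter by almost exactly the learning rate regardless of the gradient's magnitude, which is the behaviour the text alludes to when it says the initial momenta are offset toward zero.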

Referring to Fig. 5, in an embodiment of the invention the prediction model based on the GRU neural network is built as follows:

First, the hyperparameters are chosen. Preferably, this embodiment initializes the weights from a normal distribution with a standard deviation of 0.1, initializes the biases to 0.1, and uses an initial learning rate of 0.001, a decay coefficient of 0.9, a decay speed of 1000, a training batch size (batch_size) of 800, 30 training passes over all samples (epochs) and 30 time steps (timesteps).

Then the model is trained. Specifically, this embodiment analyzes 14 days of historical arrival data for the No. 41 bus in the Nantong area. The first 10 days of data are used as the training set of the arrival prediction model, the quadratic loss function is used as the error function to be minimized during training, and the last 4 days of the 14 are used as test data to verify the training result. The quadratic loss function can be written as

$$C = \frac{1}{2n} \sum_{x} \lVert y(x) - a \rVert^{2}$$

where $C$ is the value of the quadratic loss function, $x$ is the input, $y(x)$ is the true arrival time, $a$ is the output obtained for input $x$, i.e. the predicted value, and $n$ is the total number of data items in one training pass. In practice, to prevent over-fitting and to reduce the error further so that the model learns more thoroughly, an L2 regularization term is added to the loss function, in which $\omega$ denotes the weights and $\lambda$ balances the relative importance of the quadratic loss term and the weight term.
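A small sketch of the loss computation follows. The quadratic cost uses the $\frac{1}{2n}$ scaling that matches the symbols defined in the text; the $\frac{\lambda}{2n}$ scaling of the L2 term is a common convention and an assumption here, since the source does not show the regularized formula. Function names are hypothetical.

```python
def quadratic_cost(y_true, y_pred):
    """C = 1/(2n) * sum over samples of (y(x) - a)^2, for scalar outputs."""
    n = len(y_true)
    return sum((y - a) ** 2 for y, a in zip(y_true, y_pred)) / (2 * n)


def l2_regularized_cost(y_true, y_pred, weights, lam):
    """Quadratic cost plus an L2 penalty lam/(2n) * sum of squared weights.

    lam plays the role of lambda in the text: larger values favour small
    weights over a close fit to the training data.
    """
    n = len(y_true)
    penalty = lam / (2 * n) * sum(w * w for w in weights)
    return quadratic_cost(y_true, y_pred) + penalty
```

Setting `lam` to 0 recovers the plain quadratic cost, so the two functions can be compared directly when tuning the regularization strength.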

The loss of the bus arrival prediction model built on the GRU neural network is compared with that of a conventional prediction model built on LSTM. As Fig. 6 shows, the loss of the method of the invention falls rapidly during the first four iterations and levels off after the fifth, which indicates that the arrival prediction model built by the method is fully trained by that point. In other words, the invention can reach a working prediction model with few training iterations, which speeds up the whole model and improves overall prediction efficiency.

Referring to Fig. 7, the bus arrival times predicted by the method of the invention are compared with the actual bus arrival times. Specifically, instead of the mean absolute percentage error (MAPE), the invention judges the fit with the linear-regression goodness-of-fit index R-squared, defined as $R^2 = 1 - \frac{\sum (y - y^*)^2}{\sum (y - \bar{y})^2}$, where $y$ is the actual arrival time, $y^*$ is the arrival time predicted by the arrival prediction model built on the GRU neural network, and $\bar{y}$ is the mean value. The R-squared index of the arrival prediction model built on the GRU neural network is computed for every run over 3 days and then averaged, which gives a goodness of fit of 94.547%. Comparing the actual arrival times with the times predicted by the arrival prediction model built on the GRU neural network shows that the predictions of the method are close to the actual bus arrival times and the error is small.
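The goodness-of-fit computation, including the averaging over runs, can be sketched as follows. The data layout (one pair of actual and predicted lists per run) and the function names are assumptions made for illustration.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - sum((y - y*)^2) / sum((y - mean(y))^2)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


def mean_r_squared(runs):
    """Average R-squared over several runs.

    runs is a list of (actual, predicted) pairs, one pair per bus run.
    """
    scores = [r_squared(actual, predicted) for actual, predicted in runs]
    return sum(scores) / len(scores)
```

A perfect prediction scores 1.0, while always predicting the mean of the actual values scores 0.0, which gives a scale for reading a reported fit such as 94.547%.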

The goodness of fit and performance of the method of the invention are further compared with those of a prediction model built on LSTM. As Table six shows, the method of the invention fits markedly better than the conventional LSTM-based prediction model, the GM11 algorithm and the SVM algorithm, which indicates that its prediction accuracy is higher than that of conventional bus arrival prediction. Referring to Fig. 8 and Fig. 9 together with Table seven, the method of the invention and the conventional LSTM prediction model were each trained ten times. Whatever values are chosen for epoch and batchsize, the method of the invention takes less time than LSTM. With 100 epochs and a batchsize of 300, the average time of the LSTM network is 7.168% more than that of the GRU network; with 300 epochs and a batchsize of 3000, the average time of the LSTM network is 14.1% more than that of the GRU network. It follows that, as the data volume keeps growing, the arrival prediction model built on the GRU neural network saves more computing resources, reduces the time spent on model training and improves the computational efficiency of the model.

Table six

Table seven

In summary, the bus arrival time prediction method based on a GRU neural network of the invention first processes the raw data with Spark to obtain standard time-series-type data, thereby extracting the bus arrival data; it then uses the Lasso method to remove the weakly correlated feature vectors, thereby performing variable selection; finally, it builds a bus arrival prediction model on a GRU neural network to predict the specific time at which the bus arrives. Compared with the prior art, the GRU neural network of the invention includes a process of screening and selecting data, and by screening and selecting the bus arrival data the method can effectively improve the accuracy of bus arrival time prediction.

The above is only a preferred embodiment of the invention and does not limit the patent scope of the invention. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of their technical features. Any equivalent structure made using the description and drawings of the invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the invention.

Claims

1. a kind of public transport arrival time prediction technique based on GRU neural network, which is characterized in that the described method includes:

S1, historical data is exported to CSV formatted file by database, obtains initial data, using HBase distributed data base and Spark memory processing technique to the initial data be analyzed and processed removal the promiscuity of initial data, complexity and Coefficient；

S2, processing and analyzing the processed raw data using feature correlation analysis, from both the single-attribute and the multi-factor perspective, to obtain standard time-series data;

S3, performing variable selection on the standard time-series data using the Lasso method, and rejecting the weakly correlated feature vectors in the standard time-series data;

S4, building a bus arrival prediction model based on a GRU neural network, and inputting the standard time-series data, from which the weakly correlated feature vectors have been rejected, into the arrival prediction model to predict the bus arrival time.
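
The gating that step S4 relies on can be illustrated with a single GRU forward pass. The following is a minimal sketch using the commonly cited GRU formulation (update gate z, reset gate r, candidate state); the function names, weight shapes and random initialization are assumptions made for illustration and are not taken from the claims.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical small random weights for the three GRU blocks."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):  # update gate, reset gate, candidate state
        params.append(rng.normal(0.0, 0.1, size=(n_hidden, n_in)))      # W: input weights
        params.append(rng.normal(0.0, 0.1, size=(n_hidden, n_hidden)))  # U: recurrent weights
        params.append(np.zeros(n_hidden))                               # b: bias
    return tuple(params)

def gru_step(x, h_prev, params):
    """One GRU time step: gates decide how much of the old state to keep."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand

def run_gru(sequence, params, n_hidden):
    """Feed a sequence of feature vectors through the cell; return the final state."""
    h = np.zeros(n_hidden)
    for x in sequence:
        h = gru_step(x, h, params)
    return h
```

In a full model, the final hidden state would typically be passed to a trained regression layer that outputs the predicted arrival time; training is omitted here.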

2. The bus arrival time prediction method based on a GRU neural network according to claim 1, characterized in that step S1 comprises:

S11, reading the CSV-format file from HDFS using SparkSQL to form Spark DataFrame structured data;

S12, extracting the historical GPS trajectory data of a specified bus using SparkSQL, and matching the historical GPS trajectory data against bus station distances using the HBase distributed database.

3. The bus arrival time prediction method based on a GRU neural network according to claim 2, characterized in that matching the historical GPS trajectory data against bus station distances using the HBase distributed database comprises:

S121, setting a specific value used to judge whether a match falls within the specified bus arrival location; if the matching result is less than the specific value, marking the bus arrival location corresponding to the match;

S122, taking, in chronological order, any two GPS positioning points in the match whose time interval is greater than t seconds, and judging the uplink or downlink running status of the bus from the slope of the line connecting the two positioning points;

S123, selecting the positioning time in the match that is nearest to the station, and recording the arrival time based on the running speed and acceleration of the bus;

S124, sorting the raw data by arrival time and by the vehicle number of the bus, and using Spark to output and store the result in HDFS.
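
Steps S121, S123 and S124 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the claimed Spark/HBase implementation: each positioning fix is assumed to be a (timestamp, distance-to-station) pair with the distance already computed, and the record keys `arrival_time` and `vehicle_no` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp in seconds, distance to the station in metres); assumed layout
Fix = Tuple[float, float]

def mark_arrival(fixes: List[Fix], threshold_m: float) -> Optional[Fix]:
    """S121/S123 sketch: return the fix nearest the station if it lies
    within the threshold distance, otherwise None (no arrival marked)."""
    if not fixes:
        return None
    nearest = min(fixes, key=lambda f: f[1])
    return nearest if nearest[1] < threshold_m else None

def sort_records(records: List[Dict]) -> List[Dict]:
    """S124 sketch: order records by arrival time, then by vehicle number."""
    return sorted(records, key=lambda r: (r["arrival_time"], r["vehicle_no"]))
```

For example, with fixes at 120 m, 35 m and 80 m from a station and a threshold of 50 m, the 35 m fix is marked as the arrival; with a threshold of 30 m, no arrival is marked.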

4. The bus arrival time prediction method based on a GRU neural network according to claim 3, characterized in that the distance between the bus arrival location and the actual positioning location is calculated by the Great-Circle distance formula, the Great-Circle distance formula being:

where R is the radius of the Earth; A_j and A_w are respectively the longitude and latitude of the actual positioning location; and B_j and B_w are respectively the longitude and latitude of the bus arrival location.
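
The great-circle distance is commonly written in the spherical law of cosines form, d = R·arccos(sin A_w · sin B_w + cos A_w · cos B_w · cos(A_j − B_j)). The sketch below assumes that form and a mean Earth radius of 6,371,000 m; both are assumptions, since the text here confirms only the symbols R, A_j, A_w, B_j and B_w.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, in metres

def great_circle_distance(a_lon, a_lat, b_lon, b_lat, radius=EARTH_RADIUS_M):
    """Distance between two points given in degrees, using the spherical
    law of cosines (an assumed form of the great-circle formula)."""
    a_j, a_w, b_j, b_w = map(math.radians, (a_lon, a_lat, b_lon, b_lat))
    cos_angle = (math.sin(a_w) * math.sin(b_w)
                 + math.cos(a_w) * math.cos(b_w) * math.cos(a_j - b_j))
    # Clamp against floating-point overshoot before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)
```

For positioning fixes only a few metres apart, the haversine form is numerically more stable and may be preferable in practice.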

5. The bus arrival time prediction method based on a GRU neural network according to claim 3, characterized in that the slope is calculated by the formula:

In the formula, D_lon and D_lat denote the longitude and latitude of the uplink terminus of the route; S_lon and S_lat denote the longitude and latitude of the uplink starting station of the route; A_lon and A_lat denote the longitude and latitude of the later station on the vehicle's driving trajectory; and B_lon and B_lat denote the longitude and latitude of the previous station. If K > 0, the vehicle is traveling in the same direction as the uplink, i.e. it is running uplink; otherwise it is running downlink.
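
The sign test on K can be illustrated with a stand-in computation. The sketch below does not use the claimed slope formula; it substitutes a dot product between the uplink route vector (S to D) and the travel vector (B to A), which is positive when the vehicle moves broadly in the uplink direction. The function name and the coordinate values in the usage note are hypothetical.

```python
def travel_direction(s, d, b, a):
    """Classify the running direction of a bus.

    s, d: (lon, lat) of the uplink starting station and uplink terminus.
    b, a: (lon, lat) of the earlier and the later positioning point.
    Uses a dot product as a stand-in for K (an assumption, not the
    claimed slope formula): k > 0 means the travel vector points the
    same way as the uplink route vector.
    """
    route = (d[0] - s[0], d[1] - s[1])
    move = (a[0] - b[0], a[1] - b[1])
    k = route[0] * move[0] + route[1] * move[1]
    return "uplink" if k > 0 else "downlink"
```

With a route from (120.80, 32.00) to (120.95, 32.05), a bus moving from (120.85, 32.01) to (120.87, 32.02) is classified as uplink, and the reverse movement as downlink.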

6. The bus arrival time prediction method based on a GRU neural network according to claim 3, characterized in that step S123 is carried out by the formula:

where s is the distance from the last positioning point to the station; v₀ is the running speed of the bus at the bus arrival location; v_t is the arrival speed, which defaults to 0; and t is the time taken from the last positioning point to the bus station.
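
The variables s, v₀, v_t and t are consistent with uniform-acceleration kinematics, in which s = (v₀ + v_t)/2 · t and therefore t = 2s/(v₀ + v_t). The sketch below assumes that relation, which is an assumption rather than the formula of the claim, and treats v₀ as the speed at the last positioning point; the function names are hypothetical.

```python
def time_to_station(s, v0, vt=0.0):
    """Time from the last positioning point to the station, assuming
    constant acceleration: s = (v0 + vt) / 2 * t.

    s  : remaining distance to the station, in metres
    v0 : speed at the last positioning point, in m/s
    vt : speed on arrival, in m/s (0 by default, i.e. the bus stops)
    """
    if s < 0:
        raise ValueError("distance must be non-negative")
    if v0 + vt <= 0:
        raise ValueError("v0 + vt must be positive")
    return 2.0 * s / (v0 + vt)

def arrival_timestamp(last_fix_time, s, v0, vt=0.0):
    """Arrival time = time of the last fix + travel time to the station."""
    return last_fix_time + time_to_station(s, v0, vt)
```

For example, a bus 30 m from the station at 6 m/s, decelerating uniformly to a stop, would arrive 10 s after its last fix.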

7. The bus arrival time prediction method based on a GRU neural network according to claim 1, characterized in that the Lasso method is defined by the formula:

where x_ij is the j-th variable of the i-th group, the row vector β contains the regression coefficients, and y denotes the training labels.
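
Variable selection with Lasso can be sketched as follows. The code assumes the widely used objective (1/(2n))·Σ_i (y_i − Σ_j x_ij·β_j)² + λ·Σ_j |β_j| and solves it by cyclic coordinate descent; the exact scaling in the claim may differ, and the function names, the penalty `lam` and the tolerance are assumptions for illustration.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (assumed objective)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Residual with feature j's current contribution added back.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def select_features(X, y, lam, tol=1e-8):
    """Indices of features whose coefficient survives the L1 penalty."""
    beta = lasso_coordinate_descent(X, y, lam)
    return [j for j in range(X.shape[1]) if abs(beta[j]) > tol]
```

Features whose coefficients are driven to zero are the weakly correlated ones to be rejected before the data are passed to the prediction model.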