CN110796317B

CN110796317B - Urban taxi scheduling method based on demand prediction

Info

Publication number: CN110796317B
Application number: CN201911214182.1A
Authority: CN
Inventors: 熊盛武; 程浩; 段鹏飞; 陆丽萍; 路雄博; 曹丹凤
Original assignee: Wuhan Shuixiang Electronic Technology Co ltd; Wuhan University of Technology WUT
Current assignee: Wuhan Shuixiang Electronic Technology Co ltd; Wuhan University of Technology WUT
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2022-11-01
Anticipated expiration: 2039-12-02
Also published as: CN110796317A

Abstract

The invention discloses a demand prediction-based urban taxi scheduling method. Passenger-carrying trajectory data are preprocessed by filtering out noise data; spatial correlation features and time-period regularity features of urban area demand are extracted through a two-dimensional convolutional neural network; and urban taxis are scheduled using the NSGA-II algorithm, combining the predicted taxi demand of the urban areas with taxi trajectory destination prediction. Prediction accuracy is thereby improved. The method enables taxis to find passengers as early as possible while meeting demand, improves taxi operating efficiency, and reduces taxi scheduling delay.

Description

Urban taxi scheduling method based on demand prediction

Technical Field

The invention relates to deep learning and trajectory data mining, in particular to a demand prediction-based urban taxi scheduling method.

Background

With the continuing acceleration of urban development, traffic congestion has become an urgent problem for cities. Urban taxis, as position-floating vehicles, provide great convenience for residents' travel and offer a new approach to relieving urban traffic congestion. However, urban taxis suffer from a high empty-driving rate and difficulty in finding passengers, which waste urban resources, cause economic loss, and further aggravate road traffic pressure.

To address the high empty-driving rate and the difficulty of finding passengers, researchers have proposed taxi scheduling algorithms that intervene in the passenger-searching process to improve taxi operating efficiency and reduce the empty-driving rate. Existing scheduling algorithms schedule according to real-time demand and the real-time positions of taxis; because a taxi needs time to travel from its current position to the scheduled position, a scheduling delay arises.

Deep learning has achieved great success in image and speech recognition and is also widely applied in vehicle trajectory research. For predicting the demand quantity of urban areas, the long short-term memory (LSTM) neural network has significant advantages: it can extract the temporal dependency features of urban area demand data. However, existing research does not fully consider the temporal regularity and spatial features of taxi demand data.

At present, taxi scheduling systems still suffer from scheduling delay. Scheduling against real-time demand requires a taxi to travel from its starting point to the scheduled destination, and the delay incurred can lead to long passenger waiting times and to changes in demand. This problem can be addressed by urban area demand prediction.

Disclosure of Invention

The invention aims to overcome the deficiencies of the background art and provides a demand prediction-based urban taxi scheduling method.

In order to achieve this purpose, the technical scheme adopted by the invention is as follows:

a demand prediction-based urban taxi scheduling method comprises the following steps:

step 1: preprocessing the passenger-carrying trajectory data by filtering noise data;

step 2: extracting spatial correlation features and time-period regularity features of urban area demand through a two-dimensional convolutional neural network;

step 3: scheduling urban taxis using the NSGA-II algorithm, combining the predicted taxi demand of the urban areas with taxi trajectory destination prediction.

The beneficial effects of the invention are as follows: the method uses a long short-term memory neural network and a convolutional neural network to predict taxi demand in urban areas, improving prediction accuracy. By predicting urban-area taxi demand and scheduling empty taxis according to that demand, the method lets taxis find passengers as early as possible while meeting demand, improves taxi operating efficiency, and reduces scheduling delay. Finally, the taxi scheduling problem is modeled as a multi-objective optimization problem whose objectives are to minimize the scheduling distance and maximize regional demand satisfaction, and this problem is solved using the NSGA-II algorithm.

Drawings

FIG. 1 is a flow chart of an embodiment of the invention;

FIG. 2 is a schematic diagram of trajectory data drift in an embodiment of the invention;

FIG. 3 is a schematic diagram of the neural network for urban area demand prediction in an embodiment of the invention;

FIG. 4 is a schematic diagram of the spatial feature extraction of urban area demand in an embodiment of the invention;

FIG. 5 shows the urban area demand prediction experiment results of an embodiment of the invention;

FIG. 6 is a schematic diagram of the taxi scheduling experiment results of an embodiment of the invention.

Detailed Description

In order to more clearly illustrate the present invention and/or the technical solutions in the prior art, the following will describe embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.

As shown in fig. 1, a method for urban taxi dispatching based on demand prediction and destination prediction mainly includes four aspects of taxi trajectory data preprocessing, urban area demand prediction, taxi destination prediction and taxi multi-objective optimized dispatching, and includes the following steps:

step 1, preprocessing the passenger-carrying trajectory data by filtering noise data;

the passenger-carrying trajectory data in step 1 are as follows:

trajectory sequences formed by the trajectory points recorded while all taxis in the city carry passengers. Trajectories from different taxis show no obvious distinguishing characteristics, so the trajectories generated by different taxis are not distinguished.

tra denotes one complete passenger-carrying trajectory, composed of n trajectory points $p_i$;

$p_i$ is a triple $(lat_i, lng_i, t_i)$, where $lat_i$ is the latitude of the trajectory point, $lng_i$ is its longitude, and $t_i$ is its time;

a complete passenger-carrying trajectory can be represented as:

$$tra = \{p_1, p_2, p_3, \ldots, p_n\}$$

where n is the number of trajectory points;

the specific method for filtering the noise data in the step 1 is as follows:

step 1.1, checking the integrity of data;

data integrity checking means verifying, for $tra = \{p_1, p_2, p_3, \ldots, p_n\}$, whether every trajectory point has its longitude and latitude coordinates and its time; if every trajectory-point triple exists and is validly represented, the trajectory is complete; otherwise the trajectory data are incomplete and the whole trajectory is discarded;

step 1.2, judging whether the track points are in the urban area;

determining the maximum coordinate range of the city according to the city's administrative boundaries:

$$(lat_{min}, lat_{max}, lng_{min}, lng_{max})$$

where $lat_{min}$ and $lat_{max}$ are the minimum and maximum latitude of a trajectory point, and $lng_{min}$ and $lng_{max}$ are the minimum and maximum longitude of a trajectory point;

judging whether a trajectory point is within this range:

if $lat_{min} \le lat_i \le lat_{max}$ and $lng_{min} \le lng_i \le lng_{max}$, the point is a valid trajectory point;

otherwise, the point is an invalid trajectory point, and the whole trajectory containing it is discarded;
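Steps 1.1 and 1.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the `(lat, lng, t)` tuple layout with `lat` as latitude, and the example bounding box are all hypothetical.

```python
import math

def is_complete(tra):
    """Step 1.1: every point must be a (lat, lng, t) triple with no missing field."""
    if not tra:
        return False
    for p in tra:
        if p is None or len(p) != 3:
            return False
        for v in p:
            if v is None or (isinstance(v, float) and math.isnan(v)):
                return False
    return True

def in_city(tra, bbox):
    """Step 1.2: every point must lie inside (lat_min, lat_max, lng_min, lng_max)."""
    lat_min, lat_max, lng_min, lng_max = bbox
    return all(lat_min <= lat <= lat_max and lng_min <= lng <= lng_max
               for lat, lng, _t in tra)

def filter_trajectories(trajectories, bbox):
    """Keep a trajectory only if all of its points pass both checks."""
    return [tra for tra in trajectories if is_complete(tra) and in_city(tra, bbox)]

# Hypothetical bounding box and trajectories, for illustration only.
BBOX = (30.0, 31.0, 114.0, 115.0)
good = [(30.5, 114.3, 0), (30.6, 114.31, 10)]
missing_time = [(30.5, 114.3, None)]
outside = [(30.5, 114.3, 0), (32.0, 114.3, 10)]
kept = filter_trajectories([good, missing_time, outside], BBOX)
```

Only `good` survives: `missing_time` fails the integrity check and `outside` contains a point beyond the bounding box, so each is discarded as a whole.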

judging whether two adjacent points in the trajectory have drifted: the speed between the two points is checked. A drifted point generally appears as an abrupt jump within a normal time interval; if the distance between the two points divided by the elapsed time falls outside the normal range, data drift is considered to have occurred;

as shown in FIG. 2, if $\mathrm{haversine}\big((lat_i, lng_i), (lat_{i-1}, lng_{i-1})\big) / (t_i - t_{i-1})$ is higher than the normal vehicle speed (set to 30 m/s), the trajectory point has drifted and the whole trajectory containing it is filtered out, where haversine is the formula for the distance between two points on the Earth's spherical surface;
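The drift check can be sketched as below. The haversine formula itself is standard; the Earth radius value, the function names, the assumption that times are in seconds, and the choice to treat a non-positive time difference as drift are assumptions made for this example only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, an assumed constant
MAX_SPEED_MPS = 30.0           # normal vehicle speed limit from the text

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def has_drift(tra, max_speed=MAX_SPEED_MPS):
    """True if any pair of adjacent points implies a speed above max_speed."""
    for (lat0, lng0, t0), (lat1, lng1, t1) in zip(tra, tra[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Assumption: repeated or out-of-order timestamps count as drift.
            return True
        if haversine_m(lat0, lng0, lat1, lng1) / dt > max_speed:
            return True
    return False

# Hypothetical trajectories with timestamps in seconds.
steady = [(30.50, 114.30, 0), (30.501, 114.30, 10)]  # roughly 111 m in 10 s
jump = [(30.50, 114.30, 0), (30.51, 114.30, 10)]     # roughly 1.1 km in 10 s
```

A trajectory for which `has_drift` returns `True` would be dropped as a whole, in line with the filtering rule above.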

step 2: extracting the time-period regularity features and spatial correlation features of urban area demand through a two-dimensional convolutional neural network; FIG. 3 shows the neural network structure of the prediction model;

in step 2, extracting the spatial correlation features and time-period regularity features of urban area demand through the two-dimensional convolutional neural network comprises the following steps:

step 2.1, the spatial correlation features of urban area demand are extracted through the two-dimensional convolutional neural network as follows:

dividing the urban area into I × J grid regions, each located by a two-dimensional array index; the taxi demand of the region $r_{i,j}$ with index (i, j) at time t is defined as

The taxi demand of all regions of the city at time t is defined as:

for region $r_{i,j}$, the taxi demand values of the L × L area centered on (i, j) are extracted as the taxi demand feature data of the region, as shown in FIG. 4;
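Extracting the L × L neighbourhood of a region from the I × J demand grid can be sketched as follows. The function name and the zero-filling of cells outside the city grid are assumptions; the text does not state how border regions are handled.

```python
def demand_patch(grid, i, j, L):
    """Return the L x L block of demand values centred on region (i, j).

    grid is a list of I rows with J demand counts each. Cells that fall
    outside the grid are filled with 0 (an assumption for border regions).
    """
    if L % 2 != 1:
        raise ValueError("L must be odd so that (i, j) is the centre")
    half = L // 2
    rows, cols = len(grid), len(grid[0])
    patch = []
    for r in range(i - half, i + half + 1):
        row = []
        for c in range(j - half, j + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(grid[r][c] if inside else 0)
        patch.append(row)
    return patch

# Hypothetical 5 x 5 demand grid whose cell value is 5 * row + column.
grid = [[5 * r + c for c in range(5)] for r in range(5)]
centre = demand_patch(grid, 2, 2, 3)
corner = demand_patch(grid, 0, 0, 3)
```

With L = 7 the same call would produce the 7 × 7 single-channel "image" referred to in FIG. 4.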

the extracted regional taxi demand feature data can be regarded as a single-channel L × L image (such as the 7 × 7 image in FIG. 4), and the regional taxi demand is represented as

This representation is input into an S-layer convolutional network, where the input of the s-th layer is

The data transformation of each convolutional layer is defined as:

where * denotes the convolution operation, f(·) is the activation function (here the ReLU activation function), and $w^s$ and $b^s$ are the parameters of the s-th convolutional layer;

after the convolution operations, the outputs obtained from the multiple convolution kernels are flattened to obtain the feature vector of region $r_{i,j}$ at time t

Finally, a fully connected layer reduces the dimension of the feature vector:

where $W^{fc}$ and $b^{fc}$ are the learnable parameters of the fully connected layer, and the result is the spatial feature vector of length convLen obtained by convolving the taxi demand data of region $r_{i,j}$;
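The convolution, flattening and fully connected steps of step 2.1 can be illustrated with a small NumPy sketch. It is not the network of the invention: the kernel size, channel count, zero padding, random weights, and the absence of an activation after the fully connected layer are all assumptions chosen to keep the example short.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1, zero-padded 2-D convolution (cross-correlation, as in most
    deep learning libraries). x: (C_in, L, L); w: (C_out, C_in, k, k), odd k."""
    c_out, _c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    size = x.shape[1]
    out = np.zeros((c_out, size, size))
    for o in range(c_out):
        for r in range(size):
            for c in range(size):
                out[o, r, c] = np.sum(xp[:, r:r + k, c:c + k] * w[o]) + b[o]
    return out

def spatial_feature(patch, conv_params, w_fc, b_fc):
    """Apply the stacked conv + ReLU layers, flatten, then a linear layer."""
    y = np.asarray(patch, dtype=float)[np.newaxis, :, :]   # one input channel
    for w, b in conv_params:                               # one (w, b) pair per layer
        y = np.maximum(conv2d_same(y, w, b), 0.0)          # f(.) = ReLU
    flat = y.reshape(-1)                                   # flatten all kernel outputs
    return w_fc @ flat + b_fc                              # reduce to length convLen

# Hypothetical sizes and random weights standing in for trained parameters.
rng = np.random.default_rng(0)
L, CHANNELS, CONV_LEN = 7, 4, 8
conv_params = [
    (rng.normal(scale=0.1, size=(CHANNELS, 1, 3, 3)), np.zeros(CHANNELS)),
    (rng.normal(scale=0.1, size=(CHANNELS, CHANNELS, 3, 3)), np.zeros(CHANNELS)),
]
w_fc = rng.normal(scale=0.1, size=(CONV_LEN, CHANNELS * L * L))
b_fc = np.zeros(CONV_LEN)
patch = rng.integers(0, 20, size=(L, L))
feature = spatial_feature(patch, conv_params, w_fc, b_fc)
```

Here `feature` plays the role of the length-convLen spatial feature vector of one region at one time step.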

step 2.2, the time-period regularity features of urban area demand are extracted through the two-dimensional convolutional neural network as follows:

the urban area demand data come from taxi trajectory data, which show obvious temporal continuity, and demand differs at different times of day;

combining the temporal feature extraction of taxi demand data with the spatial features better reflects the spatio-temporal characteristics of taxi demand;

the temporal features of the demand data are extracted by using a long short-term memory neural network to extract the temporal dependency features of the data, with time-period information embedded into the network architecture as auxiliary information; the specific flow is as follows:

the time-series length of the urban area taxi demand data is set to step, i.e., the demand data of each region over step consecutive time periods are taken as one sample for feature extraction;

the spatial features of the previous step are extracted for each item of the time series, giving a feature sequence of length step, $s^{i,j} \in \mathbb{R}^{step \times convLen}$;

$s^{i,j}$ is a two-dimensional array of size step × convLen, each row of which is the spatial feature vector at one time step;

the array $s^{i,j}$ is used as the input of the long short-term memory neural network (LSTM), whose ability to extract dependencies in time-series data is used to extract the temporal dependency of the taxi demand data of region $r_{i,j}$;

the LSTM has four key components: an input gate, a forget gate, an output gate, and a memory cell. The memory cell $C_t$ "remembers" the historical sequence before time t and thereby learns sequence dependencies. The forget gate $f_t$ reads the hidden state $h_{t-1}$ output at the previous time step and the input $x_t$ at the current time step, and determines which information to discard from the memory cell $C_{t-1}$; the input gate $i_t$ determines how much new information enters the memory cell at the current time t; the output gate $o_t$ controls the output of the memory cell.

The input gate is:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

the forget gate is:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

the output gate is:

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

the new candidate vector is calculated as:

$$\theta_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$$

the new candidate vector is used together with the input gate to update the memory cell state:

$$C_t = f_t \circ C_{t-1} + i_t \circ \theta_t$$

in this formula, the old memory cell state is multiplied by the forget gate, and the new candidate vector, scaled by the input gate, is added to obtain the memory cell state at the current time t;

the hidden state at the current time t is defined as the output gate multiplied by the tanh-activated memory cell state:

$$h_t = o_t \circ \tanh(C_t)$$

where $\circ$ denotes element-wise multiplication;

after the time-series data are input into the LSTM, a hidden state $h_t$ is output at each time step, and the final output $h_{end}$ is obtained once the last item of the sequence has been processed; this output contains the temporal dependency features of the time-series data.
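One LSTM time step and the extraction of $h_{end}$ can be sketched in NumPy as below. The gate expressions follow the formulas above, with $\theta_t$ as the tanh candidate vector and $o_t$ as the sigmoid output gate; the cell-state and hidden-state updates use the standard LSTM form $C_t = f_t \circ C_{t-1} + i_t \circ \theta_t$ and $h_t = o_t \circ \tanh(C_t)$, which matches the verbal description. Dimensions, parameter names in the dictionary, and the random initialisation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_i' to weight arrays."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate
    theta_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])  # candidate vector
    c_t = f_t * c_prev + i_t * theta_t   # updated memory cell state
    h_t = o_t * np.tanh(c_t)             # hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, rng):
    """Small random weights standing in for trained parameters."""
    p = {}
    for g in "ifog":
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p

def lstm_final_state(seq, p, hidden_dim):
    """Run the sequence (one row per time step) and return h_end."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, p)
    return h

# Hypothetical sizes: step = 6 time periods, convLen = 8, 16 hidden units.
STEP, CONV_LEN, HIDDEN = 6, 8, 16
rng = np.random.default_rng(0)
params = init_params(CONV_LEN, HIDDEN, rng)
s_ij = rng.normal(size=(STEP, CONV_LEN))   # stands in for the step x convLen input
h_end = lstm_final_state(s_ij, params, HIDDEN)
```

In practice a deep learning framework's LSTM layer would be used instead of a hand-written cell; the sketch only makes the data flow explicit.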

The array $s^{i,j}$ is input into the LSTM, whose final output is a vector combining the spatial correlation features of the taxi demand distribution of region $r_{i,j}$ with its temporal dependency features, i.e., the spatio-temporal feature vector $ST^{i,j}$ of the urban area taxi demand data;

In addition to the regional taxi demand data, the prediction model also requires additional input information:

region identification information, region_ID: identifies the region to which the sample data belong;

time period information, time_Slot: the time period of the day to which the sample data belong;

traffic weekday cycle information, weekday: the day of the week to which the sample data belong;

adding time information to the prediction model allows the demand in different time periods to be predicted according to its temporal pattern; to represent the three kinds of information in the neural network, each is encoded as an index:

Table 1: index ranges of the three kinds of information

Because the three kinds of information are expressed as indices within a range, feeding the index values directly into the network would let their numerical magnitude affect the feature extraction of the neural network;

therefore, the invention inputs the region identification information, the time period information and the traffic weekday cycle information into the neural network as embedding vectors, which eliminates the numerical meaning of the index magnitudes. The resulting embedding vectors are concatenated with the spatio-temporal feature vector $ST^{i,j}$ and input into a fully connected layer, whose output is the predicted taxi demand of region $r_{i,j}$ at that time;
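The embedding lookup and concatenation can be sketched as follows. The index ranges (100 regions, 48 time periods per day, 7 weekdays), the embedding dimension, the random table values and all function names are hypothetical, since the ranges of Table 1 are not reproduced in the text.

```python
import random

def make_embedding(num_ids, dim, rng):
    """One vector per index; random values stand in for learned embeddings."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]

def build_input(st_vec, region_id, time_slot, weekday, tables):
    """Concatenate the spatio-temporal vector with the three looked-up embeddings."""
    return (list(st_vec)
            + tables["region"][region_id]
            + tables["slot"][time_slot]
            + tables["weekday"][weekday])

def dense(x, weights, bias):
    """Single-output fully connected layer producing the demand prediction."""
    return sum(w * v for w, v in zip(weights, x)) + bias

rng = random.Random(0)
NUM_REGIONS, NUM_SLOTS, NUM_WEEKDAYS = 100, 48, 7   # assumed index ranges
EMB_DIM, ST_LEN = 4, 16                             # assumed vector lengths
tables = {
    "region": make_embedding(NUM_REGIONS, EMB_DIM, rng),
    "slot": make_embedding(NUM_SLOTS, EMB_DIM, rng),
    "weekday": make_embedding(NUM_WEEKDAYS, EMB_DIM, rng),
}
st_vec = [0.0] * ST_LEN                             # placeholder for ST^{i,j}
x = build_input(st_vec, region_id=37, time_slot=17, weekday=2, tables=tables)
weights = [rng.uniform(-0.1, 0.1) for _ in x]
prediction = dense(x, weights, 0.0)
```

Because each index only selects a row of a table, region 37 is not treated as "larger" than region 2; the network sees two unrelated vectors, which is the purpose of the embedding described above.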

the loss function of the prediction model is:

where the predicted value is the result output by the prediction model, $y_i$ is the true demand data, and n is the number of training-set samples. The parameters of the prediction model are tuned by gradient backpropagation, finally yielding the LSTM taxi demand prediction model.

step 3: scheduling urban taxis using the NSGA-II algorithm, combining the predicted taxi demand of the urban areas with taxi trajectory destination prediction.

According to step 2, a prediction model is trained on historical urban area demand data and used to predict demand over a future period, giving the taxi demand quantity of every urban region for that period;

taxis are scheduled according to the current position distribution of the urban taxis, so that taxi capacity is allocated reasonably;

the predicted regional demand distribution D for time period timeslot on day weekday is obtained from the urban area demand sample data and taxi trajectories in the data set;

the position information and passenger-carrying state (empty or occupied) of a taxi can be acquired in real time through the on-board GPS system and the passenger-state sensor;

the position information and passenger-carrying states of all taxis at time timeslot-1 on day weekday are obtained, all occupied taxis are filtered out, and the position distribution K of all empty taxis is obtained;
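Building the empty-taxi distribution K can be sketched as below. Representing K as a count of empty taxis per grid region, mapping latitude to the row index and longitude to the column index, and the record layout `(lat, lng, occupied)` are assumptions for illustration; positions are assumed to lie inside the city bounding box.

```python
from collections import Counter

def region_index(lat, lng, bbox, I, J):
    """Map a position inside bbox to its (i, j) cell on an I x J grid."""
    lat_min, lat_max, lng_min, lng_max = bbox
    i = min(int((lat - lat_min) / (lat_max - lat_min) * I), I - 1)
    j = min(int((lng - lng_min) / (lng_max - lng_min) * J), J - 1)
    return i, j

def empty_taxi_distribution(taxis, bbox, I, J):
    """taxis: iterable of (lat, lng, occupied). Counts empty taxis per region."""
    return Counter(region_index(lat, lng, bbox, I, J)
                   for lat, lng, occupied in taxis if not occupied)

# Hypothetical bounding box and taxi snapshot at time timeslot-1.
BBOX = (30.0, 31.0, 114.0, 115.0)
taxis = [
    (30.05, 114.95, False),
    (30.07, 114.93, False),
    (30.55, 114.45, True),    # occupied, filtered out
    (30.55, 114.45, False),
]
K = empty_taxi_distribution(taxis, BBOX, I=10, J=10)
```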

taxi scheduling is performed on the basis of these two distributions and is solved with a multi-objective optimization algorithm. Taxi scheduling has two main optimization objectives:

maximizing the satisfaction of taxi demand in the urban areas, and minimizing the travel distance of all scheduled vehicles;

maximizing the average value S of the taxi demand satisfaction of the urban areas i:

wherein,

In formula (4-8), $M_i$ is the number of taxis dispatched to area i, and $D'_i$ is the regional demand data calculated in section 4.1. Demand satisfaction reflects the efficiency of the taxi scheduling system: the higher the satisfaction, the higher the probability that a taxi finds a passenger.

Minimizing the sum DIS of the travel distances of all scheduled vehicles:

wherein,

In formula (4-9), $dis_{ij}$ is the grid distance between region i and region j, calculated by converting the region numbers to two-dimensional indices. The smaller the travel distance of the scheduled vehicles, the lower the taxis' fuel consumption and the smaller the economic loss.
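The two objectives can be evaluated for a candidate scheduling plan as in the sketch below. Because the formulas themselves are not reproduced in the text, the concrete forms used here are assumptions: per-region satisfaction is taken as the ratio of dispatched taxis to predicted demand capped at 1, regions are numbered row by row, and the grid distance is taken as the Manhattan distance between two-dimensional indices.

```python
def to_2d(region, J):
    """Convert a region number to (row, column), assuming row-major numbering."""
    return divmod(region, J)

def grid_distance(a, b, J):
    """Manhattan distance between two regions on the grid (an assumed metric)."""
    (ai, aj), (bi, bj) = to_2d(a, J), to_2d(b, J)
    return abs(ai - bi) + abs(aj - bj)

def objectives(plan, demand, J):
    """plan: {(source, destination): number of taxis}; demand: list per region.

    Returns (S, DIS): mean capped satisfaction over regions with demand,
    and the summed grid distance travelled by all scheduled taxis.
    """
    arrivals = [0] * len(demand)
    total_distance = 0
    for (src, dst), count in plan.items():
        arrivals[dst] += count
        total_distance += count * grid_distance(src, dst, J)
    ratios = [min(arrivals[r] / d, 1.0) for r, d in enumerate(demand) if d > 0]
    satisfaction = sum(ratios) / len(ratios) if ratios else 1.0
    return satisfaction, total_distance

# Hypothetical 3 x 3 grid: demand of 2 in region 0 and 4 in region 4.
demand = [2, 0, 0, 0, 4, 0, 0, 0, 0]
plan = {(0, 0): 1, (8, 4): 2, (1, 4): 2}
S, DIS = objectives(plan, demand, J=3)
```

In this example region 0 receives 1 of 2 requested taxis and region 4 receives all 4, while the four moved taxis travel a combined grid distance of 6.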

The invention uses the NSGA-II algorithm to solve this multi-objective problem and obtain the optimal solution set.
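A core building block of NSGA-II is the fast non-dominated sort, which ranks candidate solutions into fronts. The sketch below shows only that step, applied to (S, DIS) pairs after converting both objectives to minimization; the candidate values are hypothetical, and the remaining parts of NSGA-II (crowding distance, selection, crossover, mutation) are omitted.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one
    (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objs):
    """Return a list of fronts, each a list of indices into objs."""
    n = len(objs)
    dominated_by = [[] for _ in range(n)]   # solutions that p dominates
    dom_count = [0] * n                     # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    k = 0
    while fronts[k]:
        next_front = []
        for p in fronts[k]:
            for q in dominated_by[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:
                    next_front.append(q)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]                      # drop the trailing empty front

# Hypothetical candidate plans as (satisfaction S, total distance DIS).
candidates = [(0.9, 10), (0.8, 6), (0.7, 8), (0.9, 12)]
# Maximizing S is expressed as minimizing -S.
as_minimization = [(-s, dis) for s, dis in candidates]
fronts = fast_non_dominated_sort(as_minimization)
```

The first front contains the plans that no other plan beats on both objectives at once; the scheduler would choose among these trade-offs.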

The accuracy analysis of the demand prediction of the invention is shown in FIG. 5. Compared with other methods, the proposed prediction algorithm achieves higher demand prediction accuracy. By predicting the urban area demand at time timeslot and using the distribution of empty taxis at time timeslot-1, the invention schedules urban taxis ahead of the real-time taxi demand. The invention was verified experimentally on a real urban taxi data set; the demand-prediction-based scheduling at three different times of day is shown in FIG. 6.

It should be understood that parts of the specification not set forth in detail are of the prior art.

It should be understood that the above description of the preferred embodiments is illustrative, and not restrictive, and that various changes and modifications may be made therein by those skilled in the art without departing from the scope of the invention as defined in the appended claims.

Claims

1. A demand prediction-based urban taxi scheduling method is characterized by comprising the following steps:

step 1, carrying out data preprocessing on passenger carrying track data by filtering noise data;

the passenger carrying track data in the step 1 are as follows:

track sequences formed by track points in the passenger carrying process of all taxis in a city; different taxis do not have obvious track discrimination, so the tracks generated by different taxis are not distinguished;

$p_i$ is a triple $(lat_i, lng_i, t_i)$, where $lat_i$ is the latitude of the trajectory point, $lng_i$ is its longitude, and $t_i$ is its time;

a complete trajectory of a passenger trajectory can be represented as:

tra＝{p₁，p₂，p₃，…，p_n}；

wherein n is the number of the track points;

the specific method for filtering the noise data in the step 1 is as follows:

step 1.1, checking the integrity of data;

step 1.2, judging whether the track points are in the urban area;

(lat_min，lat_max，lng_min，lng_max)

where $lat_{min}$ and $lat_{max}$ are the minimum and maximum latitude of a trajectory point, and $lng_{min}$ and $lng_{max}$ are the minimum and maximum longitude of a trajectory point;

judging whether the track point is in the range:

otherwise, the track points are invalid track points, and the whole track data where the invalid track points are located is discarded;

judging whether two adjacent points in the trajectory have drifted: the speed between the two points is checked; a drifted point appears as an abrupt jump within a normal time interval, and if the distance between the two points divided by the elapsed time falls outside the normal range, data drift is considered to have occurred;

if $\mathrm{haversine}\big((lat_i, lng_i), (lat_{i-1}, lng_{i-1})\big) / (t_i - t_{i-1})$ is higher than the normal vehicle speed, the trajectory point has drifted and the whole trajectory containing it is filtered out, where haversine is the formula for the distance between two points on the Earth's spherical surface;

step 2: extracting the time-period regularity features and spatial correlation features of urban area demand through a two-dimensional convolutional neural network;

in step 2, extracting the spatial correlation features and time-period regularity features of urban area demand through the two-dimensional convolutional neural network comprises the following steps:

dividing the urban area into I × J grid regions, each located by a two-dimensional array index; the taxi demand of the region $r_{i,j}$ with index (i, j) at time t is defined as

the taxi demand of all regions of the city at time t is defined as:

for region $r_{i,j}$, the taxi demand values of the L × L area centered on (i, j) are extracted as the taxi demand feature data of the region;

the extracted regional taxi demand feature data can be regarded as a single-channel L × L image, and the regional taxi demand is represented as

this representation is input into an S-layer convolutional network, where the input of the s-th layer is

the data transformation of each convolutional layer is defined as:

where * denotes the convolution operation, f(·) is the activation function, for which the ReLU activation function is used, and $w^s$ and $b^s$ are the parameters of the s-th convolutional layer;

after the convolution operations, the outputs obtained from the multiple convolution kernels are flattened to obtain the feature vector of region $r_{i,j}$ at time t

where $W^{fc}$ and $b^{fc}$ are the learnable parameters of the fully connected layer;

the urban area demand data come from taxi trajectory data, which show obvious temporal continuity, and demand differs at different times of day;

$s^{i,j}$ is a two-dimensional array of size step × convLen, each row of which is the spatial feature vector at one time step;

the array $s^{i,j}$ is used as the input of the long short-term memory neural network (LSTM), whose ability to extract dependencies in time-series data is used to extract the temporal dependency of the taxi demand data of region $r_{i,j}$;

the LSTM has four key components: an input gate, a forget gate, an output gate, and a memory cell; the memory cell $C_t$ "remembers" the historical sequence before time t and thereby learns sequence dependencies; the forget gate $f_t$ reads the hidden state $h_{t-1}$ output at the previous time step and the input $x_t$ at the current time step, and determines which information to discard from the memory cell $C_{t-1}$; the input gate $i_t$ determines how much new information enters the memory cell at the current time t; the output gate $o_t$ controls the output of the memory cell;

the input gate is:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

the forget gate is:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

the output gate is:

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

the new candidate vector is calculated as:

$$\theta_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$$

the hidden state at the current time t is defined as the output gate multiplied by the tanh-activated memory cell state:

$$h_t = o_t \circ \tanh(C_t)$$

where $\circ$ denotes element-wise multiplication;

after the time-series data are input into the LSTM, a hidden state $h_t$ is output at each time step, and the final output $h_{end}$ is obtained once the last item of the sequence has been processed; this output contains the temporal dependency features of the time-series data;

the array $s^{i,j}$ is input into the LSTM, whose output is a vector combining the spatial correlation features of the taxi demand distribution of region $r_{i,j}$ with its temporal dependency features, i.e., the spatio-temporal feature vector $ST^{i,j}$ of the urban area taxi demand data;

adding time information to the prediction model allows the demand in different time periods to be predicted according to its temporal pattern; to represent the three kinds of information in the neural network, each is encoded as an index:

the region identification information, the time period information and the traffic weekday cycle information are input into the neural network as embedding vectors, which eliminates the numerical meaning of the index magnitudes; the resulting embedding vectors are concatenated with the spatio-temporal feature vector $ST^{i,j}$ and input into a fully connected layer, whose output is the predicted taxi demand of region $r_{i,j}$ at that time;

the loss function of the prediction model is:

where the predicted value is the result output by the prediction model, $y_i$ is the true demand data, and n is the number of training-set samples; the parameters of the prediction model are tuned by gradient backpropagation, finally yielding the LSTM taxi demand prediction model;

step 3: scheduling urban taxis using the NSGA-II algorithm, combining the predicted taxi demand of the urban areas with taxi trajectory destination prediction;

according to step 2, a prediction model is trained on historical urban area demand data and used to predict demand over a future period, giving the taxi demand quantity of every urban region for that period;

the position information and passenger-carrying state of a taxi can be acquired in real time through the on-board GPS system and the passenger-state sensor;

the position information and passenger-carrying states of all taxis at time timeslot-1 on day weekday are obtained, all occupied taxis are filtered out, and the position distribution K of all empty taxis is obtained;

maximizing the average value S of the taxi demand satisfaction of the urban areas:

wherein,

in the formula, $M_i$ is the number of taxis dispatched to area i, and $D'_i$ is the calculated regional demand data; demand satisfaction reflects the efficiency of the taxi scheduling system, and the higher the satisfaction, the higher the probability that a taxi finds a passenger;

minimizing the sum DIS of the travel distances of all scheduled vehicles:

wherein,

$dis_{ij}$ is the grid distance between area i and area j, calculated by converting the area numbers into two-dimensional indices; the smaller the travel distance of the scheduled vehicles, the lower the taxis' fuel consumption and the smaller the economic loss.