CN110598917B - Destination prediction method, system and storage medium based on path track - Google Patents


Info

Publication number
CN110598917B
CN110598917B CN201910788582.7A CN201910788582A CN110598917B CN 110598917 B CN110598917 B CN 110598917B CN 201910788582 A CN201910788582 A CN 201910788582A CN 110598917 B CN110598917 B CN 110598917B
Authority
CN
China
Prior art keywords
path
order
predicted
similarity
cluster
Prior art date
Legal status
Active
Application number
CN201910788582.7A
Other languages
Chinese (zh)
Other versions
CN110598917A (en)
Inventor
余明辉
王昌栋
詹增荣
Current Assignee
Guangzhou Panyu Polytechnic
Original Assignee
Guangzhou Panyu Polytechnic
Priority date
Filing date
Publication date
Application filed by Guangzhou Panyu Polytechnic
Priority to CN201910788582.7A
Publication of CN110598917A
Application granted
Publication of CN110598917B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29: Geographical information databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention discloses a destination prediction method, system and storage medium based on a path track. The method comprises the following steps: converting the path data of an order to be predicted into numerical form according to a preset conversion rule to obtain the corresponding path track representation data; inputting the path track representation data into a locality-sensitive hash (LSH) model and calculating according to the LSH algorithm to obtain the corresponding matching result; and clustering the matching results with the K-Means algorithm according to the number of destinations, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as the multi-destination prediction result. The method matches a set of approximate paths through the hash model and applies different procedures to the distinct problems of predicting a unique destination and screening multiple destinations, which improves destination prediction based on path tracks and provides a reference for applications in which personal information about users or drivers is unavailable.

Description

Destination prediction method, system and storage medium based on path track
Technical Field
The invention relates to the technical field of intelligent traffic information processing, in particular to a destination prediction method and system based on a path track and a storage medium.
Background
With the development of GPS and 4G networks, modern mobile devices such as smartphones almost always include a GPS receiver and a navigation system that can locate the user with high accuracy. These devices generate large amounts of location data that can be used for a variety of Location-Based Services (LBS), including route planning, real-time road-condition feedback, recommendations for restaurants, shops or tourist attractions, and location-based social network analysis. LBS applications greatly facilitate daily life, and taxi-hailing software is among the most popular of them. These platforms collect a large amount of order and track information every day. The collected data provides unprecedented opportunities to mine the behavior of drivers and users, and supports real-time intelligent decision systems such as order dispatching, taxi demand prediction and route planning.
As demand for location-based applications grows, more and more research addresses predicting the destination of a trip in progress from the driver's travel trajectory. The value of destination prediction for ad placement is clear. When a user is riding in a taxi, the LBS provider can collect location information from the GPS device in the user's phone or in the taxi, predict the most likely destination, and recommend advertisements for restaurants or malls near that destination; if the user is judged to be a tourist, the provider can instead recommend sightseeing spots. Destination prediction can also assist in route determination, crowd anomaly detection, and the like. In a navigation system, predicting a person's destination can help determine whether the person has deviated from the expected route. As a further potential application, by predicting where most people will go in a given time period, an administrator can estimate the size of the crowd at those places and take preventive measures; in extreme cases, once a route is known, investigators from government agencies can anticipate a suspect's destination and arrange countermeasures in advance.
Because destination prediction matters in these applications, it has been studied extensively, and one line of work predicts destinations from trajectory data. Most existing methods are based on (hidden) Markov chain models. A typical approach divides the area evenly into grid cells, or the roads into segments, uses the cells or segments as the states of a Markov process, and trains the state transition probability matrix of the Markov chain on historical trajectories. However, a (first-order) Markov chain model implicitly assumes that the vehicle drives in a memoryless random manner, which clearly contradicts practical experience: real trajectories are not completely random.
In addition, in the course of research and practice on the prior art, the inventors found that destination prediction often requires more than one value; in applications such as advertising, several different destinations may be needed. Most existing methods use probabilistic reasoning to calculate the probability of each destination and return the k destinations with the highest probability. When predicting the destination of a trip in progress, the conditional probabilities of reaching each place are calculated in turn, and the places corresponding to the top k values (that is, the k most likely places) are returned as the prediction result. This approach does no further processing on the calculated probabilities. The predicted probabilities of the various destinations may be small and close to one another, and the k most probable results may lie very close together geographically. In that case the k results are of little use in real applications, because other places that are geographically distant from those k positions but have similar probability are ignored.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a destination prediction method, a destination prediction system and a storage medium based on a path trajectory, which can adopt different prediction methods for predicting a unique destination and screening multiple destinations, thereby improving the prediction accuracy.
To solve the above problem, an embodiment of the present invention provides a destination prediction method based on a path trajectory, including at least the following steps:
performing numerical value conversion processing on path data in an order to be predicted according to a preset conversion rule to obtain corresponding path track representation data;
inputting the corresponding path track representation data into a locality-sensitive hash (LSH) model, and calculating according to the LSH algorithm to obtain a corresponding matching result;
and clustering the corresponding matching results according to the number of destinations by adopting a K-Means algorithm, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as a multi-destination prediction result.
Further, the method for predicting a destination based on a path trajectory further includes:
screening sample orders in a preset time range before and after the departure time of the order to be predicted;
clustering the sample orders according to the starting point positions by adopting a K-Means algorithm, and recording corresponding clustering centers;
finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs;
screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
Further, the numerical value conversion processing specifically includes:
dividing the map into a plurality of grid cells along lines of longitude and latitude, and labeling the cells in longitude and latitude order;
and converting the two-dimensional point sequence in the path track data into a one-dimensional array by reading row by row in a line-scanning manner and concatenating the results.
Further, the preset conversion rule specifically includes: preserving the order in which the coordinate points appear in the original sequence of the path track data; replacing each coordinate point in the original sequence with its grid label; and, if consecutive coordinate points fall in the same grid cell, retaining only one of them.
Further, the locality-sensitive hash model includes an Offline stage and an Online stage. The Offline stage specifically includes: applying MinHash conversion to sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; dividing the h-dimensional data evenly into b Bands by the LSH algorithm, where each Band contains the results of r hash functions; for each Band, storing the orders that share the same r hash values in the same Bucket, where the index of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage so that the order to be predicted can be matched in the subsequent Online stage;
the Online stage specifically includes: for an order to be predicted, traversing each Band in the Query process, finding the corresponding Bucket according to the hash values of the order track in that Band, and taking all sample orders stored in those Buckets as the matching result of the order to be predicted in the hash model.
Further, the similarity measurement formula is specifically SIM(S1, S2) = COMm(S1, S2) / len(S1),
where S1 is the path of the order to be predicted, S2 is the candidate path used for prediction, COMm(S1, S2) is the maximum non-contiguous common length of path S2 relative to path S1, and len(S1) is the length of path S1.
One embodiment of the present invention provides a destination prediction system based on a path trajectory, including:
the path preprocessing module is used for performing numerical conversion processing on path data in the order to be predicted according to a preset conversion rule to obtain corresponding path track representation data;
the locality-sensitive hash model module is used for inputting the corresponding path track representation data into a locality-sensitive hash (LSH) model and calculating according to the LSH algorithm to obtain a corresponding matching result;
and the multi-destination prediction module is used for clustering the corresponding matching results according to the number of destinations by adopting a K-Means algorithm, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as a multi-destination prediction result.
Further, the destination prediction system based on the path trajectory further includes:
the sample screening module is used for screening sample orders in a preset time range before and after the departure time of the order to be predicted in the OffLine stage and the OnLine stage; screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
the single-destination prediction module is used for clustering the sample orders according to the starting point positions by adopting a K-Means algorithm and recording corresponding clustering centers; finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs; and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
Further, the locality-sensitive hash model module comprises an Offline unit and an Online unit, wherein
the Offline unit is used for applying MinHash conversion to sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; dividing the h-dimensional data evenly into b Bands by the LSH algorithm, where each Band contains the results of r hash functions; for each Band, storing the orders that share the same r hash values in the same Bucket, where the index of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage so that the order to be predicted can be matched in the subsequent Online stage;
and the Online unit is used for traversing each Band in the Query process of the order to be predicted, finding the corresponding Bucket according to the hash values of the order track in that Band, and taking all sample orders stored in those Buckets as the matching result of the order to be predicted in the hash model.
An embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for predicting a destination based on a path trajectory as described above.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a destination prediction method, system and storage medium based on a path track. The method comprises the following steps: converting the path data of an order to be predicted into numerical form according to a preset conversion rule to obtain the corresponding path track representation data; inputting the path track representation data into a locality-sensitive hash (LSH) model and calculating according to the LSH algorithm to obtain the corresponding matching result; and clustering the matching results with the K-Means algorithm according to the number of destinations, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as the multi-destination prediction result. The method reduces the dimensionality of the tracks and matches a set of approximate paths through a MinHash-based locality-sensitive hashing algorithm, applies different procedures to the distinct problems of predicting a unique destination and screening multiple destinations, and is designed so that the effect of the different procedures can be verified with multi-parameter evaluation. This improves destination prediction based on path tracks. Because the method uses no personal attribute information, it provides a reference for applications in which personal information about users or drivers is unavailable.
Drawings
Fig. 1 is a schematic flowchart of a destination prediction method based on a path trajectory according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a destination prediction system based on a path trajectory according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, an application scenario to which the present invention applies, namely destination prediction based on path trajectory data, is described.
First embodiment of the invention
Please refer to fig. 1.
As shown in fig. 1, the method for predicting a destination based on a path trajectory according to this embodiment at least includes the following steps:
S101, performing numerical conversion processing on path data in an order to be predicted according to a preset conversion rule to obtain corresponding path track representation data;
Specifically, for step S101, every track point of a path track expressed in WGS-84 coordinates is two-dimensional. To input the path track into the hash model, the two-dimensional point sequence is converted into a one-dimensional array by reading row by row in a line-scanning manner and concatenating the results. Once this path representation is defined, the input format of the model, namely the one-dimensional array, is determined.
S102, inputting the corresponding path track representation data into a locality-sensitive hash (LSH) model, and calculating according to the LSH algorithm to obtain a corresponding matching result;
Specifically, for step S102, the whole model can be divided into an Offline stage, which processes the training data in advance, and an Online stage, which handles the order to be predicted. The hash model is used in both stages. In the Offline stage, all sample orders are partitioned into buckets by the hash model, which is referred to as the "LSH process"; in the Online stage, the hash model uses the buckets into which the order to be predicted falls to retrieve the sample orders that are highly likely to have similar paths, which is referred to as the "Query process". There are usually several such orders, but they are a very small fraction of all sample orders.
S103, clustering the corresponding matching results according to the number of destinations by adopting a K-Means algorithm, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as a multi-destination prediction result.
Specifically, for step S103, the multi-destination prediction problem calls for paths that are highly similar to the order to be predicted yet whose end points lie far apart, so a new selection scheme is designed: the matching results (the set of sample orders) from the hash sub-model are clustered with K-Means according to the number of destinations, and the path with the highest similarity in each cluster is taken as a result.
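Step S103 can be illustrated with the following minimal Python sketch. It is not the patented implementation: the text does not state which feature K-Means clusters on, so clustering on the destination coordinates of the matched sample orders is an assumption made here, as are the function names, the `path`/`dest` dictionary keys, and the small pure-Python K-Means. The similarity is the SIM measure defined in the text, computed as a longest-common-subsequence length divided by the length of the query path.

```python
import random

def lcs_length(s1, s2):
    """Length of the longest common (non-contiguous) subsequence."""
    prev = [0] * (len(s2) + 1)
    for x in s1:
        curr = [0]
        for j, y in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def sim(s1, s2):
    """SIM(S1, S2) = COMm(S1, S2) / len(S1)."""
    return lcs_length(s1, s2) / len(s1) if s1 else 0.0

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def assign_clusters(points, centers):
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """Tiny K-Means for 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign_clusters(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return assign_clusters(points, centers)

def predict_multi(query_path, candidates, k):
    """Return one destination per cluster: that of the most similar path."""
    labels = kmeans([c["dest"] for c in candidates], k)
    best = {}
    for cand, lab in zip(candidates, labels):
        score = sim(query_path, cand["path"])
        if lab not in best or score > best[lab][0]:
            best[lab] = (score, cand["dest"])
    return [dest for _, dest in best.values()]
```

Because each cluster contributes exactly one result, the returned destinations are spread across geographically distinct groups rather than concentrated around a single high-probability area.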
In a preferred embodiment, the method for predicting a destination based on a path trajectory further includes:
screening sample orders in a preset time range before and after the departure time of the order to be predicted;
clustering the sample orders according to the starting point positions by adopting a K-Means algorithm, and recording corresponding clustering centers;
finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs;
screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
Specifically, for the single-destination prediction problem, the focus is on the deviation between the predicted destination and the actual destination. The Query result of the hash model is further screened by adding time or geographic distance factors, and the improved similarity measure is then applied to the screened samples to reduce the prediction error. In the Offline stage, each sample order is marked with a class label to assist screening; in the Online stage, the screening process and the similarity comparison are carried out for the order to be predicted.
It should be noted that the screening step is essentially a set of conditions, so its form is variable, and different methods or means suit different problems. Screening can be applied in both the Offline stage and the Online stage, and the screening conditions mainly consider the influence of the time attribute and the effect of pre-clustering.
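The single-destination screening described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the cluster centers and per-sample cluster labels are assumed to have been computed by K-Means on starting points during the Offline stage, and the dictionary keys (`start`, `depart_s`, `cluster`, `path`, `dest`), the function names, and the 30-minute `window_s` default are hypothetical. Departure times are treated as plain seconds, so wrap-around at midnight is ignored.

```python
def lcs_length(s1, s2):
    """Length of the longest common (non-contiguous) subsequence."""
    prev = [0] * (len(s2) + 1)
    for x in s1:
        curr = [0]
        for j, y in enumerate(s2, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def sim(s1, s2):
    """SIM(S1, S2) = COMm(S1, S2) / len(S1)."""
    return lcs_length(s1, s2) / len(s1) if s1 else 0.0

def nearest_center(point, centers):
    """Index of the cluster center closest to `point` (squared distance)."""
    return min(range(len(centers)),
               key=lambda c: (point[0] - centers[c][0]) ** 2
                             + (point[1] - centers[c][1]) ** 2)

def predict_single(query, matches, centers, window_s=1800):
    """Screen LSH matches by time window and start cluster, then pick the
    destination of the most similar remaining path (None if none remain)."""
    cluster = nearest_center(query["start"], centers)
    kept = [m for m in matches
            if abs(m["depart_s"] - query["depart_s"]) <= window_s
            and m["cluster"] == cluster]
    if not kept:
        return None
    best = max(kept, key=lambda m: sim(query["path"], m["path"]))
    return best["dest"]
```

Filtering before the similarity comparison keeps the comparatively expensive dynamic-programming step limited to the few samples that pass both conditions.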
In a preferred embodiment, the numerical value conversion process specifically includes:
dividing the map into a plurality of grid cells along lines of longitude and latitude, and labeling the cells in longitude and latitude order;
and converting the two-dimensional point sequence in the path track data into a one-dimensional array by reading row by row in a line-scanning manner and concatenating the results.
In a preferred embodiment, the preset conversion rule specifically includes: preserving the order in which the coordinate points appear in the original sequence of the path track data; replacing each coordinate point in the original sequence with its grid label; and, if consecutive coordinate points fall in the same grid cell, retaining only one of them.
Specifically, assume that the earth is divided into individual grid cells by lines of longitude and latitude at a fixed angular interval, and that the cells are labeled manually in longitude and latitude order. A path S = {a, b, c, d, e, f, g, h, i, j, k} is converted numerically into a one-dimensional array through the grid labels according to the preset rules, which are: the order in which the points appear in the original sequence is preserved; each (coordinate) point is replaced with its grid label; and if consecutive points fall in the same grid cell, producing several consecutive repeated values, only one value is retained.
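The conversion rules above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the row-major numbering of cells, and the `cells_per_deg` resolution are assumptions introduced here, since the text does not specify a grid size or a numbering scheme.

```python
import math

def grid_label(lon, lat, cells_per_deg=100):
    """Return one integer label for the grid cell containing a WGS-84 point.

    Assumption: cells are numbered row by row (south to north) and west to
    east within a row; the source text does not fix the numbering.
    """
    cols = 360 * cells_per_deg   # number of cells in one row
    rows = 180 * cells_per_deg   # number of rows
    col = min(math.floor((lon + 180.0) * cells_per_deg), cols - 1)
    row = min(math.floor((lat + 90.0) * cells_per_deg), rows - 1)
    return row * cols + col

def path_to_labels(points, cells_per_deg=100):
    """Convert a sequence of (lon, lat) points to a 1-D list of grid labels.

    The original point order is preserved, each point is replaced by its
    grid label, and runs of consecutive identical labels collapse to one.
    """
    labels = []
    for lon, lat in points:
        label = grid_label(lon, lat, cells_per_deg)
        if not labels or labels[-1] != label:
            labels.append(label)
    return labels
```

Only consecutive repeats are collapsed, so a path that leaves a cell and later returns to it keeps both visits, which preserves the order information that the later similarity measure relies on.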
In a preferred embodiment, the locality-sensitive hash model includes an Offline stage and an Online stage, wherein
the Offline stage specifically includes: applying MinHash conversion to sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; dividing the h-dimensional data evenly into b Bands by the LSH algorithm, where each Band contains the results of r hash functions; for each Band, storing the orders that share the same r hash values in the same Bucket, where the index of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage so that the order to be predicted can be matched in the subsequent Online stage;
the Online stage specifically includes: for an order to be predicted, traversing each Band in the Query process, finding the corresponding Bucket according to the hash values of the order track in that Band, and taking all sample orders stored in those Buckets as the matching result of the order to be predicted in the hash model.
Specifically, the locality sensitive hash model is divided into an Offline phase and an Online phase.
In the Offline stage, sample tracks of any length are converted into h-dimensional vectors through h designed hash functions (the MinHash process). Next, the LSH algorithm divides the h-dimensional data evenly into b Bands, each containing the results of r hash functions, so that h = b × r. For each Band, the orders that share the same r hash values are stored in the same Bucket, whose index is the corresponding r hash values. The result of the LSH stage is stored so that the order to be predicted can be matched in the subsequent Online stage.
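The Offline stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text does not describe how the h hash functions are designed, so the common universal family (a·x + c) mod p is used here as a stand-in, and the names `make_hash_params`, `minhash_signature` and `build_index` are hypothetical. Paths are assumed to be non-empty lists of integer grid labels smaller than the chosen prime.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # 2**31 - 1; grid labels are assumed to be below this

def make_hash_params(h, seed=0):
    """Draw h (a, c) pairs defining hash functions x -> (a*x + c) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(h)]

def minhash_signature(labels, params):
    """h-dimensional MinHash vector of a non-empty path (treated as a set)."""
    items = set(labels)
    return [min((a * x + c) % PRIME for x in items) for a, c in params]

def build_index(samples, params, b, r):
    """Offline stage: one bucket table per Band, keyed by r hash values.

    `samples` maps an order id to its list of grid labels.
    """
    assert b * r == len(params), "requires h = b * r"
    tables = [defaultdict(list) for _ in range(b)]
    for order_id, labels in samples.items():
        sig = minhash_signature(labels, params)
        for band in range(b):
            key = tuple(sig[band * r:(band + 1) * r])
            tables[band][key].append(order_id)
    return tables
```

Two orders with identical label sets always share a bucket in every Band, while orders with partially overlapping label sets share a bucket in a given Band only when all r of that Band's minimum hash values coincide.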
In the Online stage, for an order to be predicted, the locality-sensitive hash model traverses each Band in the Query process and finds the corresponding Bucket according to the hash values of the order track in that Band; note that at most one Bucket corresponds to each Band. All sample orders stored in those Buckets form the matching result of the order to be predicted in the hash sub-model. In this embodiment, the LSH model uses a similarity threshold of 50% and 256 hash functions.
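The Query process, and the way b and r relate to a similarity threshold, can be sketched in Python as follows. The function names and the list-of-dictionaries layout of the bucket tables are assumptions made for illustration. The two formulas are the standard analysis of banded MinHash: two paths with Jaccard similarity s share at least one Bucket with probability 1 - (1 - s^r)^b, and the steep part of that curve lies near (1/b)^(1/r).

```python
def query(tables, signature, r):
    """Online stage: union of the buckets hit by `signature`, one per Band.

    `tables[band]` maps a tuple of r hash values to a list of order ids.
    """
    matches = set()
    for band, table in enumerate(tables):
        key = tuple(signature[band * r:(band + 1) * r])
        matches.update(table.get(key, ()))  # at most one bucket per Band
    return matches

def candidate_probability(s, b, r):
    """Probability that two paths with Jaccard similarity s share a bucket."""
    return 1.0 - (1.0 - s ** r) ** b

def approx_threshold(b, r):
    """Approximate similarity at which the S-curve rises most steeply."""
    return (1.0 / b) ** (1.0 / r)
```

For example, with h = 256, the hypothetical split b = 32, r = 8 gives an approximate threshold of about 0.65, and b = 64, r = 4 gives about 0.35. The text states a 50% threshold but not the values of b and r that produce it.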
In a preferred embodiment, the similarity measurement formula is SIM(S1, S2) = COMm(S1, S2) / len(S1),
where S1 is the path of the order to be predicted, S2 is the candidate path used for prediction, COMm(S1, S2) is the maximum non-contiguous common length of path S2 relative to path S1, and len(S1) is the length of path S1.
Specifically, this embodiment uses two ways of calculating similarity:
(1) Jaccard similarity, which is the similarity used when MinHash and LSH screen for similar paths. This calculation has two significant disadvantages. First, it depends on the lengths of both paths, because all the elements that the two paths contain after dimension reduction enter the calculation; second, the hash process discards the temporal order of the original path, which hinders accurate destination prediction. The second problem has two remedies: screening paths by clustering on their starting points, or defining a custom metric that takes order into account, which also overcomes the first problem.
(2) The longest non-contiguous similarity, SIMm. If a path is regarded as a character string, the task becomes finding the maximum non-contiguous common length of two strings, which is usually solved by dynamic programming. Denote the maximum non-contiguous common length of path S2 relative to S1 as COMm(S1, S2). The goal is to find paths similar to S1, but because Jaccard similarity takes the lengths of both paths into account, a short S1 matched against a long path is penalized by the long path's length. To make the measure more reasonable, the similarity calculation is modified to respect path order and to emphasize the similarity of other paths relative to S1. The improved similarity measurement formula is:
SIM(S1, S2) = COMm(S1, S2) / len(S1)
where len(S1) is the length of path S1.
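The contrast between the two measures can be made concrete with a short Python sketch. It assumes that the "maximum non-contiguous common length" COMm is the length of the longest common subsequence, which matches the description of a string problem solved by dynamic programming; the function names are illustrative.

```python
def lcs_length(s1, s2):
    """COMm(S1, S2): longest common subsequence length, by dynamic programming."""
    prev = [0] * (len(s2) + 1)
    for x in s1:
        curr = [0]
        for j, y in enumerate(s2, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def sim(s1, s2):
    """SIM(S1, S2) = COMm(S1, S2) / len(S1); 0.0 for an empty S1."""
    return lcs_length(s1, s2) / len(s1) if s1 else 0.0

def jaccard(s1, s2):
    """Jaccard similarity of the two paths viewed as unordered sets."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if (a or b) else 1.0
```

For a short query path [1, 2, 3] and a longer sample path [1, 9, 2, 8, 3, 7, 6], SIM is 3/3 while Jaccard is only 3/7, which shows the length penalty that SIM removes. For [1, 2, 3] against its reverse [3, 2, 1], Jaccard is 1 while SIM drops to 1/3, which shows that SIM respects path order.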
This embodiment provides a destination prediction method based on a path track, which comprises the following steps: converting the path data of an order to be predicted into numerical form according to a preset conversion rule to obtain the corresponding path track representation data; inputting the path track representation data into a locality-sensitive hash (LSH) model and calculating according to the LSH algorithm to obtain the corresponding matching result; and clustering the matching results with the K-Means algorithm according to the number of destinations, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as the multi-destination prediction result. The method reduces the dimensionality of the tracks and matches a set of approximate paths through a MinHash-based locality-sensitive hashing algorithm, applies different procedures to the distinct problems of predicting a unique destination and screening multiple destinations, and is designed so that the effect of the different procedures can be verified with multi-parameter evaluation. This improves destination prediction based on path tracks. Because the method uses no personal attribute information, it provides a reference for applications in which personal information about users or drivers is unavailable.
Second embodiment of the invention
Please refer to fig. 2.
As shown in fig. 2, an embodiment of the present invention further provides a destination prediction system based on a path trajectory, including:
the path preprocessing module 100 is configured to perform numerical conversion processing on path data in an order to be predicted according to a preset conversion rule to obtain corresponding path trajectory representation data;
specifically, for the path preprocessing module 100, every track point of a path trajectory expressed in WGS-84 coordinates is two-dimensional. To input the path trajectory into the hash model, the two-dimensional point sequence is converted into a one-dimensional array: the points are read row by row, in a line-scanning manner, and spliced together. Once the path representation is defined, the input format of the model, namely the one-dimensional array, is determined.
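A minimal sketch of such a conversion follows, using the grid rule stated in claims 2 and 3 (divide the map into grids along longitude and latitude lines, replace each point by its grid label, keep the original order, and keep only one of several consecutive points in the same grid). The bounding box, the 0.25-degree cell size, the row-major labeling, the (longitude, latitude) argument order, and all function names are assumptions made for illustration.

```python
import math


def make_grid_labeler(lon_min, lat_min, lon_max, lat_max, cell_deg):
    """Return a function mapping (lon, lat) to a row-major grid label."""
    n_cols = math.ceil((lon_max - lon_min) / cell_deg)
    n_rows = math.ceil((lat_max - lat_min) / cell_deg)

    def label(lon, lat):
        # Clamp to the grid so points on or beyond the border still get a label.
        col = min(max(int((lon - lon_min) / cell_deg), 0), n_cols - 1)
        row = min(max(int((lat - lat_min) / cell_deg), 0), n_rows - 1)
        return row * n_cols + col

    return label


def to_label_sequence(points, label):
    """Replace each point by its grid label, collapsing consecutive repeats."""
    seq = []
    for lon, lat in points:
        g = label(lon, lat)
        if not seq or seq[-1] != g:
            seq.append(g)
    return seq


# Hypothetical 4 x 4 grid and trajectory, given as (longitude, latitude) pairs.
label = make_grid_labeler(113.0, 22.0, 114.0, 23.0, 0.25)
track = [(113.10, 22.10), (113.20, 22.20), (113.30, 22.20),
         (113.30, 22.40), (113.60, 22.40), (113.35, 22.45)]
labels = to_label_sequence(track, label)
```

Only consecutive repeats are collapsed, so a path that leaves a grid cell and later returns to it keeps both visits, which preserves the order information used later by the similarity measure.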
The locality sensitive hash model module 200 is configured to input the corresponding path trajectory representation data to a locality sensitive hash model, and obtain a corresponding matching result according to calculation of an LSH algorithm;
specifically, for the locality-sensitive hash model module 200, the whole model can be divided into an OffLine stage, which processes the sample data in advance, and an OnLine stage, which handles an order to be predicted. The hash model is used in both stages. In the OffLine stage, all sample orders are partitioned into buckets by the hash model; this is referred to as the "LSH process". In the OnLine stage, an order to be predicted is passed through the hash model, and the sample orders that are highly likely to have a similar path are retrieved from the buckets into which the order falls; this is referred to as the "Query process". There are usually several such orders, but they are a very small fraction of all sample orders.
And the multi-destination prediction module 300 is configured to cluster the corresponding matching results according to the number of destinations by using a K-Means algorithm, calculate the path similarity of the matching results in each cluster according to a similarity measurement formula, and select a path with the highest similarity in each cluster as a multi-destination prediction result.
Specifically, for the multi-destination prediction module 300, the multi-destination prediction problem calls for selecting paths with high similarity whose order end points are nevertheless far apart. A new measurement approach is designed for this: the matching results of the hash sub-model (a set of sample orders) are clustered with K-Means according to the number of destinations, and the path with the highest similarity in each cluster is taken as a result.
For similarity calculation, two similarity calculation methods are adopted in this embodiment, including:
(1) Jaccard similarity, which is the similarity used when screening similar paths in MinHash and LSH. This calculation has two significant disadvantages. First, the lengths of both paths enter the calculation, because all elements contained in the two dimension-reduced paths are used. Second, the hashing process destroys the time order of the original path, which hinders accurate destination prediction. There are two remedies for the second problem: screening paths by clustering their starting points, or defining a custom metric that takes time order into account; such a metric also overcomes the first problem.
(2) Longest non-contiguous similarity, SIMm. If a path is regarded as a character string, the task becomes finding the longest common non-contiguous subsequence of two strings, which is usually solved by dynamic programming. Denote the maximum non-contiguous common length of path S2 relative to path S1 as COMm(S1, S2). The goal is to find paths similar to S1, but the Jaccard similarity takes the lengths of both matching paths into account, so when S1 is short and the matching path is long, the score is dominated by the long path. To make the measure more reasonable, the similarity calculation is modified to respect the order of the path and to emphasize the similarity of other paths relative to S1. The improved similarity measurement formula is:
SIM(S1,S2)=COMm(S1,S2)/len(S1)
Here, len(S1) is the length of the path S1.
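The two disadvantages of the Jaccard similarity named in item (1) can be seen in a small Python sketch. It assumes that paths are lists of grid labels and that the Jaccard similarity is computed on the sets of labels; the function name and the sample paths are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of the label sets of two paths."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


query = [1, 2, 3]                              # hypothetical short path S1
long_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # contains S1, but is much longer
reversed_sample = [3, 2, 1]                    # same cells, opposite direction

length_penalty = jaccard(query, long_sample)   # lowered by the extra cells of the long path
order_blind = jaccard(query, reversed_sample)  # order is ignored entirely
```

For the same inputs, the formula SIM(S1,S2)=COMm(S1,S2)/len(S1) gives 3/3 for the long sample and 1/3 for the reversed one, which is the behavior the modified measure is meant to provide.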
In a preferred embodiment, the system for predicting a destination based on a path trajectory further includes:
the sample screening module 400 is used for screening sample orders in a preset time range before and after the departure time of the order to be predicted in the OffLine stage and the Online stage; screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
specifically, the sample screening module 400 includes sample screening for hash results and cluster selection for hash model matching results.
Sample screening of the hash results is applied to the single-destination prediction problem. The sample order set obtained from LSH is highly similar to the order to be predicted when time factors are ignored. Rather than ranking these sample orders directly by the similarity SIMm, part of them is removed first. Two further perspectives for screening sample orders are given here, together with how they are used in the present technique: 1. Time. The departure time of each order to be predicted is available, so sample orders whose departure times fall within a certain range before and after it can be selected. 2. Space. All sample orders are clustered with K-Means according to their starting-point positions, and the corresponding cluster centers are recorded. The cluster center nearest to the starting point of the order to be predicted is found, which determines the cluster the order belongs to; the results obtained from LSH are then screened further so that only sample orders in the same cluster as the order to be predicted are kept.
Cluster selection on the hash model matching results is applied to the multi-destination prediction problem. Ranking orders directly by the similarity SIMm is likely to return several results that are close to one another, which is of little practical use. K-Means is used again here, but the objects being clustered are all the results returned by LSH, and the number of clusters equals the number of required destinations; finally, the destination with the highest SIMm in each cluster is selected as the multi-destination prediction result.
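The cluster selection can be sketched as follows. This is a simplified illustration under several assumptions: the matched sample orders are clustered by their destination coordinates, each order already carries a precomputed SIMm score, and K-Means is a small hand-written Lloyd iteration with deterministic farthest-point seeding (a library implementation could be used instead). The field names `id`, `dest`, and `sim` are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny Lloyd's K-Means; returns a cluster index for every point."""
    centers = [points[0]]
    while len(centers) < k:             # farthest-point seeding (assumption)
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:        # converged
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:                 # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def predict_multi(matches, n_dest):
    """Keep the match with the highest similarity in each destination cluster."""
    k = min(n_dest, len(matches))
    assign = kmeans([m["dest"] for m in matches], k)
    best = {}
    for m, c in zip(matches, assign):
        if c not in best or m["sim"] > best[c]["sim"]:
            best[c] = m
    return [best[c] for c in sorted(best)]


# Hypothetical LSH matches with destinations in two separate areas.
matches = [
    {"id": "A", "dest": (0.0, 0.0), "sim": 0.90},
    {"id": "B", "dest": (0.1, 0.0), "sim": 0.95},
    {"id": "C", "dest": (5.0, 5.0), "sim": 0.60},
    {"id": "D", "dest": (5.1, 5.0), "sim": 0.70},
]
result = predict_multi(matches, n_dest=2)
```

Ranking the same matches by similarity alone would return B and A, whose destinations are almost identical; selecting one winner per cluster returns B and D instead.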
The single-destination prediction module 500 is used for clustering the sample orders according to the starting point positions by adopting a K-Means algorithm and recording corresponding clustering centers; finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs; and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
Specifically, for the single-destination prediction module 500, what matters most in the single-destination prediction problem is the deviation of the predicted destination from the actual destination. The Query results of the hash sub-model are screened further using time or geographical distance, and an improved similarity measure is then applied, so that the prediction error is reduced through sample screening. In the OffLine stage, each sample order is marked with a class label to assist screening; in the OnLine stage, screening and similarity comparison are carried out for the order to be predicted. The specific steps are: screening sample orders in a preset time range before and after the departure time of the order to be predicted; clustering the sample orders according to the starting point positions by adopting a K-Means algorithm, and recording corresponding clustering centers; finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs; screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted; and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
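The OnLine part of these steps can be sketched in Python. The sketch assumes that the start-point cluster centers and each sample order's cluster label were computed in the OffLine stage, that departure times are given in seconds on a common clock (no wrap-around at midnight is handled), and that each matched sample order carries a precomputed similarity to the order being predicted. All field names and values are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_center(point, centers):
    """Index of the cluster center closest to the given start point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def predict_single(query, samples, centers, window_s):
    """Screen LSH matches by time and start cluster, then take the most similar."""
    q_cluster = nearest_center(query["start"], centers)
    kept = [
        s for s in samples
        if abs(s["depart_s"] - query["depart_s"]) <= window_s  # time screening
        and s["cluster"] == q_cluster                          # space screening
    ]
    if not kept:
        return None                     # nothing survives the screening
    return max(kept, key=lambda s: s["sim"])


centers = [(0.0, 0.0), (10.0, 10.0)]    # hypothetical OffLine cluster centers
query = {"start": (1.0, 1.0), "depart_s": 30000}
samples = [                             # hypothetical LSH matches
    {"id": "a", "depart_s": 29000, "cluster": 0, "sim": 0.80},
    {"id": "b", "depart_s": 30500, "cluster": 0, "sim": 0.60},
    {"id": "c", "depart_s": 30100, "cluster": 1, "sim": 0.99},  # other start cluster
    {"id": "d", "depart_s": 50000, "cluster": 0, "sim": 0.95},  # outside the window
]
best = predict_single(query, samples, centers, window_s=3600)
```

Orders c and d have higher similarity scores than a, but they are removed by the space and time screening respectively, so the prediction is taken from order a.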
In a preferred embodiment, the locality-sensitive hash model module 200 includes an Offline unit and an Online unit, wherein,
the Offline unit is used for performing MinHash conversion on sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; evenly dividing the h-dimensional data into b Bands by the LSH algorithm, wherein each Band contains the results of r hash functions; for each Band, storing the orders with the same r hash values in the same Bucket, wherein the index value of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage for matching the order to be predicted in the subsequent OnLine stage;
and the Online unit is used for traversing each Band in the Query process of the order to be predicted, finding out a corresponding Bucket according to the hash value of the order track in the Band, and taking all sample orders stored in the Buckets as the matching result of the order to be predicted in the hash model.
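The two units can be sketched together as a small index class. The patent only states that h hash functions are "designed"; the linear hash family (a*x + b) mod p used here, the prime modulus, the seed, the default values of h, b and r, and all names are assumptions for illustration. Paths are assumed to be non-empty lists of integer grid labels.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647                   # 2**31 - 1, modulus of the assumed hash family


def make_hash_funcs(h, seed=0):
    """h random (a, b) pairs defining x -> (a * x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(h)]


def minhash(labels, funcs):
    """h-dimensional MinHash signature of a non-empty path of any length."""
    items = set(labels)
    return [min((a * x + b) % PRIME for x in items) for a, b in funcs]


def band_keys(signature, b, r):
    """Split the signature into b Bands of r values; each key indexes a Bucket."""
    return [(i, tuple(signature[i * r:(i + 1) * r])) for i in range(b)]


class LshIndex:
    def __init__(self, h=20, b=10, r=2, seed=0):
        if b * r != h:
            raise ValueError("h must equal b * r")
        self.b, self.r = b, r
        self.funcs = make_hash_funcs(h, seed)
        self.buckets = defaultdict(set)

    def add(self, order_id, labels):
        """OffLine stage: store a sample order in one Bucket per Band."""
        for key in band_keys(minhash(labels, self.funcs), self.b, self.r):
            self.buckets[key].add(order_id)

    def query(self, labels):
        """OnLine stage: union of the sample orders in the Buckets the query hits."""
        found = set()
        for key in band_keys(minhash(labels, self.funcs), self.b, self.r):
            found |= self.buckets.get(key, set())
        return found


index = LshIndex()
index.add("s1", [1, 2, 3, 4, 5])
index.add("s2", [40, 41, 42, 43])
same_path = index.query([1, 2, 3, 4, 5])
same_cells_reversed = index.query([5, 4, 3, 2, 1])
```

A sample order is returned if it shares all r values in at least one Band with the query, so increasing r makes matching stricter and increasing b makes it more permissive. The reversed path matches s1 as well, because MinHash works on the set of grid labels and ignores their order.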
The present embodiment provides a destination prediction system based on a path trajectory, which performs the following: performing numerical conversion on the path data of an order to be predicted according to a preset conversion rule to obtain corresponding path track representation data; inputting the path track representation data into a locality-sensitive hash model and computing a corresponding matching result with the LSH algorithm; and clustering the matching results with the K-Means algorithm according to the number of destinations, calculating the path similarity of the matching results in each cluster with the similarity measurement formula, and selecting the path with the highest similarity in each cluster as the multi-destination prediction result. With the MinHash-based locality-sensitive hashing algorithm, the system reduces the dimensionality of trajectories and matches them against a set of approximately similar paths. Different procedures are used for predicting a single destination and for screening multiple destinations, and a multi-parameter evaluation is designed to verify the effect of each procedure, which improves destination prediction based on path trajectories. The system uses no personal attribute information, and so provides a reference for applications in which personal information about users or drivers is unavailable.
Another embodiment of the present invention further provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the method for predicting a destination based on a path trajectory as described above.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the modules may be a logical division, and in actual implementation, there may be another division, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The foregoing is directed to the preferred embodiment of the present invention, and it is understood that various changes and modifications may be made by one skilled in the art without departing from the spirit of the invention, and it is intended that such changes and modifications be considered as within the scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (8)

1. A destination prediction method based on a path track is characterized by at least comprising the following steps:
performing numerical value conversion processing on path data in an order to be predicted according to a preset conversion rule to obtain corresponding path track representation data;
inputting the corresponding path track representation data into a locality-sensitive hash model, and calculating according to an LSH algorithm to obtain a corresponding matching result;
clustering the corresponding matching results according to the number of destinations by adopting a K-Means algorithm, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as a multi-destination prediction result;
screening sample orders in a preset time range before and after the departure time of the order to be predicted;
clustering the sample orders according to the starting point positions by adopting a K-Means algorithm, and recording corresponding clustering centers;
finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs;
screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
2. The method for predicting a destination based on a path trajectory according to claim 1, wherein the numerical conversion process specifically includes:
dividing the map into a plurality of grids according to the longitude and latitude lines, and labeling the grids according to the sequence of the longitude and latitude;
and performing numerical value conversion on the two-dimensional point sequence in the path track data, reading line by line in a line scanning mode, and splicing and converting the two-dimensional point sequence into a one-dimensional array.
3. The method for predicting a destination based on a path trajectory according to claim 1, wherein the preset conversion rule specifically comprises: keeping the appearance sequence of each coordinate point in the original sequence in the path track data; replacing each coordinate point in the original sequence by using the grid label; if continuous coordinate points appear in the grid, only one coordinate point is reserved.
4. The method according to claim 1, wherein the locality sensitive hash model comprises an Offline phase and an Online phase, wherein,
the Offline stage specifically includes: performing MinHash conversion on sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; evenly dividing the h-dimensional data into b Bands by an LSH algorithm, wherein each Band contains the results of r hash functions; for each Band, storing the orders with the same r hash values in the same Bucket, wherein the index value of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage for matching the order to be predicted in the subsequent OnLine stage;
the Online stage specifically comprises the following steps: for an order to be predicted, in the Query process, traversing each Band, finding a corresponding Bucket according to the hash value of the order track in the Band, and taking all sample orders stored in the Buckets as the matching result of the order to be predicted in the hash model.
5. The method of claim 1, wherein the similarity measurement formula is
SIM(S1,S2)=COMm(S1,S2)/len(S1)
wherein S1 is the path of the order to be predicted, S2 is the predicted path, COMm(S1, S2) is the maximum non-contiguous common length of the path S2 relative to the path S1, and len(S1) is the length of the path S1.
6. A destination prediction system based on a path trajectory, comprising:
the path preprocessing module is used for performing numerical conversion processing on path data in the order to be predicted according to a preset conversion rule to obtain corresponding path track representation data;
the locality-sensitive hash model module is used for inputting the corresponding path track representation data into a locality-sensitive hash model and calculating according to an LSH algorithm to obtain a corresponding matching result;
the multi-destination prediction module is used for clustering the corresponding matching results according to the number of destinations by adopting a K-Means algorithm, calculating the path similarity of the matching results in each cluster according to a similarity measurement formula, and selecting the path with the highest similarity in each cluster as a multi-destination prediction result;
the sample screening module is used for screening sample orders in a preset time range before and after the departure time of the order to be predicted in the OffLine stage and the OnLine stage; screening the corresponding matching results, and reserving sample orders which are clustered in the same cluster with the order to be predicted;
the single-destination prediction module is used for clustering the sample orders according to the starting point positions by adopting a K-Means algorithm and recording corresponding clustering centers; finding the nearest cluster center according to the starting point position of the order to be predicted, and determining the cluster to which the order belongs; and calculating the path similarity of the matching result according to a similarity measurement formula, and selecting the path with the highest similarity in the same cluster as a single destination prediction result.
7. The path-trajectory-based destination prediction system of claim 6, wherein the locality-sensitive hash model module comprises an Offline unit and an Online unit, wherein,
the Offline unit is used for performing MinHash conversion on sample tracks of any length through h designed hash functions to obtain h-dimensional vectors; evenly dividing the h-dimensional data into b Bands by an LSH algorithm, wherein each Band contains the results of r hash functions; for each Band, storing the orders with the same r hash values in the same Bucket, wherein the index value of the Bucket is the corresponding r hash values; and storing the calculation result of the LSH stage for matching the order to be predicted in the subsequent OnLine stage;
and the Online unit is used for traversing each Band in the Query process of the order to be predicted, finding out a corresponding Bucket according to the hash value of the order track in the Band, and taking all sample orders stored in the Buckets as the matching result of the order to be predicted in the hash model.
8. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method for path-trajectory-based destination prediction according to any one of claims 1 to 5.
CN201910788582.7A 2019-08-23 2019-08-23 Destination prediction method, system and storage medium based on path track Active CN110598917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910788582.7A CN110598917B (en) 2019-08-23 2019-08-23 Destination prediction method, system and storage medium based on path track

Publications (2)

Publication Number Publication Date
CN110598917A CN110598917A (en) 2019-12-20
CN110598917B true CN110598917B (en) 2020-11-24

Family

ID=68855526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910788582.7A Active CN110598917B (en) 2019-08-23 2019-08-23 Destination prediction method, system and storage medium based on path track

Country Status (1)

Country Link
CN (1) CN110598917B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139137B (en) * 2020-01-19 2022-05-03 北京三快在线科技有限公司 Method and device for determining POI coordinates, storage medium and electronic equipment
CN111309977A (en) * 2020-02-24 2020-06-19 北京明略软件系统有限公司 ID space-time trajectory matching method and device
CN112752232B (en) * 2021-01-07 2022-07-12 重庆大学 Privacy protection-oriented driver-passenger matching method
CN115062868B (en) * 2022-07-28 2022-11-11 北京建筑大学 Pre-polymerization type vehicle distribution path planning method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488597A (en) * 2015-12-28 2016-04-13 中国民航信息网络股份有限公司 Passenger destination prediction method and system
CN108286980A (en) * 2017-12-29 2018-07-17 广州通易科技有限公司 A method of prediction destination and recommendation drive route
CN108592927A (en) * 2018-03-05 2018-09-28 武汉光庭信息技术股份有限公司 Destination prediction technique and system based on history traffic path

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9618343B2 (en) * 2013-12-12 2017-04-11 Microsoft Technology Licensing, Llc Predicted travel intent

Also Published As

Publication number Publication date
CN110598917A (en) 2019-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant