CN112347376B - Taxi passenger carrying point recommendation method based on multi-time-space clustering - Google Patents
- Publication number
- CN112347376B
- Authority
- CN
- China
- Prior art keywords
- taxi
- point
- points
- passenger carrying
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G06Q50/40—
Abstract
The invention discloses a taxi passenger carrying point recommendation method based on multi-time-space clustering that can improve recommendation accuracy. The method comprises the following steps in sequence: preprocessing the original taxi track data and removing taxi track noise points; extracting historical taxi passenger carrying point information from the preprocessed track data; clustering the historical taxi passenger carrying points using a multi-time-space clustering method; generating a plurality of candidate passenger carrying point clusters according to the real-time request information of the target taxi; calculating a recommendation degree for each candidate passenger carrying point cluster using real-time popularity, average speed and distance information; and selecting the N passenger carrying point clusters with the highest recommendation degree and providing them to the target taxi as the recommendation result.
Description
Technical Field
The invention relates to the technical field of recommendation, and in particular to a taxi passenger carrying point recommendation method based on multi-time-space clustering that can improve recommendation accuracy.
Background
With the wide application of GPS mobile devices in urban vehicles, a large amount of vehicle track information is generated every day, and making reasonable use of it enables intelligent urban traffic planning. Clustering vehicle track information is one of the most effective ways to process track data: it can be used to recommend passenger carrying points to taxis, which keeps taxis from cruising the streets at random and reduces their cost of finding passengers. However, existing passenger carrying point recommendation methods cannot reasonably cluster non-convex data such as track points along urban roads, and they make static recommendations from global information. As a result, similar passenger carrying points are often recommended to all taxis in an area, many taxis converge on the same points, and traffic congestion follows.
Disclosure of Invention
The invention aims to solve the above technical problems in the prior art by providing a taxi passenger carrying point recommendation method based on multi-time-space clustering that can improve recommendation accuracy.
The technical scheme of the invention is as follows: a taxi passenger carrying point recommendation method based on multi-time-space clustering, characterized by comprising the following steps:
Step 1. Preprocess the original taxi track data and remove taxi track noise. Taxi track noise falls into two classes: one is the track of a taxi whose status remains empty, or remains passenger-carrying, for the entire operating day; the other is a track in which the status is displayed incorrectly;
Step 2. Extract historical taxi passenger carrying points from the denoised track data. A taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and on leaving which the status changes from empty to passenger-carrying;
Step 3. Cluster the historical taxi passenger carrying points using a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying points form a matrix $O = [o_1, o_2, \ldots, o_m]^T$, where $m$ is the number of taxi passenger carrying points and $o_i$ is the $i$-th passenger carrying point; $n$ special points are randomly selected from $O$ to form a matrix $S = [s_1, s_2, \ldots, s_n]^T$, where $n$ is the number of special points and $s_j$ is the $j$-th special point;
Step 3.2. Use formula (1) to calculate the distance, including a time factor, between passenger carrying point $o_i$ and special point $s_j$:
(1) [formula image not reproduced]
In formula (1), the symbols denote, in order: the decay rate; the times of passenger carrying point $o_i$ and special point $s_j$; the longitudes of passenger carrying point $o_i$ and special point $s_j$; and the longitude difference and latitude difference between passenger carrying point $o_i$ and special point $s_j$;
Step 3.3. Partition the passenger carrying points and special points into $K$ passenger carrying point clusters by solving formula (2):
(2) [formula image not reproduced]
In formula (2), the symbols denote, in order: the probability of connection between passenger carrying point $o_i$ and special point $s_j$; the regularization parameter; and the Frobenius norm of the matrix $P$;
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
Based on the position of the target taxi and the current time $t$ at which it sends its passenger-seeking request, the several passenger carrying point clusters closest to the target taxi are taken as the candidate passenger carrying point clusters to be recommended to the target taxi;
Step 5. Calculate the recommendation degree $RD(t)$ of each candidate passenger carrying point cluster according to formula (3);
(3) [formula image not reproduced]
$HP(t)$ is the real-time popularity, i.e. the number of all historical passenger carrying points contained in the candidate passenger carrying point cluster before time point $t$;
$AS(t)$ is the average speed, i.e. the average speed of all taxis in the candidate passenger carrying point cluster at time point $t$;
$Dis(t)$ is the distance information, i.e. the distance between the center point of the candidate passenger carrying point cluster and the target taxi at time point $t$;
Step 6. Select the $N$ passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
Compared with the prior art, the invention has the following advantages:
1. The invention proposes, for the first time, a multi-time-space passenger carrying point clustering algorithm. Exploiting the non-convex spatio-temporal distribution of taxi passenger carrying points, it effectively groups passenger carrying points distributed along urban roads into corresponding clusters, thereby solving the problem that traditional clustering methods cannot effectively cluster non-convex data samples;
2. The invention introduces the concept of a real-time recommendation degree for taxi passenger carrying points. Starting from the real-time request sent by the target taxi, it calculates the recommendation degree of each candidate passenger carrying point cluster in real time from real-time popularity, average speed and distance information, and provides different recommendation results to different taxis. This effectively improves the accuracy of taxi passenger carrying point recommendation, raises the passenger carrying rate of taxi drivers, and avoids the traffic congestion caused by many taxis heading to the same passenger carrying points at the same time.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 shows the Precision metric of the embodiment of the invention and of the prior art as a function of the number of recommendations.
FIG. 3 shows the Recall metric of the embodiment of the invention and of the prior art as a function of the number of recommendations.
FIG. 4 shows the F1 metric of the embodiment of the invention and of the prior art as a function of the number of recommendations.
Detailed Description
As shown in FIG. 1, the taxi passenger carrying point recommendation method based on multi-time-space clustering comprises the following steps in sequence:
Step 1. Preprocess the original taxi track data and remove taxi track noise. Although a taxi has a running track, transmission problems in the communication device sometimes cause its status to remain empty, or remain passenger-carrying, for the whole day (from start to end). In addition, events such as a passenger temporarily cancelling the ride can cause the taxi status information not to be updated in time and to be displayed incorrectly. Taxi track noise is therefore divided into two classes: one is the track of a taxi whose status is always empty or always passenger-carrying, and the other is a track in which the status is displayed incorrectly;
Part of the track data of one taxi is shown in Table 1.
TABLE 1
In Table 1, the speed of the taxi at data point $P_3$ is not 0, so $P_3$ can be regarded as a mislabeled data point caused by a passenger temporarily cancelling the ride; data points of the same type as $P_3$ are removed as noise.
The track data after noise removal are shown in Table 2.
TABLE 2
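The two noise classes of step 1 can be sketched roughly as follows. The text does not spell out how a "status displayed incorrectly" point is detected, so the rule used here (a single status flip that differs from both neighbours while the taxi is moving) is only an assumption, and the record layout `TrackPoint` with its field names and units is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackPoint:
    """Hypothetical record layout for one GPS sample of one taxi."""
    taxi_id: str
    timestamp: int   # seconds, assumed unit
    lon: float
    lat: float
    speed: float     # km/h, assumed unit
    occupied: bool   # True = passenger-carrying, False = empty


def remove_noise(track: List[TrackPoint]) -> List[TrackPoint]:
    """Clean the time-ordered one-day track of a single taxi."""
    # Class 1: the status never changes during the day, so drop the whole track.
    if len({p.occupied for p in track}) < 2:
        return []
    cleaned = []
    for i, p in enumerate(track):
        has_neighbours = 0 < i < len(track) - 1
        # Class 2 (ASSUMED rule): an isolated status flip while the taxi is moving.
        if has_neighbours and p.speed > 0:
            prev_p, next_p = track[i - 1], track[i + 1]
            if prev_p.occupied == next_p.occupied and p.occupied != prev_p.occupied:
                continue
        cleaned.append(p)
    return cleaned
```

Under this assumed rule, a point such as $P_3$ (moving, with a status that disagrees with the points before and after it) would be dropped, while a genuine pick-up, where the status stays changed afterwards, would be kept.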
Step 2. Extract historical taxi passenger carrying points from the denoised track data. A taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and on leaving which the status changes from empty to passenger-carrying. In Table 2, only data point $P_6$ is a taxi passenger carrying point.
In this way, 77405 passenger carrying points were extracted from the 6075587 data points of the Shanghai taxi track data of February 20, 2007.
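The extraction rule of step 2 (speed zero, status empty, and status changing to passenger-carrying at the next sample) can be illustrated with the minimal sketch below. The `Sample` tuple and its field names are hypothetical, and "when the taxi is driven away from the data point" is interpreted here as "at the next recorded sample".

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    """Hypothetical layout of one denoised track sample."""
    timestamp: int
    lon: float
    lat: float
    speed: float
    occupied: bool


def extract_pickup_points(track: List[Sample]) -> List[Sample]:
    """Return the samples of a time-ordered track that qualify as passenger carrying points."""
    pickups = []
    for cur, nxt in zip(track, track[1:]):
        # Stopped and empty now, passenger-carrying at the next sample.
        if cur.speed == 0 and not cur.occupied and nxt.occupied:
            pickups.append(cur)
    return pickups
```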
Step 3. Cluster the historical taxi passenger carrying points using a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying point information forms a matrix $O = [o_1, o_2, \ldots, o_m]^T$, where $m$ is the number of taxi passenger carrying points and $o_i$ is the $i$-th passenger carrying point; $n$ special points are randomly selected from $O$ to form a matrix $S = [s_1, s_2, \ldots, s_n]^T$, where $n$ is the number of special points and $s_j$ is the $j$-th special point;
Step 3.2. Use formula (1) to calculate the distance, including a time factor, between passenger carrying point $o_i$ and special point $s_j$:
(1) [formula image not reproduced]
In formula (1), the symbols denote, in order: the decay rate (if it is set to 24 hours, time decays in units of days); the times of passenger carrying point $o_i$ and special point $s_j$; the longitudes of passenger carrying point $o_i$ and special point $s_j$; and the longitude difference and latitude difference between passenger carrying point $o_i$ and special point $s_j$;
For example, Table 3 shows the distances between 5 passenger carrying points and 4 special points.
TABLE 3
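Steps 3.1 and 3.2 amount to drawing $n$ special points at random from the passenger carrying points and filling an $m \times n$ distance matrix. The sketch below shows that structure only. Formula (1) itself is not available in this text, so `stand_in_distance` is an illustrative stand-in and not the patented formula: it assumes a planar distance in degrees that grows with the time gap through a decay rate `tau_hours`. All function and parameter names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

# (longitude in degrees, latitude in degrees, time in hours): assumed layout
Point = Tuple[float, float, float]


def select_special_points(points: List[Point], n: int, seed: int = 0) -> List[Point]:
    """Step 3.1: randomly pick n special points from the passenger carrying points."""
    return random.Random(seed).sample(points, n)


def stand_in_distance(o: Point, s: Point, tau_hours: float = 24.0) -> float:
    """ASSUMED stand-in for formula (1): spatial distance inflated by the time gap."""
    spatial = math.hypot(o[0] - s[0], o[1] - s[1])
    time_gap = abs(o[2] - s[2])
    return spatial * math.exp(time_gap / tau_hours)


def distance_matrix(points: List[Point], specials: List[Point],
                    dist: Callable[[Point, Point], float]) -> List[List[float]]:
    """Step 3.2: m x n matrix of distances between each o_i and each s_j."""
    return [[dist(o, s) for s in specials] for o in points]
```

With 5 passenger carrying points and 4 special points, `distance_matrix` returns a 5 by 4 table with the same shape as Table 3. Passing the distance as a parameter keeps the structure independent of the exact form of formula (1).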
Step 3.3. Partition the passenger carrying points and special points into $K$ passenger carrying point clusters by solving formula (2):
(2) [formula image not reproduced]
In formula (2), the symbols denote, in order: the probability of connection between passenger carrying point $o_i$ and special point $s_j$; the regularization parameter; and the Frobenius norm of the matrix $P$;
Compute the normalized Laplacian of the probability matrix $P$, as shown in formula (2-1):
(2-1) [formula image not reproduced]
In formula (2-1), $I$ is the identity matrix and $V$ is a diagonal matrix whose $i$-th diagonal element equals the sum of the values in the $j$-th column of a matrix defined from $P$ (its symbol is not reproduced in this text). The $K$ passenger carrying point clusters are then obtained according to the connectivity of the bipartite network;
The 77405 passenger carrying points extracted from the 6075587 data points of the Shanghai taxi track data of February 20, 2007 were grouped into 1024 classes, giving an average of about 76 taxi passenger carrying points per class.
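The last part of step 3.3, reading the clusters off the connectivity of the bipartite network, can be sketched as follows. The sketch assumes that the probability matrix $P$ has already been obtained by solving formula (2), which is not reproduced here, and it does not implement that optimization or the Laplacian of formula (2-1). The function name and the threshold `eps` are hypothetical.

```python
from typing import Dict, List


def bipartite_components(P: List[List[float]], eps: float = 1e-12) -> List[int]:
    """Label each passenger carrying point (row of P) with its connected component.

    Row i and column j are linked when P[i][j] > eps; points that reach each
    other through shared special points receive the same label.
    """
    m = len(P)
    n = len(P[0]) if m else 0
    parent = list(range(m + n))  # nodes 0..m-1 are rows, m..m+n-1 are columns

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(m):
        for j in range(n):
            if P[i][j] > eps:
                ri, rj = find(i), find(m + j)
                if ri != rj:
                    parent[rj] = ri

    label_of_root: Dict[int, int] = {}
    labels = []
    for i in range(m):
        root = find(i)
        labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return labels
```

For a $P$ whose bipartite graph has exactly $K$ connected components, the number of distinct labels returned is $K$.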
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
The target taxi sends a passenger-seeking request. Based on its current position and the time $t$ at which the request is sent, the several passenger carrying point clusters closest to the target taxi are taken as the candidate passenger carrying point clusters to be recommended to it. For example, 5 candidate passenger carrying point clusters are selected, denoted $C_1$, $C_2$, $C_3$, $C_4$ and $C_5$;
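Candidate generation in step 4 can be sketched as a nearest-centers lookup. The text does not say how distance is measured or how the time $t$ enters the selection, so this sketch assumes great-circle (haversine) distance to each cluster's center point and uses the position only. The function names and the cluster identifiers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees


def haversine_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in kilometres between two lon/lat points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a[1]), math.radians(b[1])
    dphi = phi2 - phi1
    dlmb = math.radians(b[0] - a[0])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def candidate_clusters(taxi: LonLat, centers: Dict[str, LonLat], k: int = 5) -> List[str]:
    """Return the ids of the k clusters whose centers are closest to the taxi."""
    ranked = sorted(centers, key=lambda cid: haversine_km(taxi, centers[cid]))
    return ranked[:k]
```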
Step 5. Calculate the recommendation degree $RD(t)$ of each candidate passenger carrying point cluster according to formula (3);
(3) [formula image not reproduced]
$HP(t)$ is the real-time popularity, i.e. the number of all historical passenger carrying points contained in the candidate passenger carrying point cluster before time point $t$;
$AS(t)$ is the average speed, i.e. the average speed of all taxis in the candidate passenger carrying point cluster at time point $t$;
$Dis(t)$ is the distance information, i.e. the distance between the center point of the candidate passenger carrying point cluster and the target taxi at time point $t$;
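The three inputs defined above can be computed as in the sketch below. It covers only $HP(t)$, $AS(t)$ and $Dis(t)$; how formula (3) combines them into $RD(t)$ is not reproduced in this text and is not guessed here. The data layout, the units, and the choices of the mean coordinate as the "center point" and a local planar approximation for the distance are assumptions.

```python
import math
from typing import List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees


def real_time_popularity(pickup_times: List[float], t: float) -> int:
    """HP(t): number of historical passenger carrying points in the cluster before t."""
    return sum(1 for ts in pickup_times if ts < t)


def average_speed(speeds_at_t: List[float]) -> float:
    """AS(t): mean speed of the taxis inside the cluster at time t (0.0 if none)."""
    return sum(speeds_at_t) / len(speeds_at_t) if speeds_at_t else 0.0


def cluster_center(points: List[LonLat]) -> LonLat:
    """Assumed center point: the mean longitude and latitude of the cluster members."""
    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return lon, lat


def distance_km(center: LonLat, taxi: LonLat) -> float:
    """Dis(t): approximate distance in km, using a local planar approximation."""
    km_per_deg = 111.195  # about one degree of latitude
    mean_lat = math.radians((center[1] + taxi[1]) / 2)
    dx = (center[0] - taxi[0]) * km_per_deg * math.cos(mean_lat)
    dy = (center[1] - taxi[1]) * km_per_deg
    return math.hypot(dx, dy)
```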
For example, the information of the candidate passenger carrying point clusters $C_1$, $C_2$, $C_3$, $C_4$ and $C_5$ is shown in Table 4:
TABLE 4
By calculation, the recommendation degrees of the 5 candidate passenger carrying point clusters $C_1$, $C_2$, $C_3$, $C_4$ and $C_5$ are $RD_1(t) = -0.222$, $RD_2(t) = -0.322$, $RD_3(t) = 496$, $RD_4(t) = -0.615$ and $RD_5(t) = -0.003$. Sorting in descending order gives $\{C_3, C_5, C_1, C_2, C_4\}$.
Step 6. Select the $N$ passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
Taking real-time popularity, average speed and distance information together, candidate passenger carrying point cluster $C_4$ has the lowest recommendation degree even though it is nearest to the target taxi. Therefore, the 2 passenger carrying point clusters with the highest recommendation degree, $\{C_3, C_5\}$, are selected and provided to the target taxi as the recommendation result.
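The ranking and top-$N$ selection of steps 5 and 6 can be reproduced with the recommendation degrees quoted in the example. The sketch takes those values as given and does not compute them from formula (3); the function name is hypothetical.

```python
from typing import Dict, List


def top_n_clusters(rd: Dict[str, float], n: int) -> List[str]:
    """Return the n cluster ids with the highest recommendation degree, best first."""
    ranked = sorted(rd, key=lambda cid: rd[cid], reverse=True)
    return ranked[:n]


# Recommendation degrees from the worked example in the text.
rd_values = {"C1": -0.222, "C2": -0.322, "C3": 496, "C4": -0.615, "C5": -0.003}

print(top_n_clusters(rd_values, 5))  # ['C3', 'C5', 'C1', 'C2', 'C4']
print(top_n_clusters(rd_values, 2))  # ['C3', 'C5']
```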
The method of the invention was verified experimentally as follows:
(1) Public dataset used
The public dataset used by the invention consists of taxi track data collected in Shanghai on February 20, 2007. The dataset contains 6075587 track data points collected from 4316 taxis.
(2) Evaluation metrics
The invention uses Precision, Recall and F1 to measure the accuracy of the recommendation results. The larger the values of Precision, Recall and F1, the higher the recommendation accuracy:
(4) [formula image not reproduced]
(5) [formula image not reproduced]
(6) [formula image not reproduced]
In formulas (4), (5) and (6), the two symbols denote, respectively, the set of $N$ passenger carrying points recommended to the $l$-th taxi and the historical set of passenger carrying points of the $l$-th taxi.
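Since formulas (4) to (6) are not reproduced, the sketch below uses the standard set-based definitions of Precision, Recall and F1 on the two sets just described. Averaging the per-taxi values over all taxis is an assumption about how the patent aggregates them, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall_f1(recommended: Set[str], historical: Set[str]) -> Tuple[float, float, float]:
    """Standard set-based metrics for one taxi."""
    hits = len(recommended & historical)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(historical) if historical else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_metrics(pairs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """ASSUMED aggregation: average each metric over all taxis."""
    if not pairs:
        return 0.0, 0.0, 0.0
    rows = [precision_recall_f1(rec, hist) for rec, hist in pairs]
    count = len(rows)
    return (sum(r[0] for r in rows) / count,
            sum(r[1] for r in rows) / count,
            sum(r[2] for r in rows) / count)
```

For example, recommending `{"A", "B"}` to a taxi whose historical set is `{"B", "C", "D", "E"}` gives one hit, hence a Precision of 0.5, a Recall of 0.25 and an F1 of 1/3.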
(3) Comparison and analysis of experimental results
The taxi passenger carrying point recommendation method based on multi-time-space clustering is denoted MSTCRS and compared with the traditional K-means clustering method; the results are shown in FIGS. 2 to 4, which plot Precision, Recall and F1 respectively. As the number of recommendations increases, the Precision, Recall and F1 of MSTCRS are significantly higher than those of K-means. Because larger values of Precision, Recall and F1 indicate higher recommendation accuracy, MSTCRS recommends passenger carrying points more accurately than K-means.
Claims (1)
1. A taxi passenger carrying point recommendation method based on multi-time-space clustering, characterized by comprising the following steps:
Step 1. Preprocess the original taxi track data and remove taxi track noise. Taxi track noise falls into two classes: one is the track of a taxi whose status remains empty, or remains passenger-carrying, for the entire operating day; the other is a track in which the status is displayed incorrectly;
Step 2. Extract historical taxi passenger carrying points from the denoised track data. A taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and on leaving which the status changes from empty to passenger-carrying;
Step 3. Cluster the historical taxi passenger carrying points using a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying points form a matrix $O = [o_1, o_2, \ldots, o_m]^T$, where $m$ is the number of taxi passenger carrying points and $o_i$ is the $i$-th passenger carrying point; $n$ special points are randomly selected from $O$ to form a matrix $S = [s_1, s_2, \ldots, s_n]^T$, where $n$ is the number of special points and $s_j$ is the $j$-th special point;
Step 3.2. Use formula (1) to calculate the distance, including a time factor, between passenger carrying point $o_i$ and special point $s_j$:
(1) [formula image not reproduced]
In formula (1), the symbols denote, in order: the decay rate; the times of passenger carrying point $o_i$ and special point $s_j$; the longitudes of passenger carrying point $o_i$ and special point $s_j$; and the longitude difference and latitude difference between passenger carrying point $o_i$ and special point $s_j$;
Step 3.3. Partition the passenger carrying points and special points into $K$ passenger carrying point clusters by solving formula (2):
(2) [formula image not reproduced]
In formula (2), the symbols denote, in order: the probability of connection between passenger carrying point $o_i$ and special point $s_j$; the regularization parameter; and the Frobenius norm of the matrix $P$;
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
Based on the position of the target taxi and the current time $t$ at which it sends its passenger-seeking request, the several passenger carrying point clusters closest to the target taxi are taken as the candidate passenger carrying point clusters to be recommended to the target taxi;
Step 5. Calculate the recommendation degree $RD(t)$ of each candidate passenger carrying point cluster according to formula (3);
(3) [formula image not reproduced]
$HP(t)$ is the real-time popularity, i.e. the number of all historical passenger carrying points contained in the candidate passenger carrying point cluster before time point $t$;
$AS(t)$ is the average speed, i.e. the average speed of all taxis in the candidate passenger carrying point cluster at time point $t$;
$Dis(t)$ is the distance information, i.e. the distance between the center point of the candidate passenger carrying point cluster and the target taxi at time point $t$;
Step 6. Select the $N$ passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010952020.4A CN112347376B (en) | 2020-09-11 | 2020-09-11 | Taxi passenger carrying point recommendation method based on multi-time-space clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112347376A | 2021-02-09 |
CN112347376B | 2023-10-27 |
Family
ID=74357260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010952020.4A Active CN112347376B (en) | 2020-09-11 | 2020-09-11 | Taxi passenger carrying point recommendation method based on multi-time-space clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347376B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115472008B (en) * | 2022-08-30 | 2023-09-19 | 东南大学 | Network vehicle travel space-time characteristic analysis method based on k-means clustering |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102568200A (en) * | 2011-12-21 | 2012-07-11 | 辽宁师范大学 | Method for judging vehicle driving states in real time |
CN105045858A (en) * | 2015-07-10 | 2015-11-11 | 湖南科技大学 | Voting based taxi passenger-carrying point recommendation method |
CN106227884A (en) * | 2016-08-08 | 2016-12-14 | 深圳市未来媒体技术研究院 | A kind of recommendation method of calling a taxi online based on collaborative filtering |
CN107392245A (en) * | 2017-07-19 | 2017-11-24 | 南京信息工程大学 | A kind of taxi trajectory clustering algorithm Tr OPTICS |
CN110222786A (en) * | 2019-06-14 | 2019-09-10 | 深圳大学 | Dynamic share-car method and system based on trip information |
EP3547153A1 (en) * | 2018-03-29 | 2019-10-02 | Palantir Technologies Inc. | Interactive geographical map |
Non-Patent Citations (2)
Title |
---|
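Jing Lian, et al. Robust Ordinal Regression: User Credit Grading with Triplet Loss-Based Sampling. DATA '18: Proceedings of the First Workshop on Data Acquisition To Analysis. 2018, pp. 3-4. * |
Ren Yonggong, et al. Item cold-start recommendation algorithm fusing relation mining and collaborative filtering. Pattern Recognition and Artificial Intelligence, Vol. 33, No. 1, pp. 75-85. * |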
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||