CN112347376B - Taxi passenger carrying point recommendation method based on multi-time-space clustering - Google Patents


Info

Publication number
CN112347376B
Authority
CN
China
Prior art keywords
taxi
point
points
passenger carrying
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010952020.4A
Other languages
Chinese (zh)
Other versions
CN112347376A (en)
Inventor
张志鹏
张尧
任永功
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Normal University
Original Assignee
Liaoning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Normal University
Priority to CN202010952020.4A
Publication of CN112347376A
Application granted
Publication of CN112347376B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06Q50/40

Abstract

The invention discloses a taxi passenger carrying point recommendation method based on multi-time-space clustering, which can improve recommendation accuracy and comprises the following steps in sequence: preprocessing the original taxi track data and removing track noise points; extracting historical taxi passenger carrying point information from the preprocessed track data; clustering the historical passenger carrying points with a multi-time-space clustering method; generating several candidate passenger carrying point clusters according to the real-time request information of the target taxi; calculating a recommendation degree for each candidate cluster from its real-time popularity, average speed and distance information; and providing the N passenger carrying point clusters with the highest recommendation degree to the target taxi as the recommendation result.

Description

Taxi passenger carrying point recommendation method based on multi-time-space clustering
Technical Field
The invention relates to the technical field of recommendation, and in particular to a taxi passenger carrying point recommendation method based on multi-time-space clustering that can improve recommendation accuracy.
Background
With the wide application of GPS mobile equipment in urban vehicles, a large amount of vehicle track information is generated every day, and intelligent urban traffic planning can be realized by making reasonable use of it. Clustering vehicle track information is one of the most effective ways to process track data; it can be used to recommend passenger carrying points to taxis, so that taxis do not cruise the streets at random and their cost of finding passengers is reduced. However, existing passenger carrying point recommendation methods cannot reasonably cluster non-convex data such as track points along urban roads, and they make static recommendations from global information. As a result, similar passenger carrying points are often recommended to all taxis in an area, many taxis converge on the same points, and traffic congestion follows.
Disclosure of Invention
The invention aims to solve the technical problems in the prior art and provides a taxi passenger carrying point recommendation method based on multi-space-time clustering, which can improve recommendation accuracy.
The technical scheme of the invention is as follows: a taxi passenger carrying point recommendation method based on multi-time-space clustering, characterized by comprising the following steps:
Step 1. Preprocess the original taxi track data and remove taxi track noise points; taxi track noise falls into two categories: one is the track of a taxi whose status stays empty, or stays occupied, for the whole operating day, and the other is the track of a taxi whose status is displayed incorrectly;
Step 2. Extract historical taxi passenger carrying points from the track data from which the noise has been removed; a taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and after which, as the taxi drives away, its status changes from empty to occupied;
Step 3. Cluster the historical taxi passenger carrying points with a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying points form a matrix O = [o_1, o_2, …, o_m]^T, where m is the number of taxi passenger carrying points and o_i is the i-th passenger carrying point; n special points are randomly selected from O to form a matrix S = [s_1, s_2, …, s_n]^T, where n is the number of special points and s_j is the j-th special point;
Step 3.2. Use formula (1) to calculate the distance, including the time factor, between carrying point o_i and special point s_j:
[Formula (1): equation image not reproduced]
In formula (1), the symbols denote, in order: the decay rate; the times of carrying point o_i and special point s_j; the longitudes of o_i and s_j; and the longitude and latitude differences between o_i and s_j;
Step 3.3. The carrying points and the special points are divided into K carrying point clusters by solving formula (2):
[Formula (2): equation image not reproduced]
In formula (2), the symbols denote, in order: the probability that carrying point o_i is connected to special point s_j; the regularization parameter; and the Frobenius norm of the matrix P;
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
Based on the time t at which the target taxi sends its passenger-seeking request, the several passenger carrying point clusters closest to the position of the target taxi are taken as the candidate clusters to be recommended to it;
Step 5. Calculate the recommendation degree RD(t) of each candidate carrying point cluster according to formula (3);
[Formula (3): equation image not reproduced]
HP(t) is the real-time popularity, that is, the number of all historical carrying points contained in the candidate cluster before time point t;
AS(t) is the average speed, that is, the average speed of all taxis in the candidate cluster at time point t;
Dis(t) is the distance information, that is, the distance between the center point of the candidate cluster and the target taxi at time point t;
Step 6. Select the N passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
Compared with the prior art, the invention has the following advantages:
1. The invention proposes, for the first time, a multi-time-space passenger carrying point clustering algorithm. Exploiting the non-convex spatio-temporal distribution of taxi passenger carrying points, it groups the points distributed along urban roads into their corresponding clusters, and thus solves the problem that traditional clustering methods cannot cluster non-convex data samples effectively;
2. The invention introduces the concept of a real-time recommendation degree for taxi passenger carrying points. Starting from the real-time request sent by the target taxi, it combines real-time popularity, average speed and distance information to calculate the recommendation degree of each candidate cluster in real time, so that different taxis receive different recommendation results. This improves the accuracy of passenger carrying point recommendation, raises the passenger carrying rate of taxi drivers, and avoids the traffic congestion caused when many taxis head for the same carrying point at the same time.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 shows the Precision of an embodiment of the present invention and of the prior art as a function of the number of recommendations.
FIG. 3 shows the Recall of an embodiment of the present invention and of the prior art as a function of the number of recommendations.
FIG. 4 shows the F1 of an embodiment of the present invention and of the prior art as a function of the number of recommendations.
Detailed Description
The taxi passenger carrying point recommendation method based on multi-time-space clustering is shown in FIG. 1 and comprises the following steps in sequence:
Step 1. Preprocess the original taxi track data and remove taxi track noise points. Although a taxi has a running track, transmission problems in the communication device sometimes leave its status as empty, or as occupied, for the entire day (from the first record to the last). In addition, events such as a passenger temporarily canceling the taxi service can leave the status information stale, so that it is displayed incorrectly. Taxi track noise is therefore divided into two categories: one is the track of a taxi whose status is always empty or always occupied, and the other is the track of a taxi whose status is displayed incorrectly;
Part of the track data of one taxi is shown in Table 1.
TABLE 1
In Table 1, the speed of the taxi at data point P_3 is not 0, so P_3 is a mislabeled data point caused by a passenger temporarily canceling the taxi service; data points of the same type as P_3 are removed as noise.
The track data after noise removal is shown in Table 2.
TABLE 2
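The first noise category of Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (the keys `taxi_id` and `status` and the status values `"empty"` / `"occupied"`) is hypothetical, and the second category (incorrectly displayed status) is left out because its exact rule depends on the Table 1 data.

```python
from collections import defaultdict


def drop_constant_status_taxis(points):
    """Remove every track whose status never changes during the day.

    `points` is a list of dicts with the hypothetical keys
    "taxi_id" and "status" ("empty" or "occupied").
    """
    statuses_seen = defaultdict(set)
    for p in points:
        statuses_seen[p["taxi_id"]].add(p["status"])
    # A taxi that reports a single status all day is treated as noise.
    noisy = {taxi for taxi, seen in statuses_seen.items() if len(seen) < 2}
    return [p for p in points if p["taxi_id"] not in noisy]


sample = [
    {"taxi_id": "A", "status": "empty"},
    {"taxi_id": "A", "status": "occupied"},
    {"taxi_id": "B", "status": "empty"},
    {"taxi_id": "B", "status": "empty"},
]
cleaned = drop_constant_status_taxis(sample)
```

In this toy input taxi B never changes status, so only the two records of taxi A remain.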
Step 2. Extract historical taxi passenger carrying points from the track data from which the noise has been removed. A taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and after which, as the taxi drives away, its status changes from empty to occupied. In Table 2, only data point P_6 is a taxi passenger carrying point.
In this way, 77405 passenger carrying points are extracted from the 6075587 data points of the taxi track data of the Shanghai region for February 20, 2007.
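The extraction rule of Step 2 (speed zero, status empty, next status occupied) can be sketched as below. The field names `speed` and `status` and the sample values are assumptions made for illustration; the track is assumed to belong to one taxi and to be sorted by time.

```python
def extract_pickup_points(track):
    """Return the passenger carrying points of ONE taxi.

    `track` is a time-ordered list of dicts with the hypothetical keys
    "speed" (km/h) and "status" ("empty" or "occupied").
    """
    pickups = []
    for current, following in zip(track, track[1:]):
        stopped_and_empty = current["speed"] == 0 and current["status"] == "empty"
        # The status must flip to occupied as the taxi leaves the point.
        if stopped_and_empty and following["status"] == "occupied":
            pickups.append(current)
    return pickups


track = [
    {"id": "P4", "speed": 35, "status": "empty"},
    {"id": "P5", "speed": 0, "status": "empty"},     # stopped, but stays empty
    {"id": "P6", "speed": 0, "status": "empty"},     # stopped, then occupied
    {"id": "P7", "speed": 20, "status": "occupied"},
]
pickup_ids = [p["id"] for p in extract_pickup_points(track)]
```

Only the point directly before the empty-to-occupied transition qualifies, which is why the earlier stop in the sample is skipped.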
Step 3. Cluster the historical taxi passenger carrying points with a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying point information forms a matrix O = [o_1, o_2, …, o_m]^T, where m is the number of taxi passenger carrying points and o_i is the i-th passenger carrying point; n special points are randomly selected from O to form a matrix S = [s_1, s_2, …, s_n]^T, where n is the number of special points and s_j is the j-th special point;
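As a small illustration of Step 3.1, the random selection of n special points from O can be written with the standard library. The function name and the `seed` parameter are hypothetical; sampling without replacement is an assumption, since the text only says the points are selected randomly.

```python
import random


def choose_special_points(carrying_points, n, seed=None):
    """Randomly pick n special points from the carrying points O.

    Sampling is without replacement (an assumption); `seed` only makes
    the illustration reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(carrying_points, n)


O = [(121.47 + 0.01 * i, 31.23) for i in range(10)]  # toy (lon, lat) pairs
S = choose_special_points(O, n=4, seed=0)
```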
Step 3.2. Use formula (1) to calculate the distance, including the time factor, between carrying point o_i and special point s_j:
[Formula (1): equation image not reproduced]
In formula (1), the symbols denote, in order: the decay rate (if the decay rate is set to 24 hours, time decays in units of days); the times of carrying point o_i and special point s_j; the longitudes of o_i and s_j; and the longitude and latitude differences between o_i and s_j;
For example, Table 3 shows the distances between 5 carrying points and 4 special points.
TABLE 3
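Formula (1) itself is not legible in this text, so the following is only a sketch of what a distance "including the time factor" could look like. The functional form (a spatial distance scaled by an exponential of the time gap divided by the decay rate), the equirectangular approximation, and all names are assumptions, not the formula of the invention.

```python
import math

EARTH_RADIUS_KM = 6371.0


def spatial_distance_km(lon1, lat1, lon2, lat2):
    """Equirectangular approximation built from longitude/latitude differences."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


def time_aware_distance(point, special, decay_hours=24.0):
    """ASSUMED form: spatial distance scaled by exp(|time gap| / decay rate)."""
    gap = abs(point["t_hours"] - special["t_hours"])
    spatial = spatial_distance_km(
        point["lon"], point["lat"], special["lon"], special["lat"]
    )
    return spatial * math.exp(gap / decay_hours)


o = {"lon": 121.47, "lat": 31.23, "t_hours": 8.0}
s_same_time = {"lon": 121.50, "lat": 31.25, "t_hours": 8.0}
s_next_day = {"lon": 121.50, "lat": 31.25, "t_hours": 32.0}
ratio = time_aware_distance(o, s_next_day) / time_aware_distance(o, s_same_time)
```

With a decay rate of 24 hours, a gap of exactly one day multiplies the spatial distance by e under this assumed form, so points that are close in space but far apart in time end up farther apart.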
Step 3.3. The carrying points and the special points are divided into K carrying point clusters by solving formula (2):
[Formula (2): equation image not reproduced]
In formula (2), the symbols denote, in order: the probability that carrying point o_i is connected to special point s_j; the regularization parameter; and the Frobenius norm of the matrix P;
The normalized Laplacian of the probability matrix P is then computed, as shown in formula (2-1):
[Formula (2-1): equation image not reproduced]
In formula (2-1), I is the identity matrix and V is a diagonal matrix in which each diagonal element equals the sum of the values in the corresponding column of the matrix built from P; the K carrying point clusters are then obtained from the connectivity of the bipartite network;
The 77405 passenger carrying points extracted from the 6075587 data points of the taxi track data of the Shanghai region for February 20, 2007 are grouped into 1024 clusters, giving an average of about 76 passenger carrying points per cluster (77405 / 1024 ≈ 75.6).
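The last part of Step 3.3, reading clusters off the connectivity of the bipartite network, can be illustrated with a union-find pass over an already solved probability matrix P. This sketch assumes P is given as a list of rows and that an entry above a small threshold means "connected"; solving formula (2) and building the Laplacian are not shown, and the function name is hypothetical.

```python
def bipartite_clusters(P, eps=1e-12):
    """Label each carrying point by its connected component.

    P[i][j] > eps is read as an edge between carrying point i and
    special point j (nodes 0..m-1 are points, m..m+n-1 are special points).
    """
    m, n = len(P), len(P[0])
    parent = list(range(m + n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(m):
        for j in range(n):
            if P[i][j] > eps:
                root_i, root_j = find(i), find(m + j)
                if root_i != root_j:
                    parent[root_j] = root_i

    component_ids = {}
    return [component_ids.setdefault(find(i), len(component_ids)) for i in range(m)]


P = [
    [0.7, 0.3, 0.0],  # point 0 connects to special points 0 and 1
    [1.0, 0.0, 0.0],  # point 1 shares special point 0 with point 0
    [0.0, 0.0, 1.0],  # point 2 is connected only to special point 2
]
labels = bipartite_clusters(P)
```

Points that share a special point, directly or through a chain of other points, receive the same label; here that yields two clusters.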
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
The target taxi sends a passenger-seeking request. Based on its current position and the time t of the request, the several passenger carrying point clusters closest to the target taxi are taken as the candidate clusters to be recommended to it; for example, 5 candidate clusters are selected, namely C_1, C_2, C_3, C_4 and C_5.
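Candidate generation in Step 4 amounts to a nearest-neighbour query over cluster centers. The sketch below assumes great-circle (haversine) distance and a dictionary of hypothetical cluster centers; the text does not state which distance measure the invention uses.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def candidate_clusters(taxi_lon, taxi_lat, centers, k=5):
    """Return the ids of the k clusters whose centers are nearest the taxi.

    `centers` maps a cluster id to its (lon, lat) center point.
    """
    ranked = sorted(
        centers,
        key=lambda cid: haversine_km(taxi_lon, taxi_lat, *centers[cid]),
    )
    return ranked[:k]


centers = {
    "C1": (121.480, 31.23),
    "C2": (121.600, 31.23),
    "C3": (121.471, 31.23),
}
nearest_two = candidate_clusters(121.47, 31.23, centers, k=2)
```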
Step 5. Calculate the recommendation degree RD(t) of each candidate carrying point cluster according to formula (3);
[Formula (3): equation image not reproduced]
HP(t) is the real-time popularity, that is, the number of all historical carrying points contained in the candidate cluster before time point t;
AS(t) is the average speed, that is, the average speed of all taxis in the candidate cluster at time point t;
Dis(t) is the distance information, that is, the distance between the center point of the candidate cluster and the target taxi at time point t;
For example, the information of candidate passenger carrying point clusters C_1, C_2, C_3, C_4 and C_5 is shown in Table 4:
TABLE 4
By calculation, the recommendation degrees of the 5 candidate clusters C_1, C_2, C_3, C_4 and C_5 are RD_1(t) = -0.222; RD_2(t) = -0.322; RD_3(t) = 496; RD_4(t) = -0.615; RD_5(t) = -0.003. Sorting in descending order gives {C_3, C_5, C_1, C_2, C_4}.
Step 6. Select the N passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
Taking real-time popularity, average speed and distance information together, candidate cluster C_4 has the lowest recommendation degree even though it is nearest to the target taxi; the 2 clusters with the highest recommendation degree, {C_3, C_5}, are therefore provided to the target taxi as the recommendation result.
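The ranking and selection in Steps 5 and 6 can be checked against the worked example with a few lines of code. The recommendation degrees are the values quoted in the example; the function name is hypothetical, and formula (3) itself is not reimplemented here.

```python
def top_n_clusters(recommendation_degree, n):
    """Return the n cluster ids with the highest recommendation degree."""
    ranked = sorted(
        recommendation_degree, key=recommendation_degree.get, reverse=True
    )
    return ranked[:n]


# Recommendation degrees taken from the worked example.
rd = {"C1": -0.222, "C2": -0.322, "C3": 496, "C4": -0.615, "C5": -0.003}
full_order = top_n_clusters(rd, len(rd))
recommended = top_n_clusters(rd, 2)
```

Sorting these values in descending order reproduces the order {C_3, C_5, C_1, C_2, C_4}, and taking N = 2 gives {C_3, C_5}.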
The method of the invention was verified experimentally as follows:
(1) Public data set used
The public data set used in the invention consists of taxi track data collected in the Shanghai region on February 20, 2007. The data set contains 6075587 track data points collected from 4316 taxis.
(2) Evaluation metrics
The invention uses Precision, Recall and F1 to measure the accuracy of the recommendation results. The larger the values of Precision, Recall and F1, the higher the recommendation accuracy:
[Formula (4): equation image not reproduced]
[Formula (5): equation image not reproduced]
[Formula (6): equation image not reproduced]
In formulas (4), (5) and (6), the two set symbols denote, respectively, the set of N carrying points recommended to the l-th taxi and the historical set of carrying points of the l-th taxi.
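Formulas (4) to (6) are not legible in this text, so the sketch below uses the standard set-based definitions of Precision, Recall and F1 for a single taxi. How the invention aggregates the values over all taxis is not stated here and is left out; the function name and the sample identifiers are hypothetical.

```python
def precision_recall_f1(recommended, historical):
    """Standard set-based metrics for one taxi.

    `recommended` is the set of N recommended carrying points and
    `historical` is the taxi's historical set of carrying points.
    """
    recommended, historical = set(recommended), set(historical)
    hits = len(recommended & historical)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(historical) if historical else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
```

In this toy case two of the four recommendations are hits, so Precision is 0.5, Recall is 2/3, and F1 is their harmonic mean, 4/7.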
(3) Comparison and analysis of experimental results
The taxi passenger carrying point recommendation method based on multi-time-space clustering is denoted MSTCRS and is compared with the traditional K-means clustering method; the results are shown in FIGS. 2-4, which give the experimental Precision, Recall and F1 values. The data show that, as the number of recommendations increases, the Precision, Recall and F1 of the MSTCRS method are significantly higher than those of the K-means method. Since larger values of Precision, Recall and F1 mean higher recommendation accuracy, MSTCRS recommends passenger carrying points more accurately than K-means.

Claims (1)

1. A taxi passenger carrying point recommendation method based on multi-time-space clustering, characterized by comprising the following steps:
Step 1. Preprocess the original taxi track data and remove taxi track noise points; taxi track noise falls into two categories: one is the track of a taxi whose status stays empty, or stays occupied, for the whole operating day, and the other is the track of a taxi whose status is displayed incorrectly;
Step 2. Extract historical taxi passenger carrying points from the track data from which the noise has been removed; a taxi passenger carrying point is a data point in the track data at which the taxi's speed is zero and its status is empty, and after which, as the taxi drives away, its status changes from empty to occupied;
Step 3. Cluster the historical taxi passenger carrying points with a multi-time-space clustering method;
Step 3.1. The extracted taxi passenger carrying points form a matrix O = [o_1, o_2, …, o_m]^T, where m is the number of taxi passenger carrying points and o_i is the i-th passenger carrying point; n special points are randomly selected from O to form a matrix S = [s_1, s_2, …, s_n]^T, where n is the number of special points and s_j is the j-th special point;
Step 3.2. Use formula (1) to calculate the distance, including the time factor, between carrying point o_i and special point s_j:
[Formula (1): equation image not reproduced]
In formula (1), the symbols denote, in order: the decay rate; the times of carrying point o_i and special point s_j; the longitudes of o_i and s_j; and the longitude and latitude differences between o_i and s_j;
Step 3.3. The carrying points and the special points are divided into K carrying point clusters by solving formula (2):
[Formula (2): equation image not reproduced]
In formula (2), the symbols denote, in order: the probability that carrying point o_i is connected to special point s_j; the regularization parameter; and the Frobenius norm of the matrix P;
Step 4. Generate candidate passenger carrying point clusters according to the real-time request information of the target taxi;
Based on the time t at which the target taxi sends its passenger-seeking request, the several passenger carrying point clusters closest to the position of the target taxi are taken as the candidate clusters to be recommended to it;
Step 5. Calculate the recommendation degree RD(t) of each candidate carrying point cluster according to formula (3);
[Formula (3): equation image not reproduced]
HP(t) is the real-time popularity, that is, the number of all historical carrying points contained in the candidate cluster before time point t;
AS(t) is the average speed, that is, the average speed of all taxis in the candidate cluster at time point t;
Dis(t) is the distance information, that is, the distance between the center point of the candidate cluster and the target taxi at time point t;
Step 6. Select the N passenger carrying point clusters with the highest recommendation degree and recommend them to the target taxi.
CN202010952020.4A 2020-09-11 2020-09-11 Taxi passenger carrying point recommendation method based on multi-time-space clustering Active CN112347376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010952020.4A CN112347376B (en) 2020-09-11 2020-09-11 Taxi passenger carrying point recommendation method based on multi-time-space clustering

Publications (2)

Publication Number Publication Date
CN112347376A CN112347376A (en) 2021-02-09
CN112347376B true CN112347376B (en) 2023-10-27

Family

ID=74357260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010952020.4A Active CN112347376B (en) 2020-09-11 2020-09-11 Taxi passenger carrying point recommendation method based on multi-time-space clustering

Country Status (1)

Country Link
CN (1) CN112347376B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115472008B (en) * 2022-08-30 2023-09-19 东南大学 Network vehicle travel space-time characteristic analysis method based on k-means clustering

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568200A (en) * 2011-12-21 2012-07-11 辽宁师范大学 Method for judging vehicle driving states in real time
CN105045858A (en) * 2015-07-10 2015-11-11 湖南科技大学 Voting based taxi passenger-carrying point recommendation method
CN106227884A (en) * 2016-08-08 2016-12-14 深圳市未来媒体技术研究院 A kind of recommendation method of calling a taxi online based on collaborative filtering
CN107392245A (en) * 2017-07-19 2017-11-24 南京信息工程大学 A kind of taxi trajectory clustering algorithm Tr OPTICS
EP3547153A1 (en) * 2018-03-29 2019-10-02 Palantir Technologies Inc. Interactive geographical map
CN110222786A (en) * 2019-06-14 2019-09-10 深圳大学 Dynamic share-car method and system based on trip information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
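Jing Lian et al. Robust Ordinal Regression: User Credit Grading with Triplet Loss-Based Sampling. DATA '18: Proceedings of the First Workshop on Data Acquisition To Analysis. 2018, pp. 3-4. *
Ren Yonggong et al. Item cold-start recommendation algorithm fusing relation mining and collaborative filtering. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), vol. 33, no. 1, pp. 75-85. *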

Also Published As

Publication number Publication date
CN112347376A (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant