CN107679558A - User trajectory similarity measurement method based on metric learning
- Publication number
- CN107679558A (application number CN201710847477.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Remote Sensing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a user trajectory similarity measurement method based on metric learning, which obtains the similarity between user trajectories by computing the distance between them with a metric learning method. First, a location-time joint probability distribution matrix is generated for each user. Next, the initial similarity between different user trajectories is calculated from these user distribution matrices using KL divergence, and initial categories of the user trajectories are generated by spectral clustering (that is, the trajectories are divided into categories according to the similarity matrix, which facilitates the later computation of the similarity metric function). Finally, on the basis of the initial similarity matrix S and the initial trajectory category set C, metric learning yields similarity characterization vectors of the user trajectories, which capture user preference patterns and share the same dimension, together with a metric function A. The distance between user trajectories is then calculated from these, giving the user trajectory similarity.
Description
Technical Field
The invention belongs to the technical field of trajectory similarity measurement, and particularly relates to a user trajectory similarity measurement method based on metric learning.
Background
With the development of positioning satellites, personal positioning devices and wireless networks, the volume of user trajectory data has grown explosively. Because of the potential social value of user trajectory data mining, the field is attracting growing attention, especially in computer science, geographic information science and social science. In industry, the analysis and mining of user trajectory data also creates considerable commercial value. For example, a traffic management department can analyze traffic flow data to avoid urban congestion at peak travel times and to address related urban traffic and environmental problems; an enterprise whose business involves user travel can mine user trajectory data and build effective models for user travel path planning, neighbor user recommendation, customer location optimization and similar problems.
User trajectory data mining algorithms, such as trajectory clustering, trajectory prediction and abnormal trajectory detection, often rely on a measure of user trajectory similarity. User trajectory similarity measurement is therefore a core technology in user trajectory data mining, with important theoretical and practical value. Current approaches fall mainly into two groups: measurement in the spatio-temporal space and measurement in a feature space.
In the spatio-temporal space, because a user trajectory has a time dimension, similarity measurement methods usually extend time-series similarity measures from a time-attribute sequence to a three-dimensional time-space-attribute sequence; examples include the longest common subsequence, dynamic time warping and the minimum edit distance. A common drawback of these methods is that they treat all coordinates and time information in a user trajectory equally and ignore the key locations or key times present in it. Some of these methods consider only coordinate information, even though the spatial and temporal aspects of a user trajectory are tightly coupled, so they cannot measure user trajectory similarity effectively.
In the feature space, the basic idea is to extract inherent features of the user trajectory, such as its speed, curvature, length and starting point. This approach relies on expert knowledge, so the features are easily highly redundant and key information that distinguishes one user trajectory from another is lost.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a user trajectory similarity measurement method based on metric learning that produces a similarity representation reflecting the global properties of user trajectories and user preference patterns, with a consistent feature dimension.
In order to achieve the above object, the present invention provides a user trajectory similarity measurement method based on metric learning, which is characterized by comprising the following steps:
(1) User mobile data collection and cleaning
Collect user mobile data, and sort and clean it according to the analysis requirements: extract the time and position information of the key points hidden in the user mobile data using a key point (POI, Point of Interest) extraction technique, so as to obtain a key-point-based trajectory representation of each user;
(2) User location-time joint distribution calculation
First, cluster the time and position information of the key places of all users with a clustering algorithm (such as DBSCAN) to obtain hot spot areas. Then, combining these with the position information of the known key places of the city, obtain the key places visited at high frequency by all users, and take the P top-ranked (that is, most visited) key places as the places that user trajectories tend to visit.
Dynamically divide the overall user activity time according to its distribution along the time dimension to obtain T time periods. Based on the places that user trajectories tend to visit and the time period division, obtain the location-time joint probability distribution matrix of each user $i$, $i = 1, 2, \ldots, m$, where $m$ is the number of users. This matrix directly reflects the distribution of each user trajectory in the space dimension and the time dimension;
(3) Obtaining the initial similarity matrix of the user track
Based on the location-time joint probability distribution matrix of each user, calculate the initial similarity matrix $S$ between the user trajectories.
The initial similarity matrix $S$ is a symmetric similarity matrix whose entry $s_{i,j}$ represents the similarity between user $i$ and user $j$. It is computed from the KL divergence $d_{i,j}$ between the two users' distributions using a function with width parameter $\sigma$, which is determined according to the specific implementation.
In the KL divergence $d_{i,j}$, $w_i(p,t)$ is the entry of the location-time joint probability distribution matrix of user $i$ that gives the probability that user $i$ appears at visited place $p$ in time period $t$, and $w_j(p,t)$ is the corresponding probability for user $j$;
(4) Initial category acquisition of trajectory
Sum each row of the initial similarity matrix $S$ and use the row sums, in row order, as the diagonal elements of the diagonal matrix $D$. Then calculate the Laplacian matrix $L = D - S$ and obtain, by SVD (singular value decomposition), the $k$ smallest eigenvalues of $L$ and the corresponding eigenvectors.
Construct a matrix $M$: take the eigenvectors in order as columns to form a matrix $M$ with $m$ rows and $k$ columns. Each row of $M$ corresponds to a row of the initial similarity matrix $S$, that is, it is a $k$-dimensional representation of one user trajectory;
Finally, on this $k$-dimensional representation, obtain the category label of each user trajectory by K-Means to form the initial trajectory category set $C$;
(5) Trajectory similarity metric learning
Map the initial similarity matrix $S$ and the initial trajectory category set $C$ to the two inputs of metric learning, namely the similarity matrix and the side information. Applying a metric learning method then yields the learned and optimized metric function $A$, together with the similarity characterization vectors $sv_i$ of the user trajectories in a common feature space.
Finally, combining the similarity characterization vectors of the user trajectories with the metric function $A$, calculate the distance between user trajectories using the Mahalanobis distance:
$$\mathrm{dist}(sv_i, sv_j) = \sqrt{(sv_i - sv_j)^{T} A \,(sv_i - sv_j)}$$
The smaller the distance $\mathrm{dist}(sv_i, sv_j)$ between two user trajectories, the greater their similarity, and vice versa.
The object of the invention is thus achieved.
The user trajectory similarity measurement method based on metric learning of the invention obtains the similarity between user trajectories by computing the distance between them with a metric learning method. First, user mobile data is collected, sorted and cleaned, and a clustering method is used to extract the P places that user trajectories tend to visit (hereinafter, visited places). At the same time, the overall user activity time is dynamically divided according to its distribution along the time dimension to obtain T time periods, from which a location-time joint probability distribution matrix (hereinafter, user distribution matrix) is generated for each user. Next, the initial similarity between different user trajectories is calculated from the user distribution matrices using KL divergence, and initial categories of the user trajectories are generated by spectral clustering (that is, the user trajectories are divided into categories according to the similarity matrix, which facilitates the later computation of the similarity metric function). Finally, on the basis of the initial similarity matrix S and the initial trajectory category set C, metric learning yields similarity characterization vectors of the user trajectories, which capture the user preference patterns and share the same dimension, together with a metric function A; the distance between user trajectories is then calculated from these to obtain the user trajectory similarity.
The invention addresses two problems of traditional methods in the spatio-temporal space: the non-uniform spatio-temporal scale of trajectories and the difficulty of extracting effective features. By obtaining the spatio-temporal distribution features of user trajectories and applying metric learning, it improves the effectiveness of user trajectory similarity measurement, so that similar trajectories are measured as close and dissimilar trajectories as far apart. The method can be widely applied in user trajectory data mining technologies.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for measuring track similarity based on metric learning according to the present invention;
FIG. 2 is a schematic diagram of the dynamic time period division in the present invention, in which T1-T4 represent four time periods; it shows the trajectory coordinate probability distribution under a fixed-width division (1 hour) and the trajectory probability distribution after dynamic division;
FIG. 3 is a system diagram of a user trajectory similarity measurement method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the principle of the metric learning technique in the present invention, in which (a) represents the distance between samples of two classes before metric learning, and (b) represents the distance between samples of two classes after metric learning.
Detailed Description
The following describes specific embodiments of the present invention with reference to the accompanying drawings, so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the present invention.
Fig. 1 is a flowchart of a specific embodiment of the method for measuring track similarity based on metric learning according to the present invention.
In this embodiment, as shown in fig. 1, the trajectory similarity measurement method based on metric learning of the present invention includes the following steps:
step S1: user mobile data collection and cleaning
User movement data generally contains a user number and trajectory information, where the trajectory information is an ordered sequence of binary groups of < location coordinates, timestamps >. For common GPS (global positioning system) trajectories, such as: vehicle positioning GPS tracks, personal cell phone GPS positioning tracks, etc., are coordinate plus time series generated with higher frequency acquisition, and therefore need to be sorted and cleaned.
In the GPS data, since redundant information is excessive in the GPS data, it is necessary to extract time and position information of a key (important) place having a temporal-spatial distribution characteristic. The invention adopts the traditional POI (Point of Interest) extraction method to extract the position information of the key (important) points hidden in the GPS data and extract the time distribution information of the corresponding positions to obtain the time position information of the key points of the user track, namely the track representation of the user based on the key points. In the specific implementation process, other similar extraction methods or expert knowledge may also be used to extract the time and position information of the key location in the position data.
Step S2: user location-time joint distribution computation
For position POI extraction, the invention uses the density-based DBSCAN clustering method to obtain the hot spot areas visited by all user trajectories; a hot spot area is one that has a higher probability of being visited across all users. Combining these with the position information of the known key places of the city gives the key places visited at high frequency by all users, and the P top-ranked (most visited) key places are taken as the places that user trajectories tend to visit. This significantly reduces the user trajectory length while preserving the user's original movement patterns.
The overall user activity time is dynamically divided according to its distribution along the time dimension to obtain T time periods. For the time period division, a statistical analysis of the time dimension of the user trajectories gives a time probability distribution as shown in FIG. 2. Based on the differences in how often trajectory points occur in different time slices, the invention replaces the original fixed time period division (for example, the 24 hours of a day) with a dynamic time period division. For example, assume that a user trajectory data set has the following distribution in the time dimension:
<(0:00 to 1:00, w = 0.3), (1:00 to 2:00, w = 0.05), (3:00 to 4:00, w = 0.35), (4:00 to 5:00, w = 0.3)>
where w is the probability that a user appears in the time period. The time periods are then divided dynamically, where $p_t$ is the probability that a user appears in time period $t$, $\delta$ is a parameter between 0 and 1 that controls the span of the divided time periods and can be adjusted automatically, and $T_t$ denotes time period $t$. In this embodiment, $\delta = 0.1$, which gives the new time distribution shown in FIG. 2. Before dynamic division, most users appear between 3:00 and 5:00 and very few between 1:00 and 2:00, so when the user distribution matrix is calculated the entries for 1:00 to 2:00 approach zero; if there are many time periods, the whole user distribution matrix becomes sparse. After dynamic division, the span of a time period in which users are sparse changes from the original fixed width (1 hour) to $\delta / p_t$, which is equivalent to extending the original time period forwards and backwards by $0.5 \times (\delta / p_t - 1)$ each. The period then covers more visited places, which solves the sparsity problem. In this example, after dynamic division the probability of trajectory coordinates appearing in the period that originally ran from 1:00 to 2:00 is significantly higher.
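The widening rule for sparse time periods can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `widen_sparse_periods`, the `(start, end, p)` tuple layout, and the choice to leave periods with $p_t \ge \delta$ unchanged are not taken from the patent, which only describes what happens to sparse periods.

```python
def widen_sparse_periods(periods, delta=0.1):
    """Widen sparse fixed-width time periods.

    periods: list of (start_hour, end_hour, p), where p is the share of
             trajectory points that fall in that period.
    A period with 0 < p < delta gets the new span width * delta / p,
    i.e. it is extended on both sides by 0.5 * width * (delta / p - 1).
    Periods with p >= delta (or p == 0) are left unchanged (assumption).
    """
    widened = []
    for start, end, p in periods:
        width = end - start
        if 0 < p < delta:
            pad = 0.5 * width * (delta / p - 1)
            widened.append((start - pad, end + pad))
        else:
            widened.append((start, end))
    return widened


# Distribution from the example in the text (hours of the day).
example = [(0, 1, 0.3), (1, 2, 0.05), (3, 4, 0.35), (4, 5, 0.3)]
print(widen_sparse_periods(example, delta=0.1))
```

With $\delta = 0.1$, only the 1:00 to 2:00 period is sparse; its span doubles to two hours, so it now overlaps its neighbours and picks up more visits. How overlapping periods are reconciled afterwards is not specified in this sketch.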
The spatial and temporal features obtained in this way are used to generate the user distribution matrix of each user.
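As a minimal sketch of how such a user distribution matrix might be built, the following Python function counts a user's visits per (place, time period) cell and normalises them to relative frequencies. The function name `joint_distribution`, the places-by-periods orientation, and the relative-frequency estimate are assumptions made for illustration.

```python
from collections import Counter


def joint_distribution(visits, num_places, num_periods):
    """Return a num_places x num_periods matrix of visit probabilities.

    visits: iterable of (place_index, period_index) pairs for ONE user,
            already mapped to the P key places and T time periods.
    Entry [p][t] is the share of the user's visits at place p in period t.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("user has no visits")
    return [
        [counts[(p, t)] / total for t in range(num_periods)]
        for p in range(num_places)
    ]


# Hypothetical user with 4 visits over P = 2 places and T = 3 periods.
w = joint_distribution([(0, 0), (0, 0), (1, 2), (0, 1)], 2, 3)
print(w)  # [[0.5, 0.25, 0.0], [0.0, 0.0, 0.25]]
```

All entries of one user's matrix sum to 1, so each matrix can be treated as a probability distribution over place-period cells.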
Step S3: User trajectory initial similarity matrix acquisition
From the user distribution matrices obtained in step S2, the invention uses KL divergence to calculate the initial similarity matrix $S$ of the user trajectories.
The initial similarity matrix $S$ is a symmetric similarity matrix whose entry $s_{i,j}$ represents the similarity between user $i$ and user $j$. It is computed from the KL divergence $d_{i,j}$ between the two users' distributions using a function with width parameter $\sigma$, which is determined according to the specific implementation.
In the KL divergence $d_{i,j}$, $w_i(p,t)$ is the entry of the location-time joint probability distribution matrix of user $i$ that gives the probability that user $i$ appears at visited place $p$ in time period $t$, and $w_j(p,t)$ is the corresponding probability for user $j$.
This can be regarded as a new characterization of the user trajectories in the similarity space: each row of the matrix corresponds to one user trajectory, and the subsequent metric learning is based on this similarity matrix. Intuitively, if two trajectories have the same similarities to the other trajectories, they can be considered similar; if their similarities to the other trajectories differ significantly, they are considered dissimilar.
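One possible way to turn KL divergences into a symmetric similarity matrix is sketched below. The exact formulas are assumptions: this sketch symmetrises the KL divergence by averaging both directions, smooths zero entries with a small `eps`, and maps the divergence to a similarity with a Gaussian-style kernel $\exp(-d_{i,j}^2 / (2\sigma^2))$. The names `kl` and `similarity_matrix` are hypothetical.

```python
import math


def kl(p, q, eps=1e-9):
    """KL divergence between two flattened distributions, with eps smoothing."""
    return sum((a + eps) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def similarity_matrix(dists, sigma=1.0):
    """dists: one flattened location-time distribution per user.

    Assumptions: d_ij is the symmetrised KL divergence and
    s_ij = exp(-d_ij**2 / (2 * sigma**2)).
    """
    m = len(dists)
    s = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            d = 0.5 * (kl(dists[i], dists[j]) + kl(dists[j], dists[i]))
            s[i][j] = s[j][i] = math.exp(-d * d / (2 * sigma ** 2))
    return s


# Three hypothetical users; the first two visit the same cells.
users = [[0.5, 0.5, 0.0, 0.0], [0.4, 0.6, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
S = similarity_matrix(users, sigma=1.0)
print(S[0][1] > S[0][2])  # True
```

A larger `sigma` flattens the kernel, so more pairs of users receive a similarity close to 1.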
Step S4: Trajectory initial category acquisition
Once the initial similarity matrix $S$ of all user trajectory data has been obtained, spectral clustering is used to divide the user trajectories into clusters in the similarity space and to obtain the category label of each user trajectory.
In this embodiment, the specific method of cluster division is as follows:
4.1) Sum each row of the initial similarity matrix $S$ of all user trajectory data to obtain the row sums $d_i = \sum_{j} s_{i,j}$.
The row sums are then taken as the diagonal elements of the diagonal matrix $D$. Physically, each diagonal element is the total similarity weight between one user trajectory and the other user trajectories. Then we calculate:
L=D-S
where L is the Laplacian matrix. After L has been calculated, it is decomposed to output, in order, the k smallest eigenvalues and the corresponding eigenvectors.
4.2) Select the number k of initial trajectory categories
Before user trajectories can be labeled with the labels obtained by cluster division, the number k of category labels must be initialized. In practice it is difficult to determine the value of k a priori. The invention provides a method for selecting k based on the minimum description length, so that the clustering operation can be performed on the user trajectories.
Specifically, n different values of k are initialized, and for each one a description length based on the model parameters is calculated for the similarity matrix of the trajectory data set. By the minimum description length principle, the value of k that minimizes the length needed for the model to encode the observed data, that is, all similarity vectors of the trajectory data set, can be considered the optimal choice.
Suppose there are two candidate parameters $k_1$ and $k_2$. For each we calculate a loss, where $\theta$ is $k_1$ or $k_2$, $|C_i|$ is the number of samples in the $i$-th cluster, $s_j$ is the $j$-th row of the trajectory similarity matrix, $\mu_i$ is the mean of the $i$-th cluster, and dist is a distance function, in this example the Euclidean distance. If $\mathrm{Loss}_1 > \mathrm{Loss}_2$, then the parameter $k_2$ encodes all the data better, so we choose this value as the label number parameter for the model.
4.3) K-Means clustering to obtain user trajectory category labels
Construct a matrix $M$: take the eigenvectors in order as columns to form a matrix $M$ with $m$ rows and $k$ columns. Each row of $M$ corresponds to a row of the initial similarity matrix $S$, that is, it is a $k$-dimensional representation of one user trajectory. Finally, on this $k$-dimensional representation, obtain the category label of each user trajectory by K-Means to form the initial trajectory category set $C$.
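Steps 4.1) and 4.3) can be sketched with NumPy as below. Two substitutions are made and should be read as assumptions: `numpy.linalg.eigh` is used instead of SVD (for the symmetric Laplacian $L$ it yields the eigenvalues directly in ascending order), and the K-Means step is a plain Lloyd's iteration with a deterministic farthest-point initialisation rather than any particular library routine. The function names are hypothetical.

```python
import numpy as np


def spectral_embedding(S, k):
    """Rows of the result are k-dimensional representations of the users."""
    S = np.asarray(S, dtype=float)
    D = np.diag(S.sum(axis=1))            # diagonal matrix of row sums
    L = D - S                             # unnormalised graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, :k]                 # eigenvectors of the k smallest


def kmeans(X, k, iters=100):
    """Plain Lloyd's algorithm with farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Hypothetical similarity matrix with two obvious groups of users.
S = [[1.0, 0.9, 0.01, 0.01],
     [0.9, 1.0, 0.01, 0.01],
     [0.01, 0.01, 1.0, 0.9],
     [0.01, 0.01, 0.9, 1.0]]
M = spectral_embedding(S, k=2)
print(kmeans(M, k=2))
```

The label values themselves are arbitrary; only which users share a label matters for the initial category set $C$.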
Step S5: trajectory similarity metric learning
In the invention, metric learning is used to obtain a representation with a uniform feature dimension across user trajectories and a more accurate and robust similarity metric function.
5.1) Definitions involved in metric learning
5.1.1) In a metric learning task there are usually three kinds of sample-pair sets: the must-link (necessarily connected) set, the cannot-link (necessarily unconnected) set, and the similarity difference set. An example is given below for ease of understanding;
Assume a sample set <A, B, C> in which A and B are known to be highly similar and B and C to have low similarity. Then the must-link set is S = {(A, B)}, the cannot-link set is D = {(B, C)}, and the similarity difference set is Diff = {((A, B), C)}. The samples in each set express the similarity or difference between samples, which is then added as a constraint to the subsequent learning process. In this example, the category label of each user trajectory has already been obtained in step S4, and the must-link and cannot-link sets can be built from these labels: a pair of user trajectories is added to the must-link set if their labels are the same, and to the cannot-link set if their labels differ.
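The label-based allocation rule can be sketched in a few lines of Python. The function name `build_constraint_sets` and the use of index pairs are illustrative assumptions; the rule itself (same label: must-link, different label: cannot-link) follows the text.

```python
from itertools import combinations


def build_constraint_sets(labels):
    """Split all index pairs into must-link and cannot-link sets."""
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        (must_link if labels[i] == labels[j] else cannot_link).append((i, j))
    return must_link, cannot_link


# Samples A, B, C with labels 0, 0, 1 (A and B share a category).
ml, cl = build_constraint_sets([0, 0, 1])
print(ml)  # [(0, 1)]
print(cl)  # [(0, 2), (1, 2)]
```

Because every pair is enumerated, the number of constraints grows quadratically with the number of trajectories; for large data sets a sampled subset of pairs may be preferable.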
5.1.2) From the Euclidean distance function
$$d(x_i, x_j) = \sqrt{(x_i - x_j)^{T}(x_i - x_j)}$$
the extended distance metric function, namely the Mahalanobis distance, is obtained:
$$d_A(x_i, x_j) = \sqrt{(x_i - x_j)^{T} A \,(x_i - x_j)}$$
where the transformation matrix $A \in R^{d \times d}$ must be positive semi-definite. When $A = I$, the Mahalanobis distance reduces to the Euclidean distance. When $A$ is constrained to be a diagonal matrix, the learned metric function assigns a different weight to each feature dimension. If $A$ is decomposed to obtain $A^{1/2}$, multiplying a sample in the original space by $A^{1/2}$ is equivalent to transforming each dimension of the sample, which gives the representation of the sample in the new feature space.
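The equivalence between the Mahalanobis distance under $A$ and the Euclidean distance after multiplying by $A^{1/2}$ can be checked numerically. The sketch below uses NumPy; the helper names `mahalanobis` and `psd_sqrt` and the example matrix are illustrative assumptions.

```python
import numpy as np


def mahalanobis(x, y, A):
    """sqrt((x - y)^T A (x - y)) for a positive semi-definite A."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ A @ d))


def psd_sqrt(A):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)      # guard against tiny negative values
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T


A = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative PSD matrix
x, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
R = psd_sqrt(A)

# Both values agree: distance under A equals Euclidean distance after R.
print(mahalanobis(x, y, A), np.linalg.norm(R @ x - R @ y))
# With A = I the Mahalanobis distance is the ordinary Euclidean distance.
print(mahalanobis(x, y, np.eye(2)), np.linalg.norm(x - y))
```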
5.2) Although the previous steps have embedded the user trajectories into a feature space of consistent dimension, the distance measure is biased, so that the distance between samples of the same class can be larger than the distance between samples of different classes. At this point we have an initial trajectory similarity measure and a constructed constraint set, as shown in FIG. 3. Using this category or constraint information, and on the basis of the initial similarity matrix, we carry out one of the most critical steps of the invention: iteratively optimizing the metric function.
The result the invention needs to achieve is to learn a new metric function A. To compute it, an optimization model is constructed that incorporates the user similarity constraint information, subject to
A ⪰ 0
where A ⪰ 0 means that A is a positive semi-definite matrix. The optimization problem is solved by gradient descent combined with iterative projection, which finally returns the similarity metric function matrix A.
As shown in FIG. 4, where different shapes represent different user trajectory categories, the physical meaning of the model is as follows. For two trajectories in the must-link set (pairs that must be connected), the distance between them, using their characterization in the similarity space, is minimized; that is, the distances between circles and between triangles in the figure are shortened. For user trajectories in the cannot-link set (pairs that must not be connected), the distance is maximized; that is, the distance between circles and triangles is enlarged. Changing the weights of the different dimensions of the metric space in this way optimizes the distance metric function (in the figure, the distances between the shaded sample and the other samples are successfully corrected).
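The pull-together, push-apart behaviour can be illustrated with a small projected gradient descent in NumPy. This is a sketch on a surrogate objective chosen for illustration, not the optimization model of the patent: it minimises the mean squared distance over must-link pairs minus `lam` times the mean squared distance over cannot-link pairs, plus a regulariser `mu/2 * ||A - I||_F^2` that keeps the problem bounded, and after every step projects $A$ back onto the positive semi-definite cone. All names and hyperparameters are assumptions.

```python
import numpy as np


def project_psd(A):
    """Project a symmetric matrix onto the PSD cone."""
    A = (A + A.T) / 2
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T


def learn_metric(X, must_link, cannot_link, lam=1.0, mu=1.0, lr=0.1, steps=200):
    """Projected gradient descent on an illustrative surrogate objective:

        mean_ML d^T A d  -  lam * mean_CL d^T A d  +  mu/2 * ||A - I||_F^2

    with d = x_i - x_j, subject to A being PSD.
    """
    X = np.asarray(X, dtype=float)
    n_feat = X.shape[1]

    def scatter(pairs):
        G = np.zeros((n_feat, n_feat))
        for i, j in pairs:
            d = X[i] - X[j]
            G += np.outer(d, d)
        return G / max(len(pairs), 1)

    G = scatter(must_link) - lam * scatter(cannot_link)
    A = np.eye(n_feat)
    for _ in range(steps):
        grad = G + mu * (A - np.eye(n_feat))   # gradient of the surrogate
        A = project_psd(A - lr * grad)         # gradient step, then projection
    return A


# Two classes that differ along dimension 0 and vary along dimension 1.
X = [[0, 0], [0, 2], [3, 0], [3, 2]]
A = learn_metric(X, [(0, 1), (2, 3)], [(0, 2), (0, 3), (1, 2), (1, 3)])
print(np.round(A, 3))
```

On this toy data the learned matrix puts a large weight on the dimension that separates the classes and shrinks the weight on the dimension that only varies within a class.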
Specifically, in the present invention, the initial similarity matrix $S$ and the initial trajectory category set $C$ are mapped to the two inputs of metric learning, namely the similarity matrix and the side information. Applying a metric learning method to the whole user trajectory set then yields the learned and optimized metric function $A$, together with the similarity characterization vectors $sv_i$ of all user trajectories in a common feature space.
Finally, combining the similarity characterization vectors of the user trajectories with the metric function $A$, the distance between user trajectories is calculated using the Mahalanobis distance:
$$\mathrm{dist}(sv_i, sv_j) = \sqrt{(sv_i - sv_j)^{T} A \,(sv_i - sv_j)}$$
The smaller the distance $\mathrm{dist}(sv_i, sv_j)$ between two user trajectories, the greater their similarity, and vice versa.
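As a usage sketch, the distances between all pairs of trajectories can be computed at once and used to look up the most similar trajectory. The function names, the example vectors and the diagonal matrix standing in for a learned $A$ are all hypothetical.

```python
import numpy as np


def pairwise_mahalanobis(SV, A):
    """Matrix of sqrt((sv_i - sv_j)^T A (sv_i - sv_j)) for all i, j."""
    SV = np.asarray(SV, dtype=float)
    diff = SV[:, None, :] - SV[None, :, :]            # shape (m, m, k)
    sq = np.einsum("ijk,kl,ijl->ij", diff, A, diff)   # quadratic forms
    return np.sqrt(np.clip(sq, 0.0, None))


def most_similar(dist, i):
    """Index of the trajectory closest to trajectory i (excluding i itself)."""
    row = dist[i].copy()
    row[i] = np.inf
    return int(np.argmin(row))


SV = [[0, 0], [0, 5], [1, 0]]                  # hypothetical sv vectors
learned = pairwise_mahalanobis(SV, np.diag([1.0, 0.01]))
euclid = pairwise_mahalanobis(SV, np.eye(2))
print(most_similar(learned, 0), most_similar(euclid, 0))
```

The example shows how the choice of $A$ changes which trajectory counts as nearest: down-weighting the second dimension makes trajectory 1 the closest to trajectory 0, whereas under the plain Euclidean distance it is trajectory 2.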
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Changes that are apparent to those skilled in the art and that fall within the spirit and scope of the invention as defined by the appended claims are covered, and all inventions that make use of the inventive concept are protected.
Claims (2)
1. A user trajectory similarity measurement method based on metric learning is characterized by comprising the following steps:
(1) User mobile data collection and cleaning
Collecting user mobile data, and sorting and cleaning the user mobile data according to analysis requirements: the time and position information of the key points hidden in the user mobile data is extracted using a key point (POI, Point of Interest) information extraction technique, so as to obtain a key-point-based track representation of each user;
(2) User location-time joint distribution calculation
First, the time and position information of the key points of all users is clustered using a clustering algorithm (such as DBSCAN) to obtain hot spot areas; then, by combining the position information of known key locations in the city, the key locations visited by all users with high frequency are obtained, and the top-ranked P key locations, i.e., the most frequently visited ones, are extracted as the locations that user tracks tend to visit (the trend access locations).
The activity time of all users is dynamically partitioned according to its distribution along the time dimension to obtain T time periods. Based on the trend access locations and the time period partition, a location-time joint probability distribution matrix is obtained for each user i, i = 1, 2, …, m, where m is the number of users; the matrix directly reflects the distribution of each user's track in the space dimension and the time dimension;
(3) Obtaining the initial similarity matrix of the user track
Based on the location-time joint probability distribution matrix of each user, an initial similarity matrix S between user tracks is calculated:
where the initial similarity matrix S is symmetric and its element S_{i,j}, representing the similarity between user i and user j, is defined as follows:
where σ is the function width parameter, determined according to the specific implementation, and the KL divergence d_{i,j} is defined as:
where w_i(p, t) is the entry of the location-time joint probability distribution matrix of user i, i.e., the probability that user i appears at trend access location p in time period t, and w_j(p, t) is the corresponding entry for user j, i.e., the probability that user j appears at trend access location p in time period t;
(4) Initial trajectory category acquisition
Each row of the initial similarity matrix S is summed, and the sums are placed, row by row, on the diagonal of a diagonal matrix D; the Laplacian matrix L = D - S is then computed, and the k smallest eigenvalues of L and their corresponding eigenvectors are obtained through SVD (singular value decomposition);
Constructing a matrix M: the eigenvectors are taken in order as columns to form a matrix M with m rows and k columns, where each row of M corresponds to the same row of the initial similarity matrix S, i.e., it is a k-dimensional representation of one user track;
finally, on this k-dimensional representation, the category label of each user track is obtained by K-Means, forming the initial track category set C;
(5) Track similarity metric learning
The initial similarity matrix S and the initial track category set C correspond respectively to the two elements of metric learning, namely the similarity matrix and the side information. They are processed by a metric learning method to obtain the learned, optimized metric function A, and at the same time the similarity characterization vectors sv_i of the user tracks in the same feature space are obtained;
Finally, the similarity characterization vectors sv_i of the user tracks are combined with the metric function A, and the distance between user tracks is calculated using the Mahalanobis distance algorithm:
The smaller the distance dist(sv_i, sv_j) between user tracks, the greater the similarity; conversely, the larger the distance, the smaller the similarity.
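Steps (3) and (4) of the claim can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the formulas for S_{i,j} and d_{i,j} are not reproduced in this text, so the code assumes a symmetrised KL divergence and a Gaussian kernel exp(-d^2 / (2σ^2)); it uses `numpy.linalg.eigh` in place of SVD (for a symmetric Laplacian both yield the same eigenvectors up to sign); and it uses a small hand-written K-Means. All function names and the example matrices are hypothetical.

```python
import numpy as np

def symmetric_kl(Wi, Wj, eps=1e-12):
    """Symmetrised KL divergence between two location-time matrices (assumed form)."""
    p = np.asarray(Wi, dtype=float).ravel() + eps
    q = np.asarray(Wj, dtype=float).ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def initial_similarity(W, sigma=1.0):
    """Symmetric matrix S with S[i, j] = exp(-d_ij^2 / (2 sigma^2)) (assumed kernel)."""
    m = len(W)
    S = np.ones((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d = symmetric_kl(W[i], W[j])
            S[i, j] = S[j, i] = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return S

def spectral_embedding(S, k):
    """Eigenvectors of L = D - S for the k smallest eigenvalues, one row per user."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    _, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]            # matrix M: m rows, k columns

def kmeans(X, k, n_iter=50, seed=0):
    """Tiny Lloyd-style K-Means; returns one integer label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist2, axis=1)
        for c in range(k):
            if np.any(labels == c):          # leave an empty cluster's centre unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical joint distributions for m = 4 users, P = 2 places, T = 2 periods:
# users 0 and 1 mostly visit place 0, users 2 and 3 mostly visit place 1.
W = [
    [[0.60, 0.30], [0.05, 0.05]],
    [[0.50, 0.40], [0.05, 0.05]],
    [[0.05, 0.05], [0.60, 0.30]],
    [[0.05, 0.05], [0.40, 0.50]],
]
S = initial_similarity(W, sigma=1.0)
M = spectral_embedding(S, k=2)
labels = kmeans(M, k=2)
print(labels)
```

The resulting labels play the role of the initial track category set C; in a practical setting a library K-Means and a sparse eigen-solver would likely replace the hand-written parts.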
2. The method according to claim 1, wherein the dynamic partitioning is:
where p_t is the probability that users appear in time period t, δ is the partition control parameter that governs the span of the time periods, takes a value between 0 and 1 and can be adjusted automatically, and T_t denotes time period t.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710847477.7A CN107679558B (en) | 2017-09-19 | 2017-09-19 | A kind of user trajectory method for measuring similarity based on metric learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107679558A (en) | 2018-02-09 |
CN107679558B (en) | 2019-09-24 |
Family
ID=61137557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710847477.7A Expired - Fee Related CN107679558B (en) | 2017-09-19 | 2017-09-19 | A kind of user trajectory method for measuring similarity based on metric learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107679558B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537254A (en) * | 2018-03-23 | 2018-09-14 | 浙江工业大学 | A kind of stroke lines global clustering method based on drawing time |
CN108595539A (en) * | 2018-04-04 | 2018-09-28 | 烟台海颐软件股份有限公司 | A kind of recognition methods of trace analogical object and system based on big data |
CN110059141A (en) * | 2019-04-22 | 2019-07-26 | 珠海网博信息科技股份有限公司 | A method of relationship analysis is carried out to different acquisition feature by log track |
CN110309383A (en) * | 2019-06-17 | 2019-10-08 | 武汉科技大学 | Ship trajectory clustering analysis method based on improved DBSCAN algorithm |
CN111193742A (en) * | 2019-12-31 | 2020-05-22 | 广东电网有限责任公司 | D-S evidence theory-based power communication network anomaly detection method |
CN111291278A (en) * | 2020-01-16 | 2020-06-16 | 深圳市前海随手数据服务有限公司 | Method and device for calculating track similarity, storage medium and terminal |
CN111328403A (en) * | 2018-10-16 | 2020-06-23 | 华为技术有限公司 | Improved trajectory matching based on quality indicators allowed using weighted confidence values |
CN111523765A (en) * | 2020-03-25 | 2020-08-11 | 平安科技(深圳)有限公司 | Material demand analysis method, equipment, device and readable storage medium |
CN112101132A (en) * | 2020-08-24 | 2020-12-18 | 西北工业大学 | Traffic condition prediction method based on graph embedding model and metric learning |
CN112541646A (en) * | 2019-09-20 | 2021-03-23 | 杭州海康威视数字技术股份有限公司 | Periodic behavior analysis method and device |
CN112561948A (en) * | 2020-12-22 | 2021-03-26 | 中国联合网络通信集团有限公司 | Method, device and storage medium for recognizing accompanying track based on space-time track |
CN113033615A (en) * | 2021-03-01 | 2021-06-25 | 电子科技大学 | Radar signal target real-time association method based on online micro-cluster clustering |
CN113128282A (en) * | 2019-12-31 | 2021-07-16 | 深圳云天励飞技术有限公司 | Crowd category dividing method and device and terminal |
CN113128572A (en) * | 2021-03-30 | 2021-07-16 | 西安理工大学 | Exercise prescription validity range calculation method based on probability distribution |
CN113158415A (en) * | 2021-02-23 | 2021-07-23 | 电子科技大学长三角研究院(衢州) | Vehicle track similarity evaluation method based on error analysis |
CN113408640A (en) * | 2021-06-30 | 2021-09-17 | 电子科技大学 | Moving object space-time trajectory clustering method considering multidimensional semantics |
CN113487865A (en) * | 2021-07-02 | 2021-10-08 | 江西锦路科技开发有限公司 | System and method for acquiring information of vehicles running on highway |
CN113704371A (en) * | 2021-07-16 | 2021-11-26 | 重庆工商大学 | Method for adaptively detecting and dividing sub-regions in geographic information network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100923723B1 (en) * | 2007-03-16 | 2009-10-27 | 제주대학교 산학협력단 | Method for clustering similar trajectories of moving objects in road network databases |
CN102880719A (en) * | 2012-10-16 | 2013-01-16 | 四川大学 | User trajectory similarity mining method for location-based social network |
CN103914563A (en) * | 2014-04-18 | 2014-07-09 | 中国科学院上海微系统与信息技术研究所 | Pattern mining method for spatio-temporal track |
CN106407519A (en) * | 2016-08-31 | 2017-02-15 | 浙江大学 | Modeling method for crowd moving rule |
WO2017070160A1 (en) * | 2015-10-20 | 2017-04-27 | Georgetown University | Systems and methods for in silico drug discovery |
CN106778876A (en) * | 2016-12-21 | 2017-05-31 | 广州杰赛科技股份有限公司 | User classification method and system based on mobile subscriber track similitude |
Non-Patent Citations (2)
Title |
---|
GAO, LL et al.: "Learning in high-dimensional multimedia data: the state of the art", 《MULTIMEDIA SYSTEMS》 * |
刘松灵: "基于度量学习的轨迹聚类研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108537254A (en) * | 2018-03-23 | 2018-09-14 | 浙江工业大学 | A kind of stroke lines global clustering method based on drawing time |
CN108595539A (en) * | 2018-04-04 | 2018-09-28 | 烟台海颐软件股份有限公司 | A kind of recognition methods of trace analogical object and system based on big data |
CN111328403B (en) * | 2018-10-16 | 2023-09-29 | 华为技术有限公司 | Trajectory matching based on quality index improvement allowed using weighted confidence values |
CN111328403A (en) * | 2018-10-16 | 2020-06-23 | 华为技术有限公司 | Improved trajectory matching based on quality indicators allowed using weighted confidence values |
US11889382B2 (en) | 2018-10-16 | 2024-01-30 | Huawei Technologies Co., Ltd. | Trajectory matching based on use of quality indicators empowered by weighted confidence values |
CN110059141A (en) * | 2019-04-22 | 2019-07-26 | 珠海网博信息科技股份有限公司 | A method of relationship analysis is carried out to different acquisition feature by log track |
CN110309383A (en) * | 2019-06-17 | 2019-10-08 | 武汉科技大学 | Ship trajectory clustering analysis method based on improved DBSCAN algorithm |
CN110309383B (en) * | 2019-06-17 | 2021-07-13 | 武汉科技大学 | Ship track clustering analysis method based on improved DBSCAN algorithm |
CN112541646B (en) * | 2019-09-20 | 2024-03-26 | 杭州海康威视数字技术股份有限公司 | Periodic behavior analysis method and device |
CN112541646A (en) * | 2019-09-20 | 2021-03-23 | 杭州海康威视数字技术股份有限公司 | Periodic behavior analysis method and device |
CN111193742A (en) * | 2019-12-31 | 2020-05-22 | 广东电网有限责任公司 | D-S evidence theory-based power communication network anomaly detection method |
CN113128282A (en) * | 2019-12-31 | 2021-07-16 | 深圳云天励飞技术有限公司 | Crowd category dividing method and device and terminal |
CN111291278B (en) * | 2020-01-16 | 2024-01-12 | 深圳市卡牛科技有限公司 | Track similarity calculation method and device, storage medium and terminal |
CN111291278A (en) * | 2020-01-16 | 2020-06-16 | 深圳市前海随手数据服务有限公司 | Method and device for calculating track similarity, storage medium and terminal |
CN111523765B (en) * | 2020-03-25 | 2024-03-22 | 平安科技(深圳)有限公司 | Material demand analysis method, equipment, device and readable storage medium |
CN111523765A (en) * | 2020-03-25 | 2020-08-11 | 平安科技(深圳)有限公司 | Material demand analysis method, equipment, device and readable storage medium |
CN112101132A (en) * | 2020-08-24 | 2020-12-18 | 西北工业大学 | Traffic condition prediction method based on graph embedding model and metric learning |
CN112561948A (en) * | 2020-12-22 | 2021-03-26 | 中国联合网络通信集团有限公司 | Method, device and storage medium for recognizing accompanying track based on space-time track |
CN112561948B (en) * | 2020-12-22 | 2023-11-21 | 中国联合网络通信集团有限公司 | Space-time trajectory-based accompanying trajectory recognition method, device and storage medium |
CN113158415B (en) * | 2021-02-23 | 2023-09-08 | 电子科技大学长三角研究院(衢州) | Vehicle track similarity evaluation method based on error analysis |
CN113158415A (en) * | 2021-02-23 | 2021-07-23 | 电子科技大学长三角研究院(衢州) | Vehicle track similarity evaluation method based on error analysis |
CN113033615B (en) * | 2021-03-01 | 2022-06-07 | 电子科技大学 | Radar signal target real-time association method based on online micro-cluster clustering |
CN113033615A (en) * | 2021-03-01 | 2021-06-25 | 电子科技大学 | Radar signal target real-time association method based on online micro-cluster clustering |
CN113128572B (en) * | 2021-03-30 | 2024-03-19 | 西安理工大学 | Motion prescription validity range calculating method based on probability distribution |
CN113128572A (en) * | 2021-03-30 | 2021-07-16 | 西安理工大学 | Exercise prescription validity range calculation method based on probability distribution |
CN113408640A (en) * | 2021-06-30 | 2021-09-17 | 电子科技大学 | Moving object space-time trajectory clustering method considering multidimensional semantics |
CN113487865B (en) * | 2021-07-02 | 2022-07-22 | 江西锦路科技开发有限公司 | System and method for acquiring information of vehicles running on highway |
CN113487865A (en) * | 2021-07-02 | 2021-10-08 | 江西锦路科技开发有限公司 | System and method for acquiring information of vehicles running on highway |
CN113704371A (en) * | 2021-07-16 | 2021-11-26 | 重庆工商大学 | Method for adaptively detecting and dividing sub-regions in geographic information network |
Also Published As
Publication number | Publication date |
---|---|
CN107679558B (en) | 2019-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107679558A (en) | A kind of user trajectory method for measuring similarity based on metric learning | |
EP3241370B1 (en) | Analyzing semantic places and related data from a plurality of location data reports | |
Soh et al. | Adaptive deep learning-based air quality prediction model using the most relevant spatial-temporal relations | |
CN108268597B (en) | Moving target activity probability map construction and behavior intention identification method | |
US10545247B2 (en) | Computerized traffic speed measurement using sparse data | |
CN106931974B (en) | Method for calculating personal commuting distance based on mobile terminal GPS positioning data record | |
CN106851571B (en) | Decision tree-based rapid KNN indoor WiFi positioning method | |
CN105045858A (en) | Voting based taxi passenger-carrying point recommendation method | |
CN110598917B (en) | Destination prediction method, system and storage medium based on path track | |
Devogele et al. | Optimized discrete fréchet distance between trajectories | |
CN111475746B (en) | Point-of-interest mining method, device, computer equipment and storage medium | |
CN104679810A (en) | Computing Device For Generating Profiles Based On Mobile Device Data | |
CN113590936A (en) | Information pushing method and device | |
Abdullah et al. | Machine learning algorithm for wireless indoor localization | |
Liu et al. | CTSLoc: An indoor localization method based on CNN by using time-series RSSI | |
CN112381078B (en) | Elevated-based road identification method, elevated-based road identification device, computer equipment and storage medium | |
Jia et al. | A fingerprint-based localization algorithm based on LSTM and data expansion method for sparse samples | |
CN108574927B (en) | Mobile terminal positioning method and device | |
Yu et al. | Sparse reconstruction with spatial structures to automatically determine neighbors | |
Dewan et al. | Som-tc: Self-organizing map for hierarchical trajectory clustering | |
Chandio et al. | Towards adaptable and tunable cloud-based map-matching strategy for GPS trajectories | |
Dutta et al. | CLUSTMOSA: Clustering for GPS trajectory data based on multi-objective simulated annealing to develop mobility application | |
Lyu et al. | Movement-aware map construction | |
Li et al. | gsstSIM: A high‐performance and synchronized similarity analysis method of spatiotemporal trajectory based on grid model representation | |
CN112434228B (en) | Method for predicting track position of moving target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190924 |