CN107679558A - User trajectory similarity measurement method based on metric learning - Google Patents

Info

Publication number
CN107679558A
CN107679558A CN201710847477.7A CN201710847477A CN107679558A CN 107679558 A CN107679558 A CN 107679558A CN 201710847477 A CN201710847477 A CN 201710847477A CN 107679558 A CN107679558 A CN 107679558A
Authority
CN
China
Prior art keywords
user
matrix
similarity
track
time
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710847477.7A
Other languages
Chinese (zh)
Other versions
CN107679558B (en)
Inventor
邵俊明
刘松灵
杨勤丽
于忠靖
朱庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN201710847477.7A
Publication of CN107679558A
Application granted
Publication of CN107679558B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user trajectory similarity measurement method based on metric learning, which obtains the similarity between user trajectories by computing the distance between them with a metric learning method. First, a location-time joint probability distribution matrix is generated for each user. Next, the initial similarity between different users' trajectories is computed from these user distribution matrices using KL divergence, and initial categories of the user trajectories are generated by spectral clustering (that is, the user trajectories are divided into different categories according to the similarity matrix, to facilitate the subsequent computation of the similarity metric function). Finally, on the basis of the initial similarity matrix S and the initial trajectory category set C, metric learning is applied to obtain similarity characterization vectors of the user trajectories, which capture user preference patterns and share the same dimension, together with a metric function; on this basis, the distance between user trajectories is computed to obtain the user trajectory similarity.

Description

User trajectory similarity measurement method based on metric learning
Technical Field
The invention belongs to the technical field of trajectory similarity measurement, and particularly relates to a user trajectory similarity measurement method based on metric learning.
Background
With the development of positioning satellites, personal positioning devices and wireless networks, user trajectory data has grown explosively. Given the potential social value of mining user trajectory data, the field has attracted growing attention, especially in computer science, geographic information science and social science. In industry, analysis and mining of user trajectory data also create substantial commercial value. For example, a traffic management department can analyze traffic flow data to avoid urban congestion at travel peaks and to address related urban traffic and environmental problems; an enterprise whose business involves user travel can mine user trajectory data and build effective models to solve problems such as user travel path planning, neighbor user recommendation and customer location optimization.
Algorithms for mining user trajectory data, such as trajectory clustering, trajectory prediction and abnormal trajectory detection, often involve measuring the similarity of user trajectories. User trajectory similarity measurement is therefore a core technology in user trajectory data mining, with important theoretical and practical value. Current user trajectory similarity measures fall mainly into two groups: measures in the spatio-temporal space and measures in a feature space.
In the spatio-temporal space, because user trajectories have a temporal character, similarity measures usually extend time-series similarity measures from a time-attribute sequence to a three-dimensional time-space-attribute sequence; examples include the longest common subsequence, dynamic time warping and the edit distance. The common drawback of these methods is that all coordinates and time information in a user trajectory are treated equally, so key location or time information present in the trajectory is ignored. Some similarity measures consider only coordinate information, although space and time are tightly coupled in user trajectories, and therefore cannot measure user trajectory similarity effectively.
In the feature space, the basic idea is to extract inherent features of the user trajectory, such as its speed, curvature, length and starting point. This approach relies on expert knowledge, so the features easily become highly redundant and key information that discriminates between user trajectories is lost.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a user trajectory similarity measurement method based on metric learning, which generates a similarity representation that captures the global properties of user trajectories and user preference patterns and has a consistent feature dimension.
In order to achieve the above object, the present invention provides a user trajectory similarity measurement method based on metric learning, which is characterized by comprising the following steps:
(1) User mobile data collection and cleaning
Collect user mobile data, then sort and clean it according to the analysis requirements: a key-point information extraction technique, namely POI (Point of Interest) extraction, is used to extract the time and position information of the key points hidden in the user mobile data, giving a key-point-based trajectory representation for each user;
(2) User location-time joint distribution calculation
First, a clustering algorithm (such as DBSCAN) is used to cluster the time and position information of the key locations of all users to obtain hotspot areas. These are then combined with the position information of known key locations in the city to obtain the key locations that all users visit with high frequency, and the P top-ranked (most visited) key locations are extracted as the locations that user trajectories tend to visit.
The overall user activity time is divided dynamically according to its distribution over the time dimension to obtain T time periods. Based on the locations that user trajectories tend to visit and the time-period division, the location-time joint probability distribution matrix of each user i, i = 1, 2, …, m, is obtained, where m is the number of users; the matrix directly reflects the distribution of each user's trajectory over the space and time dimensions;
(3) Obtaining the initial similarity matrix of the user track
Based on the location-time joint probability distribution matrix of each user, the initial similarity matrix S between user trajectories is computed:
where the initial similarity matrix S is symmetric and s_{i,j}, the similarity between user i and user j, is defined as follows:
where σ is a function width parameter, determined according to the specific implementation, and the KL divergence d_{i,j} is defined as:
where w_i(p, t) is the probability, in the location-time joint probability distribution matrix of user i, that user i appears at visited location p in time period t, and w_j(p, t) is the corresponding probability for user j;
(4) Initial category acquisition of trajectory
Each row of the initial similarity matrix S is summed, and the row sums are used, row by row, as the diagonal elements of the diagonal matrix D. The Laplacian matrix L = D - S is then computed, and the k smallest eigenvalues of L and their corresponding eigenvectors are obtained through SVD (singular value decomposition).
Constructing a matrix M: the eigenvectors are taken in order as columns to form a matrix M with m rows and k columns, where each row of M corresponds to a row of the original initial similarity matrix S, that is, to a k-dimensional representation of one user trajectory;
finally, on the k-dimensional representation, the category label of each user trajectory is obtained by K-Means, forming the initial trajectory category set C;
(5) Trajectory similarity metric learning
The initial similarity matrix S and the initial trajectory category set C correspond to the two elements of metric learning, namely the similarity matrix and the side information. A metric learning method is applied to them to obtain the learned, optimized metric function A, together with the similarity characterization vectors sv_i of the user trajectories in the same feature space.
Finally, from the similarity characterization vectors sv_i of the user trajectories and the metric function A, the distance between user trajectories is computed with the Mahalanobis distance:
The smaller the distance dist(sv_i, sv_j) between user trajectories, the greater their similarity, and vice versa.
The object of the invention is thus achieved.
The user trajectory similarity measurement method based on metric learning obtains the similarity between user trajectories by computing the distance between them with a metric learning method. First, user mobile data is collected, sorted and cleaned, and a clustering method is then used to extract from it the P locations that user trajectories tend to visit (hereinafter, visited locations). At the same time, the overall user activity time is divided dynamically according to its distribution over the time dimension to obtain T time periods, from which a location-time joint probability distribution matrix (hereinafter, user distribution matrix) is generated for each user. Next, the initial similarity between different users' trajectories is computed from the user distribution matrices using KL divergence, and initial categories of the user trajectories are generated by spectral clustering (that is, the user trajectories are divided into different categories according to the similarity matrix, to facilitate the subsequent computation of the similarity metric function). Finally, on the basis of the initial similarity matrix S and the initial trajectory category set C, metric learning is applied to obtain the similarity characterization vectors sv_i of the user trajectories, which capture user preference patterns and share the same dimension, together with the metric function A; on this basis, the distance between user trajectories is computed to obtain the user trajectory similarity.
Conventional methods in the spatio-temporal space suffer from inconsistent spatio-temporal scales across trajectories and from the difficulty of extracting effective features. The invention addresses these problems by obtaining the spatio-temporal distribution features of user trajectories and applying metric learning, which improves the effectiveness of the user trajectory similarity measure so that similar trajectories are measured as close and dissimilar ones as far apart. The method can be widely applied in various user trajectory data mining technologies.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for measuring track similarity based on metric learning according to the present invention;
FIG. 2 is a schematic diagram of the dynamic division of time periods in the invention, where T1 to T4 denote four time periods; it shows the trajectory coordinate probability distribution divided by a fixed time period (1 hour) and the trajectory probability distribution after dynamic division;
FIG. 3 is a system diagram of a user trajectory similarity measure method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the principle of the metric learning technique in the invention, in which (a) shows the distance between samples of two classes before metric learning and (b) shows the distance between samples of the two classes after metric learning.
Detailed Description
Embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the invention.
Fig. 1 is a flowchart of a specific embodiment of the method for measuring track similarity based on metric learning according to the present invention.
In this embodiment, as shown in fig. 1, the trajectory similarity measurement method based on metric learning of the present invention includes the following steps:
Step S1: User mobile data collection and cleaning
User movement data generally contains a user number and trajectory information, where the trajectory information is an ordered sequence of <location coordinates, timestamp> pairs. Common GPS (Global Positioning System) trajectories, such as vehicle positioning tracks and personal mobile-phone positioning tracks, are sequences of coordinates plus time sampled at high frequency, and therefore need to be sorted and cleaned.
Because GPS data contains a great deal of redundant information, the time and position information of key (important) locations with a characteristic spatio-temporal distribution must be extracted. The invention uses a conventional POI (Point of Interest) extraction method to extract the position information of the key locations hidden in the GPS data, together with the time distribution information of those positions, giving the time and position information of the key points of each user trajectory, that is, the key-point-based trajectory representation of the user. In a specific implementation, other similar extraction methods or expert knowledge may also be used to extract the time and position information of key locations from the position data.
Step S2: user location-time joint distribution computation
For location POI extraction, the invention uses the density-based DBSCAN clustering method to obtain the hotspot areas visited by all user trajectories; a hotspot area implies that the area has a higher probability of being visited across all users. These areas are combined with the position information of known key locations in the city to obtain the key locations that all users visit with high frequency, and the P top-ranked (most visited) key locations are extracted as the locations that user trajectories tend to visit. This significantly shortens the user trajectories while preserving the user's original movement patterns.
The overall user activity time is divided dynamically according to its distribution over the time dimension to obtain T time periods. For the time-period division, a statistical analysis over the time dimension of the user trajectories gives the time probability distribution shown in FIG. 2. Using the differences in how frequently trajectory points occur in different time slices, the invention moves from the original fixed division (for example, the 24 hours of a day) to a dynamic division of time periods. For example, assume a user trajectory data set has the following distribution over the time dimension:
<(0:00 to 1:00, w = 0.3), (1:00 to 2:00, w = 0.05), (3:00 to 4:00, w = 0.35), (4:00 to 5:00, w = 0.3)>
where w is the probability that a user appears in the time period. The time periods can then be divided dynamically:
where p_t is the probability that a user appears in time period t; δ is the parameter controlling the span of the divided time periods, takes a value between 0 and 1, and can be adjusted automatically; and T_t denotes time period t. In this embodiment δ = 0.1, which yields the new time distribution shown in FIG. 2. Before dynamic time division, most users appear between 3:00 and 5:00 and very few between 1:00 and 2:00. As a result, when the user distribution matrix is computed, the entries for 1:00 to 2:00 approach zero, and if the number of time periods is large the whole user distribution matrix becomes sparse. After dynamic time division, the span of a time period in which users are sparse changes from the original fixed width (1 hour) to δ/p_t, which is equivalent to extending the original time period by 0.5 × (δ/p_t - 1) in both directions, so that more visited locations are included and the sparsity problem is alleviated. In this example, it is assumed that after dynamic division the division of the visited locations in the time dimension becomes:
The probability of trajectory coordinates appearing in the original 1:00 to 2:00 period is markedly increased.
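The widening rule described above can be sketched as follows. This is a minimal illustration, not the patented formula (which is not reproduced in this text): the function name `widen_sparse_periods`, the tuple layout, the rule that only periods with p_t below δ are widened, and the scaling of the padding by the original width are all assumptions.

```python
def widen_sparse_periods(periods, delta=0.1):
    """Widen sparse time periods to a span of delta / p_t original widths.

    periods: list of (start_hour, end_hour, probability) tuples.
    Assumption: only periods with 0 < p_t < delta are widened.
    """
    widened = []
    for start, end, p in periods:
        width = end - start
        if 0.0 < p < delta:
            # Extend by 0.5 * (delta / p_t - 1) widths on each side.
            pad = 0.5 * (delta / p - 1.0) * width
            widened.append((start - pad, end + pad))
        else:
            widened.append((float(start), float(end)))
    return widened


# Distribution from the example in the text (hours of the day).
periods = [(0, 1, 0.3), (1, 2, 0.05), (3, 4, 0.35), (4, 5, 0.3)]
widened = widen_sparse_periods(periods, delta=0.1)
```

With δ = 0.1 and p_t = 0.05, the sparse 1:00 to 2:00 period grows to a two-hour span in this sketch, while the dense periods are left unchanged.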
The spatial and temporal features are obtained by the method, and a user distribution matrix of each user is generated:
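As a rough illustration of a user distribution matrix, the sketch below counts one user's visits per (location, time period) cell and normalizes the counts so they sum to 1. The normalization by total visit count, the function name `joint_distribution` and the index-based input format are assumptions; the patent's own formula is not reproduced in this text.

```python
def joint_distribution(visits, num_places, num_periods):
    """Build a P x T location-time joint probability matrix for one user.

    visits: iterable of (place_index, period_index) pairs (hypothetical format).
    Assumption: each cell is the visit count divided by the total visit count.
    """
    counts = [[0.0] * num_periods for _ in range(num_places)]
    for place, period in visits:
        counts[place][period] += 1.0
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts  # no visits: return the all-zero matrix
    return [[c / total for c in row] for row in counts]


# Four visits by one user across 3 locations and 3 time periods.
visits = [(0, 0), (0, 0), (1, 2), (2, 1)]
W = joint_distribution(visits, num_places=3, num_periods=3)
```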
Step S3: User trajectory initial similarity matrix acquisition
For the user distribution matrices obtained in step S2, the invention uses KL divergence to compute the initial similarity matrix S of the user trajectories:
where the initial similarity matrix S is symmetric and s_{i,j}, the similarity between user i and user j, is defined as follows:
where σ is a function width parameter, determined according to the specific implementation, and the KL divergence d_{i,j} is defined as:
where w_i(p, t) is the probability, in the location-time joint probability distribution matrix of user i, that user i appears at visited location p in time period t, and w_j(p, t) is the corresponding probability for user j.
The matrix S can be regarded as a new characterization of the user trajectories in the similarity space: each row of the matrix corresponds to one user trajectory, and the subsequent metric learning is also based on this similarity matrix. Intuitively, if two trajectories have the same similarities to all the other trajectories, the two trajectories can be considered similar; if their similarities to the remaining trajectories differ significantly, they are considered dissimilar.
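One way to turn KL divergences between user distribution matrices into a symmetric similarity matrix is sketched below. Because the defining equations are not reproduced in this text, three choices here are assumptions rather than the patent's definitions: the divergence is symmetrized by averaging both directions, the similarity uses a Gaussian kernel exp(-d^2 / (2σ^2)), and zero probabilities are floored at a small `eps`.

```python
import math


def kl_divergence(w_i, w_j, eps=1e-12):
    """KL divergence between two P x T probability matrices (nested lists)."""
    total = 0.0
    for row_i, row_j in zip(w_i, w_j):
        for a, b in zip(row_i, row_j):
            if a > 0.0:
                # Floor the denominator so zero cells do not divide by zero.
                total += a * math.log(a / max(b, eps))
    return total


def similarity_matrix(distributions, sigma=1.0):
    """Symmetric similarity matrix from pairwise (symmetrized) KL divergence."""
    m = len(distributions)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            d = 0.5 * (kl_divergence(distributions[i], distributions[j])
                       + kl_divergence(distributions[j], distributions[i]))
            s = math.exp(-(d * d) / (2.0 * sigma * sigma))
            S[i][j] = S[j][i] = s
    return S


# Three hypothetical users over 2 locations x 2 time periods.
W_a = [[0.5, 0.3], [0.1, 0.1]]
W_b = [[0.4, 0.4], [0.1, 0.1]]
W_c = [[0.1, 0.1], [0.4, 0.4]]
S = similarity_matrix([W_a, W_b, W_c], sigma=1.0)
```

In this sketch users a and b, whose distributions concentrate on the same location, come out more similar to each other than either is to user c.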
Step S4: Trajectory initial class acquisition
Once the initial similarity matrix S of all user trajectory data has been obtained, spectral clustering is used to divide the user trajectories into clusters in the similarity space, giving the category label of each user trajectory.
In this embodiment, the specific method of cluster division is as follows:
4.1) Sum each row of the initial similarity matrix S of all user trajectory data to obtain the row sums d_i of the similarity matrix of the trajectory data set.
The row sums are then used as the diagonal elements of the diagonal matrix D. Physically, each diagonal element can be interpreted as the sum of the similarity weights between one user trajectory and the other user trajectories similar to it. Then we compute:
L = D - S
where L is the Laplacian matrix. After L has been computed, it is decomposed to output, in order, the k smallest eigenvalues and their corresponding eigenvectors.
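The construction of D and L = D - S can be sketched in a few lines. The function name `laplacian` and the sample matrix are illustrative; the eigen-decomposition that follows would normally be delegated to a numerical library and is not shown here.

```python
def laplacian(S):
    """Unnormalized graph Laplacian L = D - S for a square similarity matrix S.

    D is the diagonal matrix whose i-th diagonal entry is the sum of row i of S.
    """
    m = len(S)
    degrees = [sum(row) for row in S]
    return [
        [(degrees[i] if i == j else 0.0) - S[i][j] for j in range(m)]
        for i in range(m)
    ]


# Hypothetical 3-user similarity matrix.
S = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
L = laplacian(S)
```

A quick sanity check on any such L is that every row sums to zero, since the diagonal entry is the row sum of S and the off-diagonal entries subtract it back out.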
4.2) Selecting the number k of initial trajectory categories
Before the user trajectories are labeled with the labels obtained by cluster division, the number k of category labels must be initialized. In practice, it is difficult to determine the value of k a priori. The invention provides a method for selecting k based on the minimum description length, so that clustering can then be performed on the user trajectories.
Specifically, n different values of k are initialized, and for each value a description length based on the model parameters can be computed for the similarity matrix of the trajectory data set. By the minimum description length principle, the value of k that minimizes the cost of encoding the observed data with the model, that is, all similarity vectors of the trajectory data set, can be regarded as the optimal choice.
Suppose there are two candidate parameters k_1 and k_2; we compute:
where θ is k_1 or k_2, |C_i| is the number of samples in the i-th cluster, s_j is the j-th row of the trajectory similarity matrix, and μ_i is the mean of the i-th cluster. dist is a distance function, in this example the Euclidean distance. If Loss_1 > Loss_2, the parameter k_2 encodes all the data better, so this value is chosen as the label-number parameter of the model.
4.3) K-Means clustering to obtain user trajectory category labels
Constructing a matrix M: the eigenvectors are taken in order as columns to form a matrix M with m rows and k columns, where each row of M corresponds to a row of the original initial similarity matrix S, that is, to a k-dimensional representation of one user trajectory. Finally, on the k-dimensional representation, the category label of each user trajectory is obtained by K-Means, forming the initial trajectory category set C.
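The final K-Means step on the rows of M can be sketched as below. This is a bare-bones Lloyd iteration for illustration only: the function name `k_means`, the fixed iteration count and the naive initialization from the first k rows are assumptions, and the sample rows stand in for a hypothetical spectral embedding.

```python
def k_means(points, k, iterations=20):
    """Plain Lloyd-style K-Means returning one label per point.

    Assumption: centers are initialized from the first k points, which is
    simple but sensitive to the order of the input rows.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels[idx] = dists.index(min(dists))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2-dimensional embedding rows for four user trajectories.
rows = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
labels = k_means(rows, k=2)
```

A production implementation would typically use a better initialization and a convergence test instead of a fixed iteration count.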
Step S5: trajectory similarity metric learning
In the invention, metric learning is used to obtain a representation of user trajectories with a uniform feature dimension and a more accurate and robust similarity metric function.
5.1) Definitions involved in metric learning
5.1.1) In a specific metric learning task there are typically three kinds of sample-pair sets: the must-link (necessarily connected) set, the cannot-link (necessarily unconnected) set, and the similarity difference set. An example is given below for ease of understanding.
Assume there is a sample set <A, B, C>, and it is known that A and B are highly similar to each other while B and C are not. Then there are a must-link set S = {(A, B)}, a cannot-link set D = {(B, C)}, and a similarity difference set Diff = {((A, B), C)}. The samples in each set express the similarity or difference between samples, which is then added as a constraint to the subsequent learning process. In this example, the category label of each user trajectory has already been obtained in step S4, and on that basis the must-link set and the cannot-link set can be built in turn. A sample pair is added to the must-link set if the labels of the two user trajectories are the same, and to the cannot-link set if the labels differ.
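The allocation rule just described (same label: connected set, different label: unconnected set) can be sketched as follows. The function name `build_constraint_sets` and the use of integer trajectory indices as pair members are illustrative choices, not taken from the patent.

```python
from itertools import combinations


def build_constraint_sets(labels):
    """Split all index pairs into connected and unconnected constraint sets.

    labels: one category label per user trajectory (e.g. from step S4).
    Returns (must_link, cannot_link), each a list of (i, j) with i < j.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))    # same label: necessarily connected
        else:
            cannot_link.append((i, j))  # different labels: necessarily unconnected
    return must_link, cannot_link


# Trajectories 0 and 1 share a category; trajectory 2 is in another one.
must_link, cannot_link = build_constraint_sets([0, 0, 1])
```

Enumerating every pair grows quadratically with the number of trajectories, so a larger data set might sample pairs instead; that choice is outside what the text specifies.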
5.1.2) From the Euclidean distance function:
the extended distance metric function, namely the Mahalanobis distance, can be obtained:
where the transformation matrix A ∈ R^{d×d} must be positive semidefinite. When A = I, the Mahalanobis distance reduces to the Euclidean distance. When A is constrained to be a diagonal matrix, the learned metric function assigns a different weight to each feature dimension. If A is decomposed to obtain A^{1/2}, multiplying a sample in the original space by A^{1/2} is equivalent to transforming each dimension of the sample, which gives the representation of the sample in the new feature space.
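The properties stated above can be checked with a small sketch of the Mahalanobis distance, here assumed to take the usual form sqrt((x - y)^T A (x - y)); the patent's own equation is not reproduced in this text. The function name `mahalanobis` and the sample matrices are illustrative.

```python
import math


def mahalanobis(x, y, A):
    """Distance sqrt((x - y)^T A (x - y)) for a positive semidefinite A."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Matrix-vector product A @ diff, written out for nested lists.
    A_diff = [sum(A[r][c] * diff[c] for c in range(n)) for r in range(n)]
    return math.sqrt(sum(d * ad for d, ad in zip(diff, A_diff)))


identity = [[1.0, 0.0], [0.0, 1.0]]   # A = I: plain Euclidean distance
weighted = [[4.0, 0.0], [0.0, 1.0]]   # diagonal A: per-dimension weights
x, y = [0.0, 0.0], [3.0, 4.0]

d_euclid = mahalanobis(x, y, identity)
d_weighted = mahalanobis(x, y, weighted)
```

With the identity matrix the result matches the Euclidean distance, and the diagonal matrix stretches the first dimension so the same pair of points ends up further apart.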
5.2) Although the previous steps have embedded the user trajectories into a feature space of consistent dimension, the distance measure is biased, so that the distance between samples that belong to the same class can be larger than the distance between samples that belong to different classes. At this point the initial trajectory similarity measure and the constructed constraint sets are available, as shown in FIG. 3. Using the category or constraint information, and starting from the initial similarity matrix, the most critical step of the invention is carried out: iteratively optimizing the metric function.
The result the invention needs to achieve, namely learning a new metric function A, is determined as follows. To compute a new metric function that incorporates the user similarity constraint information, an optimization model is constructed:
A ⪰ 0
where A ⪰ 0 denotes that A is a positive semidefinite matrix. The optimization problem is solved by gradient descent with iterative projection, and the similarity metric function matrix A is finally returned.
As shown in FIG. 4, where different shapes represent different user trajectory categories, the physical meaning of the model is as follows. For two trajectories belonging to the must-link (necessarily connected) set, the distance between them (here using their characterization in the similarity space) is minimized, that is, the distances between circles and between triangles in the figure are shortened; for user trajectories belonging to the cannot-link (necessarily unconnected) set, the distance is maximized, that is, the distance between circles and triangles is enlarged. The distance metric function is thus optimized by changing the weights of different dimensions in the metric space (in the figure, the distance between the shaded sample and the other samples is successfully corrected).
Specifically, in the invention the initial similarity matrix S and the initial trajectory category set C correspond to the two elements of metric learning, namely the similarity matrix and the side information. Applying a metric learning method to the whole set of user trajectories yields the learned, optimized metric function A, together with the similarity characterization vectors sv_i of all user trajectories in the same feature space.
Finally, from the similarity characterization vectors sv_i of the user trajectories and the metric function A, the distance between user trajectories is computed with the Mahalanobis distance:
The smaller the distance dist(sv_i, sv_j) between user trajectories, the greater their similarity, and vice versa.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these embodiments. Changes apparent to those skilled in the art fall under protection as long as they are within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (2)

1. A user trajectory similarity measurement method based on metric learning is characterized by comprising the following steps:
(1) User mobile data collection and cleaning
Collecting user mobile data, and sorting and cleaning the user mobile data according to analysis requirements: extracting time position information of the key points hidden in the user mobile data by adopting a key Point information extraction technology (namely POI (Point of Interest), so as to obtain a track representation of the user based on the key points;
(2) User location-time joint distribution calculation
Firstly, clustering key location time and position information of all users by using a clustering algorithm (such as DBSCAN) to obtain a hot spot area, then obtaining key locations accessed by all users at high frequency by combining position information of known key locations of cities, and extracting P key locations ranked ahead, namely more accessed key locations as locations that user tracks tend to access.
The activity time of the whole user is dynamically divided according to the distribution of the activity time in the time dimension to obtain T time periods, and a location-time joint probability distribution matrix of each user is obtained based on the user track trend access location and time period divisioni =1,2, \ 8230;. M, m is the number of users, and the matrix directly reflects the distribution of each user track in the space dimension and the time dimension;
(3) Obtaining the initial similarity matrix of the user track
Based on the location-time joint probability distribution matrix of each user, calculating an initial similarity matrix S between the user tracks:
wherein the initial similarity matrix S is a symmetric similarity matrix whose element S_{i,j} represents the similarity between user i and user j and is defined as follows:
wherein σ is the function width parameter, determined according to the specific implementation conditions, and the KL divergence d_{i,j} is defined as:
wherein w_i(p, t) is the entry of the location-time joint probability distribution matrix of user i giving the probability that user i appears, in time period t, at location p that user tracks tend to access, and w_j(p, t) is the corresponding entry of the location-time joint probability distribution matrix of user j, i.e. the probability that user j appears at location p in time period t;
(4) Initial trajectory category acquisition
Summing each row of the initial similarity matrix S and using the sums, row by row, as the diagonal elements of a diagonal matrix D; then calculating the Laplacian matrix L = D - S, and solving, through SVD (singular value decomposition), for the k smallest eigenvalues of the Laplacian matrix L and the corresponding eigenvectors;
Constructing a matrix M: taking the eigenvectors in sequence as columns to form a matrix M with m rows and k columns, wherein each row of the matrix M corresponds to a row of the initial similarity matrix S, namely a k-dimensional representation of one user track;
finally, on the k-dimensional representation, obtaining the category label of each user track by K-Means to form an initial track category set C;
(5) Learning the track similarity metric
Taking the initial similarity matrix S and the initial track category set C as the two inputs of metric learning, namely the similarity matrix and the side information, and processing them by a metric learning method to obtain the learned and optimized metric function A, together with the similarity characterization vectors sv_i of the user tracks in a common feature space;
Finally, combining the similarity characterization vectors of the user tracks with the metric function A, and calculating the distance between user tracks by the Mahalanobis distance:
the smaller the distance dist(sv_i, sv_j) between user tracks, the greater the similarity; conversely, the larger the distance, the smaller the similarity.
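The numerical parts of steps (2) to (5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the formulas of the claim are not reproduced in this text, so the sketch assumes a symmetrised KL divergence (so that S comes out symmetric), a Gaussian kernel exp(-d² / (2σ²)) for the similarity, the square-root form of the Mahalanobis distance, and a small smoothing constant EPS to avoid division by zero. All function names are hypothetical, and the eigen-decomposition, K-Means and metric learning steps are left out; an identity matrix stands in for the learned A.

```python
import math

EPS = 1e-12  # assumed smoothing constant; not part of the claim


def joint_distribution(visits, num_places, num_periods):
    """Build a P x T location-time joint probability matrix from (place, period) visits."""
    counts = [[0.0] * num_periods for _ in range(num_places)]
    for place, period in visits:
        counts[place][period] += 1.0
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("user has no visits")
    return [[c / total for c in row] for row in counts]


def kl_divergence(w_i, w_j):
    """Standard KL divergence: sum over (p, t) of w_i * log(w_i / w_j)."""
    d = 0.0
    for row_i, row_j in zip(w_i, w_j):
        for a, b in zip(row_i, row_j):
            if a > 0.0:  # terms with w_i(p, t) = 0 contribute nothing
                d += a * math.log(a / max(b, EPS))
    return d


def similarity(w_i, w_j, sigma=1.0):
    """Assumed form: Gaussian kernel over the symmetrised KL divergence."""
    d = 0.5 * (kl_divergence(w_i, w_j) + kl_divergence(w_j, w_i))
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def laplacian(S):
    """L = D - S, where D is the diagonal matrix of the row sums of S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def mahalanobis(sv_i, sv_j, A):
    """sqrt((sv_i - sv_j)^T A (sv_i - sv_j)) for a positive semi-definite A."""
    diff = [a - b for a, b in zip(sv_i, sv_j)]
    n = len(diff)
    q = sum(diff[r] * A[r][c] * diff[c] for r in range(n) for c in range(n))
    return math.sqrt(max(q, 0.0))


# Toy data: three users, P = 3 locations, T = 2 time periods.
users = [
    [(0, 0), (0, 0), (1, 1)],
    [(0, 0), (1, 1), (1, 1)],
    [(2, 1), (2, 1), (2, 0)],
]
W = [joint_distribution(v, num_places=3, num_periods=2) for v in users]
S = [[similarity(a, b) for b in W] for a in W]
L = laplacian(S)
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for the learned metric A
example_distance = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)
```

In this toy data the first two users share locations and time periods and therefore receive a much higher similarity than either has with the third user. With an identity matrix in place of A, the Mahalanobis distance reduces to the Euclidean distance.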
2. The method according to claim 1, wherein the dynamic partitioning is:
wherein p_t is the probability that users appear in time period t, δ is the parameter controlling the span of the divided time periods, takes a value between 0 and 1 and can be adjusted automatically, and T_t denotes time period t.
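The partitioning formula of claim 2 is not reproduced in this text, so the following Python sketch shows only one plausible reading of it, under a stated assumption: consecutive base time slots are merged greedily until each resulting period T_t carries at least a share δ of the total activity probability. The function name `dynamic_partition`, the base-slot input and the handling of leftover slots are all hypothetical.

```python
def dynamic_partition(slot_probs, delta):
    """Merge consecutive base slots into periods holding at least `delta` probability.

    Assumed rule for illustration only; returns half-open (start, end) slot ranges.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    periods, start, mass = [], 0, 0.0
    for idx, p in enumerate(slot_probs):
        mass += p
        if mass >= delta:
            periods.append((start, idx + 1))
            start, mass = idx + 1, 0.0
    if start < len(slot_probs):
        if periods:
            # Leftover low-activity slots are folded into the last period.
            periods[-1] = (periods[-1][0], len(slot_probs))
        else:
            periods.append((0, len(slot_probs)))
    return periods


# Six base slots with uneven activity, delta = 0.25.
example_periods = dynamic_partition([0.05, 0.05, 0.4, 0.3, 0.1, 0.1], 0.25)
```

Under this reading, a smaller δ yields more and shorter periods in busy parts of the day, while quiet stretches are absorbed into longer periods.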
CN201710847477.7A 2017-09-19 2017-09-19 A kind of user trajectory method for measuring similarity based on metric learning Expired - Fee Related CN107679558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710847477.7A CN107679558B (en) 2017-09-19 2017-09-19 A kind of user trajectory method for measuring similarity based on metric learning

Publications (2)

Publication Number Publication Date
CN107679558A true CN107679558A (en) 2018-02-09
CN107679558B CN107679558B (en) 2019-09-24

Family

ID=61137557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710847477.7A Expired - Fee Related CN107679558B (en) 2017-09-19 2017-09-19 A kind of user trajectory method for measuring similarity based on metric learning

Country Status (1)

Country Link
CN (1) CN107679558B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100923723B1 (en) * 2007-03-16 2009-10-27 제주대학교 산학협력단 Method for clustering similar trajectories of moving objects in road network databases
CN102880719A (en) * 2012-10-16 2013-01-16 四川大学 User trajectory similarity mining method for location-based social network
CN103914563A (en) * 2014-04-18 2014-07-09 中国科学院上海微系统与信息技术研究所 Pattern mining method for spatio-temporal track
CN106407519A (en) * 2016-08-31 2017-02-15 浙江大学 Modeling method for crowd moving rule
WO2017070160A1 (en) * 2015-10-20 2017-04-27 Georgetown University Systems and methods for in silico drug discovery
CN106778876A (en) * 2016-12-21 2017-05-31 广州杰赛科技股份有限公司 User classification method and system based on mobile subscriber track similitude

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
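GAO, LL et al.: "Learning in high-dimensional multimedia data: the state of the art", Multimedia Systems *
LIU Songling: "Research on Trajectory Clustering Based on Metric Learning", China Master's Theses Full-text Database, Information Science and Technology *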

Also Published As

Publication number Publication date
CN107679558B (en) 2019-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190924