CN111988744A

CN111988744A - Position prediction method based on user moving mode

Info

Publication number: CN111988744A
Application number: CN202010898332.1A
Authority: CN
Inventors: 苏畅; 严杨志; 谢显中
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2020-11-24
Anticipated expiration: 2040-08-31
Also published as: CN111988744B

Abstract

The invention relates to a location prediction method based on user movement patterns and belongs to the field of machine learning. The method comprises: mining each user's individual movement patterns (IMP) with the Apriori algorithm to find the internal factors that affect the user's check-ins; computing the similarity between users' individual movement patterns with the dynamic time warping (DTW) algorithm; grouping the individual movement patterns by clustering and taking the central pattern of each group as the overall movement pattern (AMP), thereby finding the external factors that affect check-ins; training Markov models separately on the individual and the overall movement patterns; using the Markov chain models trained on IMP and AMP to predict the user's next location; accounting for the influence of external weather by creating an overall weather feature; using a Gaussian kernel function to compute the similarity between the weather at the current location and the weather at other locations, and revising the predicted results accordingly; and setting evaluation criteria and baseline methods. The invention makes the predicted results more realistic.

Description

A Location Prediction Method Based on User Movement Patterns

Technical Field

The invention belongs to the field of machine learning and relates to a location prediction method based on user movement patterns.

Background

With the spread of mobile terminals, human mobility data has become easier to obtain, and location-based social network platforms have collected large volumes of user check-in data. Studying the regularities of human movement has become a hot topic, and studying people's movement patterns has become feasible; within this area, location prediction is especially common. Location prediction makes it possible to know a user's movement preferences in advance and to understand the movement tendencies of crowds, which allows targeted services for users and brings benefits to businesses. Existing research mainly analyzes user behavior from check-in histories, finds the regularities in a user's movement, and then predicts locations. Most of it considers factors such as time, space, and social relationships, focusing on user preferences while ignoring the connections between locations. In addition, most studies predict from the user's personal movement patterns, so when a user goes somewhere never visited before, no data is available to train the model. Other researchers train the model on the check-in data as a whole so that it applies to every user, but prediction from aggregate data is too coarse-grained: all users currently at the same place receive the same predicted next place, which does not match reality.

To address the problem that traditional location prediction models based on discrete state sequences do not predict locations well, the invention considers the associations between different locations in a user's check-in trajectory and mines both the individual and the overall movement patterns from historical check-in data, that is, the internal and external factors that affect check-ins. The overall movement pattern is reflected mainly in group behavior; the individual movement pattern captures the dynamic changes in a user's own movement and supports personalized location prediction. Time is also taken into account during mining: because movement patterns differ between weekdays and weekends, patterns are mined separately for these periods of the week. The mined patterns therefore reflect the actual transitions between a user's locations and also carry the underlying temporal regularity of the movement, which brings the predictions closer to real life. In addition, an overall weather feature is created, and a Gaussian kernel function is used to compute weather similarity and revise the results.

Summary of the Invention

In view of this, the object of the invention is to provide a location prediction method based on user movement patterns.

To achieve this object, the invention provides the following technical solution:

A location prediction method based on user movement patterns, comprising the following definitions and steps:

A movement pattern (MP) is defined as the set of locations that a user visits sequentially in continuous time. A pattern along which the user moves frequently is called the user's movement pattern and is written MP = {l₁, l₂, l₃, …, l_n}, where n is the number of locations in the pattern;

An individual movement pattern (IMP) is defined as a sequence of locations that occurs frequently among the user's own historically visited locations; for a given user, the IMP is the set of all movement patterns in that user's historical check-in records;

Support is defined as the frequency with which a user's movement pattern occurs in the user's movement trajectories; a user's historical check-in records contain multiple trajectories, and the support of a movement pattern is computed over them;
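The support formula itself does not survive in this text. A conventional definition, assumed here rather than taken from the patent, treats support as the fraction of a user's trajectories that contain the pattern. The symbols T_u (the user's set of trajectories) and ⊑ (occurs as an ordered sub-sequence) are introduced for illustration only:

```latex
\sigma(MP) = \frac{\left|\{\, t \in T_u : MP \sqsubseteq t \,\}\right|}{\left|T_u\right|}
```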

An overall movement pattern (AMP) is defined as a movement pattern that occurs frequently in the historical check-ins of all users in a group. The similarity of users' movement patterns is computed with the DTW algorithm, the patterns are divided into groups by clustering, and a central pattern is found from the movement patterns in the historical visits of all users in each group; this central pattern is the overall movement pattern of all users in the group;

The overall weather feature is defined by combining three weather features (rainfall, temperature, and wind speed) into one new feature through weighted fusion;

S1: mine each user's individual movement patterns with the Apriori algorithm to find the internal factors that affect the user's check-ins;

S2: compute the similarity between users' individual movement patterns with the dynamic time warping (DTW) algorithm;

S3: group the users' individual movement patterns by clustering to obtain the central pattern of each group, that is, the overall movement pattern (AMP), and thereby find the external factors that affect check-ins;

S4: train Markov models separately on the individual movement patterns and the overall movement patterns;

S5: using the Markov chain models trained on IMP and AMP, combine the two probability vectors and predict the user's next location;

S6: account for the influence of external weather by creating the overall weather feature;

S7: use a Gaussian kernel function to compute the similarity between the weather at the current location and the weather at other locations, and revise the predicted results;

S8: set the evaluation criteria and baseline methods.

Optionally, step S1 specifically comprises:

S11: within a given time range, analyze the Gowalla data to find the movement patterns of length 1 in the user's check-in records;

S12: find the movement patterns of length 2 and check whether their support σ meets the requirement, then repeat with increasing length until the patterns can no longer be extended, which yields the candidate movement patterns;

S13: from the candidate movement patterns, select those whose support satisfies the condition to obtain the user's individual movement patterns.
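Steps S11 to S13 can be sketched as a level-wise search. This is a minimal illustration, not the patent's implementation: it assumes a pattern must occur as a contiguous run of locations inside a trajectory, it defines support as the fraction of trajectories containing the pattern, and the names `mine_patterns`, `min_support`, and the sample trajectories are all hypothetical. In the method described, it would be run once per time range (weekdays, weekends).

```python
def contains(trajectory, pattern):
    """True if `pattern` occurs as a contiguous run inside `trajectory`."""
    n, m = len(trajectory), len(pattern)
    return any(tuple(trajectory[i:i + m]) == pattern for i in range(n - m + 1))

def support(trajectories, pattern):
    """Fraction of trajectories that contain `pattern` (assumed definition)."""
    return sum(contains(t, pattern) for t in trajectories) / len(trajectories)

def mine_patterns(trajectories, min_support):
    """Level-wise (Apriori-style) search for frequent location sequences."""
    locations = sorted({loc for t in trajectories for loc in t})
    # Level 1: single locations that are frequent enough.
    level = [(loc,) for loc in locations if support(trajectories, (loc,)) >= min_support]
    frequent = {}
    while level:
        for p in level:
            frequent[p] = support(trajectories, p)
        # Extend each surviving pattern by one frequent location and re-check support.
        singles = [p[0] for p in frequent if len(p) == 1]
        candidates = {p + (loc,) for p in level for loc in singles}
        level = sorted(c for c in candidates if support(trajectories, c) >= min_support)
    return frequent

# Hypothetical weekday trajectories for one user.
weekday = [["home", "cafe", "office"], ["home", "cafe", "office", "gym"], ["home", "office"]]
patterns = mine_patterns(weekday, min_support=0.6)
```

The loop stops when no pattern of the current length can be extended, which mirrors the stopping condition in S12.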

Optionally, in step S2, the similarity of two movement patterns is computed not from the plain Euclidean distance between two points but from the Haversine distance: passing in the coordinates of two points yields the actual geographic distance between them. The computation uses the following quantities:

|M_p| denotes the length of a movement pattern, that is, the number of locations in it; rest(M_p) denotes the movement pattern with its first location removed; and d(l, l_i) denotes the real distance between two locations.
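The point-to-point distance d(l, l_i) can be illustrated with the standard Haversine formula. This is a sketch under stated assumptions: coordinates are (latitude, longitude) pairs in degrees, the Earth is treated as a sphere with a mean radius of 6371 km, and the function name and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate city-centre coordinates, used only as sample input.
london = (51.5074, -0.1278)
los_angeles = (34.0522, -118.2437)
d = haversine_km(london, los_angeles)
```

Unlike a Euclidean distance on raw latitude and longitude, this value accounts for the curvature of the Earth, which matters for check-ins spread across a city or between cities.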

Optionally, step S3 specifically comprises:

S31: initialize several classes from the users' individual movement patterns and set a distance threshold τ;

S32: for each movement pattern of each user, compute its distance to every class and select the class with the smallest distance;

S33: compute the distance between the movement pattern and that class with the DTW algorithm; if it is smaller than the threshold τ, add the pattern to the class and update the class; otherwise, create a new class for the pattern;

S34: obtain the clustering result, that is, the overall movement pattern to which each person belongs.
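Steps S31 to S34 resemble a greedy threshold clustering, sketched below. Several choices are assumptions, not details from the patent: the sketch starts with no classes instead of pre-initialized ones, the class "update" is taken to mean recomputing a medoid as the central pattern, and `dist` is any pattern-to-pattern distance supplied by the caller (DTW in the method described). The toy data uses single numbers in place of movement patterns.

```python
def cluster_patterns(patterns, dist, tau):
    """Greedy threshold clustering: join the nearest class if closer than tau, else open a new one."""
    clusters = []  # each class: {"center": pattern, "members": [patterns]}
    for p in patterns:
        nearest = min(clusters, key=lambda c: dist(p, c["center"]), default=None)
        if nearest is not None and dist(p, nearest["center"]) < tau:
            nearest["members"].append(p)
            # Update step (assumed): the medoid of the members becomes the new center.
            nearest["center"] = min(
                nearest["members"],
                key=lambda m: sum(dist(m, other) for other in nearest["members"]),
            )
        else:
            clusters.append({"center": p, "members": [p]})
    return clusters

# Toy stand-in: each "pattern" is one number and the distance is the absolute difference.
toy = [1.0, 1.2, 5.0, 0.9, 5.3]
result = cluster_patterns(toy, dist=lambda a, b: abs(a - b), tau=1.0)
```

Each class's `center` plays the role of the central pattern, that is, the overall movement pattern of the group.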

Optionally, step S4 specifically comprises:

S41: after the users' individual movement patterns have been clustered, the overall movement patterns are available, and the user's next location is predicted from the overall movement pattern;

S42: the user's next location is likewise predicted from the individual movement pattern;

Both predictions are expressed in terms of the following quantities: a movement pattern containing N locations; MP_c, the set of movement pattern classes; the number of times the movement pattern with a given location sequence appears in MP_c; and the number of times location l_i appears immediately after that sequence in MP_c.

Optionally, in step S5, each person has both an individual movement pattern and an overall movement pattern, and each is used to train a Markov model; each model yields a predicted probability vector. The vector based on the overall movement pattern is P^AMP = (l₁, l₂, l₃, …, l_n) and the vector based on the individual movement pattern is P^IMP = (l₁, l₂, l₃, …, l_n), where n is the number of predicted locations. The two results are then combined to give the final prediction:

P = α·P^IMP + (1 − α)·P^AMP.
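The combination can be illustrated with a small sketch. It assumes each vector is stored as a mapping from candidate location to probability and that a location missing from one model has probability 0 there; the value of α is not given in the text, so 0.5 below is only a placeholder, and all names are hypothetical.

```python
def combine(p_imp, p_amp, alpha):
    """Blend two location -> probability maps: alpha * IMP + (1 - alpha) * AMP."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    locations = set(p_imp) | set(p_amp)
    return {loc: alpha * p_imp.get(loc, 0.0) + (1 - alpha) * p_amp.get(loc, 0.0)
            for loc in locations}

p_imp = {"office": 0.7, "gym": 0.3}    # hypothetical individual-pattern prediction
p_amp = {"office": 0.4, "cafe": 0.6}   # hypothetical overall-pattern prediction
p = combine(p_imp, p_amp, alpha=0.5)
best = max(p, key=p.get)
```

Because both inputs sum to 1 and the weights α and 1 − α sum to 1, the blended vector is again a probability distribution.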

Optionally, step S6 comprises:

S61: create the overall weather feature X_weather = [Temperature, Rain, Windspeed];

S62: take a weighted sum of the three weather variables at the user's check-in location, so that their combined influence on the check-in location is considered, to obtain the overall weather feature of each check-in location:

X_weather = ω₁·Temperature + ω₂·Windspeed + ω₃·Rain

The weight for rainfall is calculated from two quantities: the total number of check-ins at one of the user's check-in locations l_i within a given rainfall interval, and the total number of days on which that rainfall interval occurred in the corresponding period. The weights for wind speed and temperature are calculated in the same way.

Optionally, in step S7, after the user's weather preference has been computed from the overall weather feature, a Gaussian kernel function is used to compute the similarity between the weather at the user's current location, X_l, and the weather at other locations, which gives the final prediction. Here X_l denotes the weather conditions at the user's current location, and the other quantity in the kernel is the weather conditions at another location.

Optionally, step S8 comprises:

S81: use the Accuracy and APR of location prediction as the evaluation criteria for the experiments;

S82: Accuracy: this metric is the proportion of correctly predicted locations among all predicted locations in a user's prediction result list; p(l) = 1 when the prediction matches the actual location;

S83: average percentile rank (APR): prediction is also related to ranking, so a percentile rank PR is defined for the check-in location l_j of user u_i within the prediction list. Averaging the PR values gives the APR over all users; the larger the value, the better the prediction;

S84: to verify the effectiveness of the proposed location prediction method based on user movement patterns, the following models are selected for comparison with the proposed model:

NextPlace: a classic location prediction method that predicts user behavior through nonlinear time series analysis of arrival times and uses time series similarity to make predictions;

SimPreT: associates historical patterns with the current user trajectory and uses pattern similarity to determine the user's next location;

HMM-based: builds a mixed Markov model that accounts for both the non-Gaussian and the spatiotemporal characteristics of real human check-in data.

The beneficial effects of the invention are as follows. The invention considers the associations between different locations in a user's check-in trajectory and mines both the individual and the overall movement patterns from historical check-in data, that is, the internal and external factors that affect check-ins. The overall movement pattern is reflected mainly in group behavior; the individual movement pattern captures the dynamic changes in a user's own movement and supports personalized location prediction. Time is also taken into account during mining: because movement patterns differ between weekdays and weekends, patterns are mined separately for these periods of the week. The mined patterns therefore reflect the actual transitions between a user's locations and also carry the underlying temporal regularity of the movement, which brings the predictions closer to real life. In addition, an overall weather feature is created, and a Gaussian kernel function is used to compute weather similarity and revise the results.

Other advantages, objects, and features of the invention will be set forth in part in the description that follows and, in part, will become apparent to those skilled in the art on examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and attained through the description below.

Description of the Drawings

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:

Figure 1 is the overall flow chart of the invention;

Figure 2 shows the check-in location information in the dataset;

Figure 3 compares the prediction accuracy of the different movement patterns;

Figure 4 compares the accuracy of the models in the two cities;

Figure 5 compares the APR values of the models.

Detailed Description

Embodiments of the invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that the drawings provided with the following embodiments illustrate the basic idea of the invention only schematically, and that the following embodiments and the features in them can be combined with one another where no conflict arises.

The drawings are for illustration only; they are schematic diagrams rather than physical drawings and should not be construed as limiting the invention. To better illustrate the embodiments, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

Identical or similar reference numerals in the drawings of the embodiments correspond to identical or similar parts. In the description of the invention, terms such as "upper", "lower", "left", "right", "front", and "rear" that indicate an orientation or positional relationship are based on the orientation or positional relationship shown in the drawings. They serve only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and should not be construed as limiting the invention; those of ordinary skill in the art can understand their specific meaning according to the circumstances.

The invention is a location prediction method based on user movement patterns. Markov models are trained separately on the mined individual and overall movement patterns to predict a user's location, and location prediction based on the similarity between individual and overall movement patterns is proposed. First, each user's movement patterns are mined; next, the DTW algorithm is used to cluster them by similarity, giving the overall movement patterns; then Markov models are trained separately on the individual and the overall movement patterns to predict locations. Finally, an overall weather feature is created, and a Gaussian kernel function is used to compute weather similarity and revise the results.

For a more concise and clear description, some terms are defined as follows:

A movement pattern (MP) is defined as the set of locations that a user visits sequentially in continuous time. A pattern along which the user moves frequently is called the user's movement pattern and can be written MP = {l₁, l₂, l₃, …, l_n}, where n is the number of locations in the pattern.

An individual movement pattern (IMP) is defined as a sequence of locations that occurs frequently among the user's own historically visited locations; for a given user, the IMP is the set of all movement patterns in that user's historical check-in records.

Support is defined as the frequency with which a user's movement pattern occurs in the user's movement trajectories. A user's historical check-in records contain multiple trajectories, and the support of a movement pattern is computed over them.

An overall movement pattern (AMP) is defined as a movement pattern that occurs frequently in the historical check-ins of all users in a group. The similarity of users' movement patterns is computed with the DTW algorithm, the patterns are divided into groups by clustering, and a central pattern is found from the movement patterns in the historical visits of all users in each group; this central pattern is the overall movement pattern of all users in the group.

The overall weather feature is defined by combining three weather features (rainfall, temperature, and wind speed) into one new feature through weighted fusion.

As shown in Figure 1, the invention comprises the following steps:

S1: mine each user's individual movement patterns with the Apriori algorithm to find the internal factors that affect the user's check-ins.

S2: compute the similarity between users' individual movement patterns with the dynamic time warping (DTW) algorithm.

S3: group the users' individual movement patterns by clustering to obtain the central pattern of each group, that is, the overall movement pattern (AMP), and thereby find the external factors that affect check-ins.

S4: train Markov models separately on the individual movement patterns and the overall movement patterns.

S5: using the Markov chain models trained on IMP and AMP, combine the two probability vectors and predict the user's next location.

S6: account for the influence of external weather by creating the overall weather feature.

S7: use a Gaussian kernel function to compute the similarity between the weather at the current location and the weather at other locations, and revise the predicted results.

In step S1, the Apriori algorithm is used to mine users' movement patterns. Apriori is a data mining algorithm based on association rules that was later applied to mining user movement patterns. Here a time factor is added to the mining process, so that the mined patterns carry temporal regularity and show how a user's movement patterns change over time. First, within a given time range (weekdays or weekends), the Gowalla data is analyzed to find the movement patterns of length 1 in the user's check-in records; the check-in information is shown in Figure 2. Next, the movement patterns of length 2 are found and their support σ is checked against the requirement, and the process repeats with increasing length until the patterns can no longer be extended; the result is the set of candidate movement patterns. From these candidates, the patterns whose support satisfies the condition are selected as the user's individual movement patterns.

In step S2, the similarity of two movement patterns is computed not from the plain Euclidean distance between two points but from the Haversine distance: passing in the coordinates of two points yields the actual geographic distance between them, which makes the computed distance more accurate. The computation uses the following quantities:

|M_p| denotes the length of a movement pattern, that is, the number of locations in it; rest(M_p) denotes the movement pattern with its first location removed; and d(l, l_i) denotes the real distance between two locations.
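The similarity formula is not reproduced in this text, but the quantities it names (pattern length, a rest operation that drops the first location, and a point distance d) match the textbook recursive form of dynamic time warping. The sketch below implements that standard recurrence and should be read as an assumption about the intended computation. `dist` is supplied by the caller; in the method described it would be a geographic distance such as Haversine, while the toy calls in the test use absolute difference on numbers.

```python
import math
from functools import lru_cache

def dtw(a, b, dist):
    """Dynamic time warping cost between sequences a and b under the point distance `dist`."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # i and j mark the heads of the remaining suffixes a[i:] and b[j:].
        if i == len(a) and j == len(b):
            return 0.0           # both patterns exhausted
        if i == len(a) or j == len(b):
            return math.inf      # only one pattern exhausted: no valid alignment
        step = dist(a[i], b[j])
        # Take the rest of either pattern, or of both, and keep the cheapest continuation.
        return step + min(cost(i + 1, j), cost(i, j + 1), cost(i + 1, j + 1))

    return cost(0, 0)
```

A repeated visit does not add cost when the other pattern lingers at the same place, which is why DTW suits patterns of different lengths. The recursion depth grows with the combined pattern length, so an iterative table would be preferable for long sequences.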

In step S3, starting from the users' individual movement patterns obtained in step S1, several classes are initialized and a distance threshold τ is set. For each movement pattern of each user, its distance to every class is computed and the class with the smallest distance is selected. The DTW algorithm is then used to compute the distance between the movement pattern and that class: if it is smaller than the threshold τ, the pattern is added to the class and the class is updated; otherwise, a new class is created for the pattern. The final clustering result gives the overall movement pattern to which each person belongs.

In step S4, after the users' individual movement patterns have been clustered in step S3, the overall movement patterns are available, and the user's next location is predicted from the overall movement pattern.

The next location is likewise predicted from the individual movement pattern.

Both predictions are expressed in terms of the following quantities: a movement pattern containing N locations; MP_c, the set of movement pattern classes; the number of times the movement pattern with a given location sequence appears in MP_c; and the number of times location l_i appears immediately after that sequence in MP_c.
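The two counts described above suggest a count-based Markov estimate: the probability of a next location is the number of times it follows the current sequence divided by the number of times the sequence occurs. The prediction formulas are not reproduced in this text, so the sketch below is an assumption. It normalizes over occurrences of the sequence that actually have a successor so that the probabilities sum to 1, and the function name and sample patterns are hypothetical.

```python
from collections import Counter

def next_location_probs(patterns, context):
    """Estimate P(next | context) from counts of `context` followed by each location."""
    context = tuple(context)
    k = len(context)
    followers = Counter()
    for p in patterns:
        # Slide over every position where `context` could be followed by one more location.
        for i in range(len(p) - k):
            if tuple(p[i:i + k]) == context:
                followers[p[i + k]] += 1
    total = sum(followers.values())
    return {loc: n / total for loc, n in followers.items()} if total else {}

# Hypothetical pattern set standing in for MP_c.
mp_c = [("home", "cafe", "office"), ("home", "cafe", "gym"), ("home", "cafe", "office")]
probs = next_location_probs(mp_c, ("home", "cafe"))
```

The same function can be applied to a user's individual patterns and to the patterns of the user's class, giving the two probability vectors that step S5 combines.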

In step S5, each person has both an individual movement pattern and an overall movement pattern, and each is used to train a Markov model. Each model yields a predicted probability vector. The vector based on the overall movement pattern is P^AMP = (l₁, l₂, l₃, …, l_n) and the vector based on the individual movement pattern is P^IMP = (l₁, l₂, l₃, …, l_n), where n is the number of predicted locations. The two results are then combined to give the final prediction:

P = α·P^IMP + (1 − α)·P^AMP

In step S6, the overall weather feature X_weather = [Temperature, Rain, Windspeed] is created first. The three weather variables at the user's check-in location are then combined by weighted summation, so that their joint influence on the check-in location is considered, giving the overall weather feature of each check-in location.

The weight for rainfall is calculated from the total number of check-ins at a check-in location within a given rainfall interval and the total number of days on which that rainfall interval occurred in the corresponding period. The weights for wind speed and temperature are calculated similarly.
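The weighting can be sketched as follows. The weight formula is not reproduced in this text; the sketch assumes it is the ratio of the two quantities named above (check-ins per day within an interval), and it assumes the three weather readings have already been scaled to comparable ranges. All function names and numbers are hypothetical.

```python
def interval_weight(checkins_in_interval, days_in_interval):
    """Check-ins per day for one weather interval at one location (assumed ratio form)."""
    return checkins_in_interval / days_in_interval if days_in_interval else 0.0

def weather_feature(temperature, windspeed, rain, w_temp, w_wind, w_rain):
    """Weighted sum X_weather = w1 * Temperature + w2 * Windspeed + w3 * Rain."""
    return w_temp * temperature + w_wind * windspeed + w_rain * rain

# Hypothetical counts for one location: check-ins and days observed per interval.
w_rain = interval_weight(checkins_in_interval=12, days_in_interval=30)
w_temp = interval_weight(checkins_in_interval=45, days_in_interval=90)
w_wind = interval_weight(checkins_in_interval=6, days_in_interval=60)

# Hypothetical readings, assumed to be scaled to [0, 1] beforehand.
x = weather_feature(temperature=0.6, windspeed=0.2, rain=0.1,
                    w_temp=w_temp, w_wind=w_wind, w_rain=w_rain)
```

Under this reading, a weather condition that coincides with many check-ins per day at a location receives a larger weight in that location's feature.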

In step S7, after the user's weather preference has been computed from the general weather feature created in S6, a Gaussian kernel function is used to compute the similarity between the weather at the user's current location (X_l) and the weather at other locations, which yields the final prediction result. In this calculation, X_l denotes the weather conditions at the user's current location, and the second kernel argument denotes the weather conditions at the other locations.
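
A minimal sketch of the weather-based revision is given below. It uses the standard Gaussian kernel exp(−(x − x′)² / (2σ²)) on the scalar weather features; the bandwidth σ, the function names, and the choice to revise the prediction by multiplying each probability by its similarity and renormalising are assumptions, since the patent's own formula is not reproduced in this text.

```python
import math


def gaussian_similarity(x_current, x_other, sigma=1.0):
    """Standard Gaussian kernel between two scalar weather features."""
    return math.exp(-((x_current - x_other) ** 2) / (2 * sigma ** 2))


def revise_with_weather(probs, weather, current_weather, sigma=1.0):
    """Scale each candidate's probability by its weather similarity.

    Assumption: revision = probability * similarity, then renormalise.
    probs:   dict location -> predicted probability
    weather: dict location -> weather feature of that location
    """
    scores = {
        loc: p * gaussian_similarity(current_weather, weather[loc], sigma)
        for loc, p in probs.items()
    }
    total = sum(scores.values())
    if total == 0:
        return scores
    return {loc: s / total for loc, s in scores.items()}


# Hypothetical candidates with equal prior probability.
probs = {"office": 0.5, "beach": 0.5}
weather = {"office": 0.4, "beach": 0.9}
revised = revise_with_weather(probs, weather, current_weather=0.4, sigma=0.25)
```

In this example the candidate whose weather matches the current location keeps most of the probability mass, while the dissimilar candidate is down-weighted.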

In step S8, one year of check-in records from two cities in the Gowalla dataset, London (LON) and Los Angeles (LA), is used as the experimental data. The data is split into a training set, a validation set, and a test set: models are trained on the training set, validated on the validation set, and finally evaluated on the test set. Details of the dataset are shown in Figure 2.

Accuracy and APR of location prediction are used as the evaluation criteria. They are defined as follows:

Accuracy: the proportion of correctly predicted locations among all predicted locations in the user's prediction result list. p(l) = 1 when the predicted result matches the actual location.
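
The Accuracy definition can be sketched in a few lines of Python. The function name `accuracy` and the sample predictions are hypothetical; p(l) corresponds to the per-prediction hit indicator.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual next location."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    # p(l) = 1 for a correct prediction, 0 otherwise.
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


# Hypothetical predictions for four check-ins.
acc = accuracy(
    ["office", "gym", "cafe", "home"],
    ["office", "mall", "cafe", "home"],
)
```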

Average Percentile Rank (APR): the prediction problem is also related to ranking. The percentile rank PR of the check-in location l_j of user u_i in the prediction list is defined as:

The APR over all users is the average of the PR values. The larger the value, the better the prediction. The formula is as follows:
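
The PR and APR formulas are not reproduced in this text, so the following Python sketch assumes one commonly used definition: PR = (|L| − rank + 1) / |L|, where |L| is the length of the ranked prediction list and rank is the 1-based position of the actual location. The function names and sample data are hypothetical.

```python
def percentile_rank(ranked_locations, actual):
    """PR of the actual location in a ranked prediction list.

    Assumed definition: (|L| - rank + 1) / |L|, with rank starting at 1.
    A location missing from the list gets PR = 0.
    """
    n = len(ranked_locations)
    if actual not in ranked_locations:
        return 0.0
    rank = ranked_locations.index(actual) + 1
    return (n - rank + 1) / n


def average_percentile_rank(cases):
    """Mean PR over (ranked_locations, actual) pairs."""
    cases = list(cases)
    if not cases:
        return 0.0
    return sum(percentile_rank(r, a) for r, a in cases) / len(cases)


# Hypothetical ranked predictions and actual check-ins.
ranking = ["office", "gym", "cafe", "home"]
apr = average_percentile_rank([(ranking, "office"), (ranking, "cafe")])
```

Under this definition a top-ranked hit scores 1 and lower-ranked hits score proportionally less, which matches the statement that larger APR values indicate better predictions.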

Next, to verify the effectiveness of the proposed location prediction method based on user movement patterns, the following models are selected for comparison with the proposed model:

NextPlace: a classic location prediction method that predicts user behavior through nonlinear time-series analysis of arrival times and uses time-series similarity to make predictions.

SimPreT: associates historical patterns with the current user trajectory and uses pattern similarity to determine the user's next location.

HMM-based (Hybrid Markov Model-based): builds a hybrid Markov model that takes into account both the non-Gaussian and the spatiotemporal characteristics of real human check-in data.

Figure 3 compares the prediction performance of the individual movement pattern (IMP) and the overall movement pattern (AMP). When only the overall movement pattern is considered, prediction performance gradually decreases as the number of locations increases. The method based only on individual movement patterns clearly suffers from the cold-start problem in the early stage, with low prediction accuracy, but as individual historical activity information accumulates, its accuracy improves substantially. In the later stage in particular, it outperforms the method based on both individual and overall movement patterns. Even so, our method still considers both individual and overall movement patterns. We believe that retaining the users' overall patterns makes the prediction method more robust and lets it handle cases that cannot be predicted from individual movement patterns alone.

Figure 4 compares the accuracy of the proposed model with that of the other models. The accuracy of the proposed model is much higher than that of NextPlace and SimPreT, and slightly higher than that of HMM-based. On the LA dataset, accuracy is 15%, 5%, and 2.1% higher than NextPlace, SimPreT, and HMM-based, respectively; on the LON dataset, it is 14%, 6.5%, and 4.2% higher, respectively. This indicates that considering both individual and overall movement patterns, together with weather factors, helps improve the prediction results.

As shown in Figure 5, on the LA dataset the APR of the proposed model is better than that of all other models: it is 19% higher than NextPlace, 7% higher than SimPreT, and 4% higher than HMM-based. On the LON dataset, the proposed model is 18% higher than NextPlace, 9.2% higher than SimPreT, and 5% higher than HMM-based.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of the technical solutions, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

1. A position prediction method based on user movement patterns, characterized in that the method comprises the following definitions and steps:

The movement pattern MP is defined as the set of locations that a user visits sequentially in continuous time; patterns in which the user moves frequently are called the user's movement patterns, expressed as MP = {l₁, l₂, l₃, …, l_n}, where n is the number of locations in the movement pattern;

The individual movement pattern IMP is defined as a frequently occurring location sequence among the user's personal historical visited locations; for a given user, the individual movement patterns are the set of all movement patterns in the user's historical check-in records;

The support degree is defined as the frequency with which a user's movement pattern appears in the user's movement trajectories; the user's historical check-in record contains multiple movement trajectories, and the support degree of a movement pattern is calculated as:

The overall movement pattern AMP is defined as a movement pattern that appears frequently in the historical check-ins of all users within a group; the similarity of user movement patterns is calculated with the DTW algorithm, the users are divided into multiple groups by clustering, and a central pattern is found from the movement patterns in the historical visited locations of all users in each group, this central pattern being the overall movement pattern of all users in that group;

The general weather feature is defined by combining the three weather characteristics of rainfall, temperature, and wind speed into one new feature by weighted fusion;

S1: Use the Apriori algorithm to mine the individual movement patterns of each user, and find the internal factors that affect the user's check-ins;

S2: Use the dynamic time warping algorithm DTW to calculate the similarity between the individual movement patterns of users;

S3: Group the individual movement patterns of users through clustering to obtain the central pattern of each group, that is, the overall movement pattern AMP, and find the external factors that affect check-in;

S4: Use the individual movement patterns and the overall movement patterns to train the Markov model respectively;

S5: Train the Markov chain model based on IMP and AMP, combine the probability vectors of the two, and predict the user's next location;

S6: Consider the influence of external weather, and create general weather characteristics;

S7: Use the Gaussian kernel function to calculate the similarity between the weather at the current location and the weather at other locations, and correct the predicted result;

S8: Set evaluation criteria and benchmark methods.

2. The position prediction method based on user movement patterns according to claim 1, characterized in that step S1 specifically comprises:

S11: Within a given time range, by analyzing the Gowalla data, find the movement patterns of length 1 in the user's check-in records;

S12: Next, find out the movement patterns with a length of 2 in turn, and then calculate whether the support degree σ meets the requirements, and keep repeating this cycle until the length of the movement patterns cannot be increased, and obtain the candidate movement patterns;

S13: From the obtained candidate movement patterns, find out the movement patterns whose support degree satisfies the condition, and obtain the individual movement patterns of the user.
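
Steps S11 to S13 can be illustrated with a minimal Apriori-style sketch in Python. Two points are assumptions rather than statements of the claimed method: the support of a pattern is taken to be the fraction of trajectories that contain it as a contiguous subsequence (the support formula is not reproduced in this text), and infrequent candidates are pruned at each length instead of in a final pass. All names and the sample trajectories are hypothetical.

```python
def contains(trajectory, pattern):
    """True if pattern occurs as a contiguous subsequence of trajectory."""
    k = len(pattern)
    return any(
        tuple(trajectory[i:i + k]) == pattern
        for i in range(len(trajectory) - k + 1)
    )


def support(trajectories, pattern):
    """Assumed support: share of trajectories that contain the pattern."""
    return sum(contains(t, pattern) for t in trajectories) / len(trajectories)


def mine_patterns(trajectories, min_support):
    """Grow frequent patterns one location at a time, Apriori style."""
    if not trajectories or min_support <= 0:
        raise ValueError("need trajectories and a positive min_support")
    # Length-1 patterns (S11).
    current = sorted({(loc,) for t in trajectories for loc in t})
    current = [p for p in current if support(trajectories, p) >= min_support]
    frequent_items = [p[0] for p in current]
    frequent = []
    # Extend by one location and re-check support until nothing survives (S12).
    while current:
        frequent.extend(current)
        candidates = [p + (loc,) for p in current for loc in frequent_items]
        current = [
            c for c in candidates if support(trajectories, c) >= min_support
        ]
    return frequent  # patterns whose support meets the threshold (S13)


# Hypothetical trajectories of one user.
trajectories = [
    ("home", "cafe", "office"),
    ("home", "cafe", "gym"),
    ("home", "cafe", "office", "gym"),
]
patterns = mine_patterns(trajectories, min_support=0.6)
```

The loop terminates because no pattern longer than the longest trajectory can have positive support.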

3. The position prediction method based on user movement patterns according to claim 2, characterized in that, in the similarity-calculation step, the distance between two points of two movement patterns is not the simple Euclidean distance; instead, the Haversine distance is calculated, which takes the coordinates of the two points and returns the actual geographic distance between them, as follows:

where:

|M_p| denotes the length of the movement pattern, that is, the number of locations in the pattern; rest(M_p) denotes the movement pattern with its first location removed; and d(l, l_i) denotes the true distance between the two locations.
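
A minimal Python sketch of this similarity calculation follows. It pairs the Haversine great-circle distance with the standard dynamic-programming form of dynamic time warping; treating the claimed recursion over rest(M_p) as standard DTW, representing each location as a (latitude, longitude) pair in degrees, and using a mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def dtw_distance(pattern_a, pattern_b, dist=haversine_km):
    """Standard DTW cost between two location sequences."""
    n, m = len(pattern_a), len(pattern_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(pattern_a[i - 1], pattern_b[j - 1])
            # Cheapest way to reach (i, j): step in a, step in b, or both.
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Approximate city-centre coordinates, used only as sample data.
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)
d_lp = haversine_km(london, paris)
d_same = dtw_distance([london, paris], [london, paris])
d_warp = dtw_distance([london], [london, london])
```

DTW lets patterns of different lengths be compared, since one location in the shorter pattern may be aligned with several locations in the longer one.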

4. The position prediction method based on user movement patterns according to claim 3, characterized in that step S3 specifically comprises:

S31: Initialize multiple classes according to the user's personal movement pattern, and set a distance threshold τ;

S32: For each movement pattern of each user, calculate its distance from each class, and select the class with the smallest distance;

S33: Then use the DTW algorithm to calculate the distance between the movement pattern and this class, if it is less than the threshold τ, add it to the class and update it; otherwise, create a new class for the movement pattern;

S34: Obtain the result of clustering, that is, the overall movement pattern to which each person belongs.
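
The threshold-based grouping in S31 to S34 can be sketched as below. Several choices are assumptions: the sketch starts with no classes and creates them on demand, each class is represented by its medoid (the member with the smallest total distance to the others), and a simple element-wise distance on toy numeric patterns stands in for DTW so that the example stays self-contained. All names are hypothetical.

```python
def medoid(members, distance):
    """Member with the smallest total distance to all other members."""
    return min(members, key=lambda m: sum(distance(m, o) for o in members))


def cluster_patterns(patterns, distance, tau):
    """Assign each pattern to its nearest class if closer than tau."""
    clusters = []  # each cluster: {"center": pattern, "members": [patterns]}
    for pattern in patterns:
        best, best_d = None, float("inf")
        for cluster in clusters:
            d = distance(pattern, cluster["center"])
            if d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d < tau:
            # Close enough: join the class and update its central pattern.
            best["members"].append(pattern)
            best["center"] = medoid(best["members"], distance)
        else:
            # Too far from every class: start a new one.
            clusters.append({"center": pattern, "members": [pattern]})
    return clusters


def toy_distance(a, b):
    """Stand-in for DTW: sum of absolute differences, equal-length patterns."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical patterns encoded as 1-D positions.
sample = [(0, 1, 2), (0, 1, 3), (10, 11, 12), (10, 12, 12)]
clusters = cluster_patterns(sample, toy_distance, tau=2)
```

The center of each resulting class plays the role of the overall movement pattern for the users whose patterns fall into that class.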

5. The position prediction method based on user movement patterns according to claim 4, characterized in that step S4 specifically comprises:

S41: After clustering the users' individual movement patterns, obtain the overall movement patterns of the users; based on the overall movement pattern, the next location to be visited is:

S42: Based on the individual movement pattern, the next location to be visited is:

where the formulas use the number of times that a movement pattern with the given location sequence appears in MP_c.

6. The position prediction method based on user movement patterns according to claim 5, characterized in that, in step S5, each person has an individual movement pattern and an overall movement pattern, each of which is used to train a Markov model; each model yields a predicted probability vector; the vector based on the overall movement pattern is P^AMP = (l₁, l₂, l₃, …, l_n), and the vector based on the individual movement pattern is P^IMP = (l₁, l₂, l₃, …, l_n), where n is the number of predicted locations; the two results are then combined to obtain the final prediction result, as follows:

P = α·P^IMP + (1 − α)·P^AMP.

7. The method for position prediction based on user movement patterns according to claim 6, wherein the step S6 comprises:

S61: Create a general weather feature X_weather = [Temperature, Rain, Windspeed];

S62: Perform a weighted summation of the three weather variables at the user's check-in location, so that their joint influence on the check-in location is taken into account, to obtain the general weather feature of each user check-in location, expressed as follows:

X_weather = ω₁·Temperature + ω₂·Windspeed + ω₃·Rain

Among them, the weight of rainfall is calculated as follows:

8. The position prediction method based on user movement patterns according to claim 7, characterized in that, in step S7, after the user's weather preference has been calculated from the created general weather feature, a Gaussian kernel function is used to calculate the similarity between the weather at the user's current location X_l and the weather at other locations, where X_l denotes the weather conditions at the user's current location and the second kernel argument denotes the weather conditions at the other locations.

9. The method for position prediction based on user movement patterns according to claim 8, wherein the step S8 comprises:

S81: Accuracy and APR of location prediction are used as evaluation criteria for the experiment;

S82: Accuracy: This indicator defines the proportion of correctly predicted locations in the total predicted locations in the user's prediction result list; p(l)=1 when the prediction result is consistent with the actual;

S83: Average Percentile Rank (APR): the prediction problem is also related to ranking; the percentile rank PR of the check-in location l_j of user u_i in the prediction list is defined as:

Take the average of the sum of PR values to get the APR values of all users. The larger the value, the better the prediction effect; the formula is as follows:

S84: In order to verify the effectiveness of the proposed method for location prediction based on user movement patterns, the following models are selected for comparison with the proposed model:

NextPlace: It is a classic location prediction method that predicts user behavior based on nonlinear time series analysis of arrival time, and uses the similarity of time series to make predictions;

SimPreT: Correlate historical patterns with current user trajectories, and use pattern similarity to determine the user's next location;

HMM-based: This model builds a mixed Markov model while taking into account the non-Gaussian as well as spatiotemporal properties of actual human check-in data.