CN111738447A

CN111738447A - A mobile social network user relationship inference method based on spatiotemporal relationship learning

Info

Publication number: CN111738447A
Application number: CN202010572405.8A
Authority: CN
Inventors: 陶玉婷; 常姗; 朱弘恣; 王佳程; 杜坷坷
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2020-10-02
Anticipated expiration: 2040-06-22
Also published as: CN111738447B

Abstract

The invention provides a method for inferring user relationships in a mobile social network based on spatiotemporal relationship learning, which considers the mobility and the sociality of individuals at the same time. Because social network structure is effective for predicting social ties, the model first constructs a preliminary social graph based on user mobility, then extracts the social network structure features of user pairs from the preliminary social graph, and finally infers friend relationships by combining the mobility and sociality features. Once trained, the model transfers well to different scenarios for predicting friendships between users. Experiments on two real-world datasets show that the method consistently outperforms existing methods. The model is also effective for relationships with little check-in data and no encounter events.

Description

A mobile social network user relationship inference method based on spatiotemporal relationship learning

Technical Field

The invention relates to a method for inferring user relationships in a mobile social network based on spatiotemporal relationship learning, which makes use of users' mobility information.

Background

In recent years, with the popularity of mobile social networking applications such as Facebook, Twitter and Weibo, users can promptly post the places of interest they are visiting (a trendy restaurant, a tourist attraction, etc.) and share them with friends. Although this way of socializing makes it much easier to connect with friends, it carries the risk of leaking users' social relationships. Users of mobile social networks are gradually becoming aware of this. For example, a large-scale study of Facebook users showed that the proportion of users hiding their friend lists rose from 17.2% in 2010 to 56.2% in 2011. However, few users know that their friends can be inferred from check-in records with spatiotemporal relationships, which accurately exposes the hidden social relationships between users.

Existing methods for inferring social relationships from spatiotemporal data fall into two categories. The first is heuristic methods based on feature selection, which select, by observation, effective features such as the number of encounters, the number of shared locations, and the popularity of shared locations as criteria for deciding whether two users are friends. However, these methods impose many assumptions and restrictions on the required mobility data, which greatly narrows their applicability. For example, almost all existing effective methods can only be used when the two individuals share a location. The second is methods based on feature learning, which use machine learning to vectorize each user's mobility features and use the similarity between vectors as the criterion for inferring whether a user pair are friends. However, these methods model individuals and lose the temporal information in the mobility data, so they cannot directly produce accurate relationship inference.

Summary of the Invention

The purpose of the present invention is to provide a model that can accurately infer users' social relationships from their mobility information.

To achieve the above purpose, the technical solution of the present invention is a method for inferring user relationships in a mobile social network based on spatiotemporal relationship learning, characterized in that it includes the following steps:

Step 1. Extract the interaction behavior features between user pairs and use these features to infer whether two users are friends, including the following steps:

Step 101. Divide the points of interest (POIs) involved in the check-ins of all users in the mobility data into an I×J grid according to latitude and longitude, and divide time into M time slices, to construct an I×J×M spatiotemporal matrix STD. In the time dimension, time is divided into equal slices of length τ. In the space dimension, space is first divided uniformly into grid cells of equal size, and each cell is then recursively divided into four equal cells until the number of POIs in each cell is less than the threshold σ;

Step 102. Project the trajectories of each user pair (u_a, u_b) into the spatiotemporal matrix STD, so that every check-in of a user falls into one specific cell. For each cell, compute: the number n_a of POIs visited by user u_a in that time slice; the number n_b of POIs visited by user u_b; and the number n_a,b of POIs visited by both user u_a and user u_b. This yields the spatiotemporal matrix O_(a,b) of the user pair (u_a, u_b), in which the triple (n_a, n_b, n_a,b) at position (i, j, m) represents the statistics of the mobility information of users u_a and u_b in the grid cell at row i, column j during the m-th time slice;

Step 103. Encode the spatiotemporal matrix O_(a,b) of each user pair (u_a, u_b) into a low-dimensional vector, and use this vector to compute the probability that user u_a and user u_b are friends, obtaining the initial social relationship graph G = (U, E). U is the set of vertices in the graph and represents all users with mobility information; E is the set of edges, where an edge indicates that two users are friends. The size of the spatiotemporal matrix O_(a,b) is adjusted through the parameters σ and τ;

Step 2. Extract a k-reachable subgraph for each user pair to describe the network structure between the users. For a given initial social relationship graph G = (U, E), the k-reachable subgraph of a user pair (u_a, u_b) is extracted as follows:

Step 201. Set the path length to 2 and initialize the k-reachable subgraph to an empty graph;

Step 202. Find all paths of length 2 between u_a and u_b in the initial social relationship graph G and record all the paths found; then delete from the social relationship graph G the vertices and edges that appear in both G and the recorded paths, except for user u_a and user u_b themselves;

Step 203. Gradually increase the path length and repeat step 202 until the path length exceeds k;

Step 3. Based on the initial social relationship graph G, encode the k-reachable subgraph of each user pair (u_a, u_b): the encodings of paths of the same length are accumulated, and the encoding results of paths of different lengths are concatenated, which vectorizes the k-reachable subgraph of the user pair and yields the comprehensive feature vector of the user pair;

Step 4. Use a classifier to perform 0/1 classification on the comprehensive feature vector of each user pair, where 1 means friends and 0 means not friends, to obtain the latest predicted social graph;

Step 5. Use the latest social graph to update the structural features of the user pairs and predict again, until the prediction no longer changes, which gives the final predicted social network graph.

Preferably, in step 103, the spatiotemporal matrix O_(a,b) is fed into an autoencoder with R hidden layers, which encodes it into a d-dimensional vector and produces a reconstructed spatiotemporal matrix that is close to the encoder input O_(a,b). The optimization objective of this training process is the loss function of the autoencoder network within the hybrid network, which makes the decoded, reconstructed spatiotemporal matrix as close as possible to the spatiotemporal matrix O_(a,b) before encoding.

The autoencoder uses supervised training so that the encoding is both reconstructive and discriminative: a classification network is attached to the autoencoder to supervise its encoding process. The loss function of this classification process is defined in terms of the prediction result, that is, the output of the classification network; the sample label y; and the number n of training samples, that is, the number of user pairs involved in the training dataset.

To obtain a more discriminative vector representation, a constraint is imposed on the combined hybrid network through an overall loss function for the entire hybrid network.

Once training is complete, the encoder is taken out of the autoencoder network and used to encode the spatiotemporal relationship matrix of any user pair in the user set and to make the preliminary relationship inference.

The present invention provides a new social relationship inference model that considers both the mobility and the sociality of individuals. Because social network structure is effective for predicting social ties, the model first constructs a preliminary social graph based on user mobility, then extracts the social network structure features of user pairs from the preliminary social graph, and finally infers friend relationships by combining the mobility and sociality features. Once trained, the model transfers well to different scenarios for predicting friend relationships between users. Experiments on two real-world datasets show that the method consistently outperforms existing methods. The model is also effective for relationships with little check-in data and no encounter events.

Description of the Drawings

Figure 1 is a schematic diagram of mapping users' mobility relationships to friend relationships;

Figure 2 is a system structure diagram of the social relationship inference model;

Figure 3 is a schematic diagram of the interaction behavior feature extraction process;

Figure 4 is a schematic diagram of the k-reachable subgraph encoding process;

Figure 5(a) shows the effect of spatial granularity on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 5(b) shows the effect of spatial granularity on the accuracy of friend relationship inference in the Gowalla dataset;

Figure 6(a) shows the effect of temporal granularity on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 6(b) shows the effect of temporal granularity on the accuracy of friend relationship inference in the Gowalla dataset;

Figure 7(a) shows the effect of the feature vector dimension on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 7(b) shows the effect of the feature vector dimension on the accuracy of friend relationship inference in the Gowalla dataset;

Figure 8(a) shows the effect of the number of iterations on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 8(b) shows the effect of the number of iterations on the accuracy of friend relationship inference in the Gowalla dataset;

Figure 9(a) shows the effect of the number of user check-ins on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 9(b) shows the effect of the number of user check-ins on the accuracy of friend relationship inference in the Gowalla dataset;

Figure 10(a) shows the effect of the number of locations shared by a user pair on the accuracy of friend relationship inference in the Brightkite dataset;

Figure 10(b) shows the effect of the number of locations shared by a user pair on the accuracy of friend relationship inference in the Gowalla dataset.

Detailed Description

The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are only used to illustrate the present invention and not to limit its scope. It should also be understood that, after reading the teachings of the present invention, those skilled in the art can make various changes or modifications to it, and these equivalent forms likewise fall within the scope defined by the claims appended to this application.

The present invention infers whether users are friends by exploiting the similarity of their movement trajectories and the propagation property of social networks. Figure 1 is an abstraction of this real-world problem. The inference model has two stages: 1. Building the initial social network graph: the initial graph is built on the observation that the movement trajectories of friends are generally more similar. 2. Obtaining hidden social relationships: based on the phenomenon that friends have more similar network topologies, the model mines hidden friend relationships between users who have similar preferences but not similar movement trajectories.

Figure 2 shows the specific implementation of the method for inferring user relationships in a mobile social network based on spatiotemporal relationship learning provided by the present invention, which mainly includes the following two stages:

Stage 1: Building a preliminary social graph

In this stage, we extract the interaction behavior features between user pairs and use them to infer whether two users are friends. Interaction behavior features are usually characterized by statistical properties of the interactions between a user pair, such as the number of encounters and the number of co-visited places. However, friends may have no co-visit or encounter events at all, and not all co-visits are equally important for inferring a friend relationship. We therefore propose a more comprehensive approach to learning complex interaction behavior features, in which both the joint behavior and the individual behavior of the two users are embedded into the interaction behavior features. In addition, we consider that different places and time periods have different predictive importance, and we design a hybrid model that simultaneously generates a compressed interaction behavior feature vector and an initial classification result, where the influence range of each location and the time interval are both parameters of the model. This approach ensures that the compressed interaction behavior features retain as much of the users' original movement features as possible and that they are strongly indicative in the task of inferring friend relationships. The first stage includes the following steps:

Step 1. Building the spatiotemporal matrix

Figure 3 shows a schematic diagram of constructing the spatiotemporal matrix of a user pair. First, the points of interest (POIs) involved in the check-ins of all users in the mobility data are divided into an I×J grid according to latitude and longitude, and time is divided into M time slices, to construct an I×J×M spatiotemporal matrix (STD). In the time dimension, time is divided into equal slices of length τ. In the space dimension, space is divided uniformly into grid cells of equal size; because the density of POIs varies widely across geographic regions, we recursively divide each cell into four equal cells until the number of POIs in each cell is less than the threshold σ. The trajectories of each user pair (u_a, u_b) can be projected into the STD, so that every check-in of a user falls into one specific cell. For each cell we compute three indicators: the number n_a of POIs visited by user u_a in that time slice, the number n_b of POIs visited by user u_b, and the number n_a,b of POIs visited by both user u_a and user u_b. This yields the spatiotemporal matrix O_(a,b) of the user pair (u_a, u_b), in which the triple (n_a, n_b, n_a,b) at position (i, j, m) represents the statistics of the mobility information of users u_a and u_b in the grid cell at row i, column j during the m-th time slice.
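The construction of the pair matrix can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: it uses a uniform I×J grid and omits the recursive four-way refinement controlled by σ, and the function name `pair_matrix`, the check-in tuple layout `(poi_id, lat, lon, timestamp)`, and the parameters `bounds` and `t0` are hypothetical, not taken from the patent. All check-ins are assumed to lie inside `bounds`.

```python
from collections import defaultdict


def pair_matrix(checkins_a, checkins_b, bounds, I, J, t0, tau, M):
    """Build an I x J x M array of (n_a, n_b, n_ab) triples for one user pair.

    checkins_*: iterable of (poi_id, lat, lon, timestamp) tuples (assumed layout).
    bounds: (lat_min, lon_min, lat_max, lon_max) of the uniform grid (assumed).
    """
    lat0, lon0, lat1, lon1 = bounds

    def cell(lat, lon, t):
        # Map a check-in to (row, column, time slice); clamp the upper border.
        i = min(int((lat - lat0) / (lat1 - lat0) * I), I - 1)
        j = min(int((lon - lon0) / (lon1 - lon0) * J), J - 1)
        m = int((t - t0) // tau)
        return i, j, m

    visited_a, visited_b = defaultdict(set), defaultdict(set)
    for visited, checkins in ((visited_a, checkins_a), (visited_b, checkins_b)):
        for poi, lat, lon, t in checkins:
            i, j, m = cell(lat, lon, t)
            if 0 <= m < M:  # ignore check-ins outside the M time slices
                visited[(i, j, m)].add(poi)

    O = [[[(0, 0, 0) for _ in range(M)] for _ in range(J)] for _ in range(I)]
    for key in set(visited_a) | set(visited_b):
        i, j, m = key
        pois_a = visited_a.get(key, set())
        pois_b = visited_b.get(key, set())
        # n_a, n_b: distinct POIs each user visited; n_ab: POIs both visited.
        O[i][j][m] = (len(pois_a), len(pois_b), len(pois_a & pois_b))
    return O
```

Counting distinct POIs per cell (rather than raw check-ins) follows the wording "number of POIs visited"; a variant that counts check-ins would only change the `set` containers to counters.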

Step 2. Encoding the spatiotemporal matrix

This step encodes the spatiotemporal matrix O_(a,b) of each user pair (u_a, u_b) into a low-dimensional vector and uses this vector to compute the probability that user u_a and user u_b are friends. The size of the spatiotemporal matrix can be adjusted through the parameters σ and τ.

We feed the spatiotemporal matrix into an autoencoder. The encoder, which has R hidden layers, encodes it into a d-dimensional vector, namely the output of the R-th layer of the encoder. The decoder, which also has R hidden layers, outputs a reconstructed spatiotemporal matrix from the d-dimensional vector, and this reconstruction is trained to be close to the encoder input O_(a,b). In the optimization objective of this training process, the reconstructed matrix should match the spatiotemporal matrix O_(a,b) before encoding, and U denotes all users in the training samples.

Training an autoencoder is unsupervised, which means that it only learns the structure of the input and has no capacity for feature selection or preference. The resulting d-dimensional feature vector therefore merely preserves the original mobility information of the user pair and is not guaranteed to be useful in the friend relationship inference task. To avoid this problem, we use supervised training so that the encoding is both reconstructive and discriminative: we attach a classification network to the autoencoder to supervise its encoding process. The output of the classification network is the probability that the user pair (u_a, u_b) are friends, and the classification network has its own loss function.

To obtain a more discriminative vector representation, we impose a constraint on the combined hybrid network through the overall loss function of the hybrid network.
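The three loss formulas are not reproduced in this text. For orientation only, one common way to write a hybrid objective of this kind is given below. Every symbol and functional form here is an assumption: the squared reconstruction error, the binary cross-entropy, the names \mathcal{L}_{ae}, \mathcal{L}_{c}, \mathcal{L}, and the weighting factor \lambda are not confirmed by the source, which only states what each loss depends on.

```latex
% Reconstruction loss over all user pairs drawn from U (assumed squared error)
\mathcal{L}_{ae} = \sum_{u_a, u_b \in U} \left\lVert \hat{O}_{(a,b)} - O_{(a,b)} \right\rVert_2^2

% Classification loss over n training pairs (assumed binary cross-entropy),
% with label y_i and predicted friendship probability \hat{y}_i
\mathcal{L}_{c} = -\frac{1}{n} \sum_{i=1}^{n}
    \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]

% Overall loss of the hybrid network (assumed weighted sum, \lambda > 0)
\mathcal{L} = \mathcal{L}_{ae} + \lambda \, \mathcal{L}_{c}
```

Under this reading, minimizing \mathcal{L} keeps the d-dimensional code faithful to the input through \mathcal{L}_{ae} while \mathcal{L}_{c} pushes it to separate friend pairs from non-friend pairs.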

Stage 2: Obtaining hidden social relationships

In the second stage, we consider the indirect link features between user pairs, that is, the network structure features. Because the mobility data does not come with a social topology graph, we extract structural features from the prediction result of the first stage (a semi-real social graph). Since simple heuristic structural features are not universally applicable, we propose a new descriptor of the network structure between target users, the k-reachable subgraph, and encode this subgraph into a structural feature vector. The encoded structure vector is then combined with the corresponding interaction features to improve inference accuracy. In addition, we design an iterative process that refines the structural feature vectors and the final social relationship graph at the same time.

The second stage includes the following steps:

Step 3. Extracting the local graph structure: the k-reachable subgraph

The Katz index is a commonly used measure of network proximity between users. It takes a global view of a user pair in the social graph, that is, it considers all possible paths between the two users. We consider this unreasonable, because longer paths are not only expensive to compute but also have a negative effect on relationship inference. We therefore take a local view of network proximity and extract a k-reachable subgraph for each user pair to describe the network structure between the users. For a given initial social relationship graph G = (U, E), the k-reachable subgraph of a user pair (u_a, u_b) is extracted as follows:

Step 301. Set the path length to 2 and initialize the k-reachable subgraph to an empty graph;

Step 302. Find all paths of length 2 between u_a and u_b in the initial social relationship graph G constructed in the first stage and record all the paths found; then delete from the social relationship graph G the vertices and edges that appear in both G and the recorded paths (except user u_a and user u_b themselves).

Step 303. Gradually increase the path length and repeat step 302 until the path length exceeds k.
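Steps 301 to 303 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the edge-list input, and the choice to return the paths grouped by length are assumptions. Because every edge on a path of length 2 or more between u_a and u_b touches at least one intermediate vertex, removing the intermediate vertices also removes all edges of the recorded paths.

```python
from collections import defaultdict


def paths_of_length(adj, a, b, length):
    """Return all simple paths from a to b that have exactly `length` edges."""
    found = []

    def dfs(node, path):
        if len(path) - 1 == length:
            if node == b:
                found.append(list(path))
            return
        for nxt in sorted(adj.get(node, ())):
            if nxt in path:
                continue
            if nxt == b and len(path) != length:
                continue  # b may only appear as the final vertex
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(a, [a])
    return found


def k_reachable_subgraph(edges, a, b, k):
    """Collect paths of length 2..k between a and b, shortest first.

    After each length, the intermediate vertices of the recorded paths are
    removed from the working copy of the graph, so longer paths cannot reuse them.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    paths_by_length = {}
    for length in range(2, k + 1):
        paths = paths_of_length(adj, a, b, length)
        paths_by_length[length] = paths
        used = {v for p in paths for v in p} - {a, b}
        for v in used:
            for w in adj.pop(v, set()):
                if w in adj:
                    adj[w].discard(v)
    return paths_by_length
```

For example, with edges (1,2), (2,3), (1,4), (4,5), (5,3), (1,3), (2,5) and k = 3, the pair (1, 3) yields the length-2 path 1-2-3 and the length-3 path 1-4-5-3; the path 1-2-5-3 is not reported because vertex 2 was removed after the length-2 round.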

Step 4. Iteratively updating the predicted social graph

Based on the initial social relationship graph G constructed in the first stage, we encode the k-reachable subgraph of each user pair (u_a, u_b). The encodings of paths of the same length are accumulated, and the encoding results of paths of different lengths are concatenated, which vectorizes the k-reachable subgraph of the user pair. Figure 4 illustrates the k-reachable subgraph encoding process in detail.
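The accumulate-then-concatenate rule can be illustrated with a short sketch. How a single path is turned into a vector is shown only in Figure 4, which is not available here, so the per-path encoding below (the sum of hypothetical d-dimensional vectors attached to the path's edges) is an assumption made purely for illustration, as are the names `encode_subgraph` and `edge_vectors`.

```python
def encode_subgraph(paths_by_length, edge_vectors, k, d):
    """Vectorize a k-reachable subgraph.

    paths_by_length: {length: [path, ...]}, each path a list of vertices.
    edge_vectors: {frozenset({u, v}): list of d floats} (hypothetical input).
    Same-length paths are accumulated (summed); the per-length results are
    concatenated, giving a vector of size (k - 1) * d.
    """
    encoding = []
    for length in range(2, k + 1):
        acc = [0.0] * d
        for path in paths_by_length.get(length, []):
            for u, v in zip(path, path[1:]):
                vec = edge_vectors[frozenset((u, v))]
                acc = [x + y for x, y in zip(acc, vec)]
        encoding.extend(acc)
    return encoding
```

A length with no paths contributes a block of zeros, so every user pair gets a vector of the same size regardless of how many paths connect the two users.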

A classifier then performs 0/1 classification (1 means friends, 0 means not friends) on the comprehensive feature vector of each user pair, which combines the interaction behavior features obtained in the first stage with the local structure feature vector obtained in the second stage. This gives the latest predicted social graph.

The latest social graph is used to update the structural features of the user pairs, and prediction is repeated until the prediction no longer changes, which gives the final predicted social network graph.
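The refinement loop can be sketched as below. The callables `structure_features` and `classify` are hypothetical placeholders for the subgraph encoding and the trained classifier, and `max_rounds` and `tol` are assumed parameters; with `tol=0.0` the loop stops when the predicted edge set no longer changes, and a small positive `tol` approximates a relative-change stopping rule.

```python
def refine_social_graph(pairs, initial_edges, structure_features, classify,
                        max_rounds=20, tol=0.0):
    """Iteratively re-predict the social graph until it stabilizes.

    pairs: candidate user pairs, each a hashable such as (u_a, u_b).
    initial_edges: edge set predicted in the first stage.
    structure_features(edges, pair): structural features from the current graph.
    classify(pair, features): returns 1 (friends) or 0 (not friends).
    Returns the final edge set and the number of rounds performed.
    """
    edges = set(initial_edges)
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        new_edges = {
            pair for pair in pairs
            if classify(pair, structure_features(edges, pair)) == 1
        }
        changed = len(edges ^ new_edges)   # edges added or removed this round
        previous_size = max(len(edges), 1)
        edges = new_edges
        if changed / previous_size <= tol:
            break
    return edges, round_no
```

In a full pipeline, `classify` would also receive the interaction behavior vector of the pair from the first stage; it is folded into the callable here to keep the sketch short.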

The verification process of the present invention is as follows:

The experiments use two public real-world datasets, Brightkite and Gowalla. Both come from location-based social networks, through which users can publish their time and location information and share it with their friends. Each dataset consists of two parts: the check-in data and the users' social relationship graph. Table 1 shows the statistics of the experimental datasets.

Table 1. Dataset statistics

The experiments compare our method with four popular inference methods, based on movement distance, number of shared locations, random walks, and graph embedding.

· Relationship inference based on the number of shared locations. This method extracts the number of common places visited by a pair of users and uses it as the feature to infer whether the pair are friends. The experiment uses an SVM classifier for supervised learning.

· Relationship inference based on movement distance. This method computes the center of each user's activity from all of the user's mobility information and uses the Euclidean distance between the centers of each user pair as the feature to infer whether they are friends. The experiment uses an SVM classifier for supervised learning.

· Relationship inference based on random walks. This method maps the users' mobility data into a user-location bipartite graph, walks the bipartite graph according to certain rules to collect neighbors for each user, compresses the neighbor information into a vector, and finally infers friend relationships by comparing the similarity of the feature vectors of a user pair.

· Relationship inference based on graph embedding. This method builds a meeting graph from the users' mobility information, in which two users are connected by an edge if they have co-occurred (appeared at the same location within a certain time interval). It walks the meeting graph to collect a certain number of friends for each user, encodes these friend nodes into a vector, and finally infers friend relationships by comparing the similarity of the feature vectors of a user pair.

In the experiments we use the f1-score as the evaluation metric. This metric is insensitive to the class distribution, so it describes the accuracy of the inference results reasonably well even when the classes are imbalanced. It is defined as:

f1-score = (2 × precision × recall) ÷ (precision + recall)

Here precision is the number of true positives divided by the number of pairs predicted as positive, and recall (also called the true positive rate) is the number of true positives divided by the number of pairs labeled as positive. The f1-score ranges from 0 to 1, and a larger value indicates more accurate inference.
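As a concrete illustration of the definition above, the metric can be computed directly from label and prediction lists. The function name and the convention of returning 0.0 when there are no true positives are choices made for this sketch.

```python
def f1_score(y_true, y_pred):
    """Compute the f1-score for binary labels (1 = friends, 0 = not friends)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    labeled_pos = sum(1 for t in y_true if t == 1)
    if true_pos == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = true_pos / predicted_pos
    recall = true_pos / labeled_pos
    return 2 * precision * recall / (precision + recall)
```

With labels [1, 1, 0, 0, 1] and predictions [1, 0, 1, 0, 1], there are 2 true positives, 3 predicted positives and 3 labeled positives, so precision and recall are both 2/3 and the f1-score is 2/3.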

In our experiments we use a fully connected autoencoder network in which each layer has half as many nodes as the previous layer, except for the last layer, whose size is set by the user. The decoder uses the mirror image of the encoder's structure. The number of layers is adjusted according to the size of the spatiotemporal matrix fed to the encoder. We use a simple KNN as the classifier in stage 1 and an SVM as the binary classifier in stage 2, with RBF as the kernel function of the support vector machine. The learning rate of all networks is set to 0.005, and iteration terminates when the edges of the predicted social network graph change by less than 1% between two consecutive rounds.
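The halving rule for the layer widths can be made concrete with a small helper. This is a sketch under one reading of the description: widths are halved (integer division) until the next width would no longer exceed the user-chosen code size d, which then becomes the last encoder layer. The function names are hypothetical.

```python
def encoder_layer_sizes(input_dim, d):
    """Widths of the encoder layers: halve repeatedly, then end with d."""
    if not 0 < d < input_dim:
        raise ValueError("d must be positive and smaller than input_dim")
    sizes = []
    width = input_dim // 2
    while width > d:
        sizes.append(width)
        width //= 2
    sizes.append(d)  # the last layer size is chosen by the user
    return sizes


def decoder_layer_sizes(input_dim, d):
    """Mirror of the encoder, ending at the original input size."""
    return encoder_layer_sizes(input_dim, d)[-2::-1] + [input_dim]
```

For a flattened input of 1000 values and d = 64, the encoder widths are 500, 250, 125, 64 and the decoder widths are 125, 250, 500, 1000, so the number of layers grows with the size of the spatiotemporal matrix.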

1)空间粒度σ的影响1) The influence of spatial granularity σ

我们实验了σ为500、750、1000、1250和1500五种情况下，推断结果的准确性。如图5(a)、(b)所示，在Brightkite数据集(图5(a))中，当σ增加从500到1000时，其f1-score增加了4.56％,然后随着σ的增加准确性逐渐下降。类似的趋势为Gawalla数据集(图5(b))，当σ＝750时，f1-score取得最大值，这是由于Gawalla数据集中的POIs比Brightkite数据集中的POIs更分散，这意味着Gawalla中一个网格所覆盖的地理区域比Brightkite要大,所以Gawalla数据集的最佳σ值小于Brightkite数据。We experimented the accuracy of the inference results when σ was 500, 750, 1000, 1250 and 1500. As shown in Fig. 5(a), (b), in the Brightkite dataset (Fig. 5(a)), when σ increases from 500 to 1000, its f1-score increases by 4.56%, and then as σ increases Accuracy gradually decreases. A similar trend is observed for the Gawalla dataset (Fig. 5(b)), when σ = 750, the f1-score achieves the maximum value, which is due to the fact that the POIs in the Gawalla dataset are more scattered than those in the Brightkite dataset, which means that in Gawalla A grid covers a larger geographic area than Brightkite, so the optimal σ value for Gawalla dataset is smaller than Brightkite dataset.

2) Influence of the time granularity τ

We evaluated the accuracy of the inference results for τ = 1, 7, 14, 30 and 60 days. As shown in Fig. 6(a) and (b), on the Brightkite dataset (Fig. 6(a)) the F1-score reaches its maximum at τ = 7 days, and the Gowalla dataset (Fig. 6(b)) confirms this result. This is because human activity tends to be periodic, typically on a weekly basis.
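The effect of τ can be seen in how check-in timestamps are bucketed into equal time slices. The sketch below is illustrative only: the function names and the choice of measuring τ in whole days from a fixed start time are assumptions.

```python
from datetime import datetime, timedelta


def time_slice_index(ts: datetime, start: datetime, tau_days: int) -> int:
    """Index m of the time slice of length tau_days that contains ts."""
    if tau_days <= 0:
        raise ValueError("tau_days must be positive")
    if ts < start:
        raise ValueError("timestamp precedes the start of the data")
    # timedelta // timedelta yields an integer number of whole slices.
    return (ts - start) // timedelta(days=tau_days)


def num_time_slices(start: datetime, end: datetime, tau_days: int) -> int:
    """Number of slices M needed to cover [start, end)."""
    span = end - start
    tau = timedelta(days=tau_days)
    slices = span // tau
    if span % tau:  # a partial slice at the end still needs its own index
        slices += 1
    return max(1, slices)
```

With τ = 7 days, two check-ins fall into the same slice exactly when they occur in the same week counted from the start time, which matches the weekly periodicity noted above.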

3) Influence of the encoding dimension d

We evaluated the accuracy of the inference results for d = 16, 32, 64, 128 and 256. As shown in Fig. 7(a) and (b), accuracy follows the same trend on the Brightkite dataset (Fig. 7(a)) and the Gowalla dataset (Fig. 7(b)). A higher-dimensional spatiotemporal relationship vector carries more information, which improves attack performance; however, a high-dimensional vector also introduces too much noise, which reduces the accuracy of the attack.

4) Influence of the number of iterations

Based on the parameter study above, we ran the experiments on both datasets using the best value of each parameter.

Fig. 8(a) and (b) show the effect of the number of iterations on inference accuracy for the Brightkite dataset (Fig. 8(a)) and the Gowalla dataset (Fig. 8(b)). Inference accuracy keeps improving as the number of iterations increases; the termination condition is met after 4 iterations on Gowalla and after 5 iterations on Brightkite.
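The termination condition (edges of the predicted graph changing by less than 1% between two rounds) can be sketched as follows. How the change is measured is an assumption here: the sketch counts edges added or removed, relative to the number of edges in the previous round. The `refine` callback is a hypothetical stand-in for one round of structural feature extraction and re-prediction, and `max_rounds` is an added safety limit.

```python
from typing import Callable, FrozenSet, Set, Tuple

Edge = FrozenSet[str]  # an undirected friendship edge, e.g. frozenset({"a", "b"})


def edge_change_ratio(prev: Set[Edge], curr: Set[Edge]) -> float:
    """Fraction of edges added or removed, relative to the previous round."""
    if not prev and not curr:
        return 0.0
    return len(prev ^ curr) / max(len(prev), 1)


def iterate_until_stable(initial: Set[Edge],
                         refine: Callable[[Set[Edge]], Set[Edge]],
                         tol: float = 0.01,
                         max_rounds: int = 50) -> Tuple[Set[Edge], int]:
    """Re-predict the graph until it changes by less than tol between rounds."""
    graph = initial
    for rounds in range(1, max_rounds + 1):
        new_graph = refine(graph)
        if edge_change_ratio(graph, new_graph) < tol:
            return new_graph, rounds
        graph = new_graph
    return graph, max_rounds
```

The function returns the final graph together with the number of rounds used, which is the quantity reported for each dataset above.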

5) Influence of the number of locations shared by a user pair

Fig. 9(a) and (b) plot the effect of the number of locations shared by a user pair on the accuracy of the inference results. Accuracy improves steadily as the number of shared locations increases. The results also show that, on both the Brightkite dataset (Fig. 9(a)) and the Gowalla dataset (Fig. 9(b)), our inference model achieves an F1-score about 10% higher than the feature-extraction-based method (relationship inference based on moving distance).

6) Influence of the number of user check-ins

Fig. 10(a) and (b) plot the effect of the number of user check-ins on the accuracy of the inference results. The results on the Brightkite dataset (Fig. 10(a)) and the Gowalla dataset (Fig. 10(b)) show that our method is fairly robust to users with different numbers of check-ins. The more check-ins a user has, the more precisely the user's behavior pattern can be modeled and the more accurate the user relationship inference becomes.

Claims

1. A mobile social network user relationship inference method based on spatiotemporal relationship learning, characterized by comprising the following steps:

Step 1. Extract the interaction behavior features between a pair of users, and use these features to infer whether a friend relationship exists between the two users, comprising the following steps:

Step 101: Divide the POIs involved in the check-ins of all users in the mobility data into an I×J grid according to latitude and longitude, and divide time into M time slices, to construct an I×J×M space-time matrix STD, wherein the time dimension is divided into equal time slices of length τ, the space dimension is evenly divided into grid cells of equal size, and each cell is recursively divided into four equal cells until the number of points of interest (POIs) in each cell is less than the threshold σ;

Step 102: Project the trajectory of each user pair (u_a, u_b) into the space-time matrix STD, so that each check-in of a user falls into a specific cell. For each cell, calculate: the number of points of interest n_a visited by user u_a in the time period; the number of points of interest n_b visited by user u_b; [...] the space-time matrix of the user pair (u_a, u_b)

The triples in the formula

Step 103: Encode the space-time matrix O_(a,b) of each user pair (u_a, u_b) into a low-dimensional vector, and use the low-dimensional vector to calculate the probability that user u_a and user u_b have a friend relationship, obtaining the initial social relationship graph G=(U, E), where U is the set of vertices, representing all users with mobility information, and E is the set of edges, representing friendships between pairs of users; the size of the space-time matrix O_(a,b) is adjusted by the parameters σ and τ;

Step 2. Extract a k-reachable subgraph for each user pair to describe the network structure between users. For a given initial social relationship graph G=(U, E), the k-reachable subgraph of the user pair (u_a, u_b) is extracted by the following steps:

Step 201: Set the path length to 2, and initialize the k-reachable subgraph to an empty graph;

Step 202: Find all paths of length 2 between (u_a, u_b) in the initial social relationship graph G, and denote the set of paths found as [...]; then delete from the social graph G [...]

Step 203: Gradually increase the path length, and repeat step 202 until the path length exceeds k;

Step 3. According to the initial social relationship graph G, for each pair of users (u_a, u_b), the k-reachable subgraph [...]

Step 4. Use the classifier to perform 0/1 classification according to the comprehensive feature vector of the user pair, where 1 is a friend and 0 is not a friend, and the latest predicted social graph is obtained.

Step 5: Using the latest social graph, update the structural features of the user pair, and then re-predict until the prediction result does not change, that is, the final predicted social network graph is obtained.
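The path-based extraction in Step 2 can be sketched as follows, under clearly stated assumptions rather than as the claimed procedure. The sketch takes the k-reachable subgraph of (u_a, u_b) to be the union of the edges on all simple paths of 2 to k hops between the two users, and it excludes the direct edge because Step 201 starts at path length 2. The modification of G between rounds in Step 202 is not modeled. The function name and the adjacency-dictionary representation are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def k_reachable_edges(adj: Dict[str, Set[str]], a: str, b: str,
                      k: int) -> Set[FrozenSet[str]]:
    """Edges on all simple paths of 2..k hops between users a and b.

    adj maps each user to the set of users linked to it in the graph G.
    """
    edges: Set[FrozenSet[str]] = set()

    def extend(path: List[str]) -> None:
        hops = len(path) - 1
        last = path[-1]
        if last == b:
            if hops >= 2:  # skip the direct edge a-b (assumption, see lead-in)
                for u, v in zip(path, path[1:]):
                    edges.add(frozenset((u, v)))
            return
        if hops == k:      # path length limit reached without arriving at b
            return
        for nxt in adj.get(last, ()):
            if nxt not in path:  # keep paths simple (no repeated users)
                extend(path + [nxt])

    extend([a])
    return edges
```

Increasing k admits longer paths, so the extracted subgraph can only grow or stay the same as k increases.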

2. The mobile social network user relationship inference method based on spatiotemporal relationship learning as claimed in claim 1, characterized in that, in step 103, the space-time matrix O_(a,b) is input into an autoencoder with R hidden layers, and the autoencoder encodes it into a d-dimensional vector, obtaining the reconstructed space-time matrix

In the formula,

is the same as the space-time matrix O_(a,b) before encoding, and U represents all users in the training sample;

The autoencoder is trained in a supervised manner so that the encoding is both reconstructive and discriminative; that is, a classification network is added to the autoencoder to supervise its encoding process, and the loss function of this process is:

In the formula,

represents the prediction result, that is, the output of the classification network; y represents the sample label; n represents the number of training samples, that is, the number of user pairs in the training data set.

In order to obtain a more discriminative vector representation, the following constraints are imposed on the integrated hybrid network:

In the formula,

represents the combined loss function of the hybrid network.

Once training is complete, the encoder is taken from the autoencoder network and used to encode the spatiotemporal relationship matrix of any user pair in the user set and to perform preliminary relationship inference.