CN115277156B

CN115277156B - User identity privacy protection method for resisting neighbor attack in social network

Info

Publication number: CN115277156B
Application number: CN202210867729.3A
Authority: CN
Inventors: 许力; 章红艳; 许佳钰; 李啸林; 周赵斌
Original assignee: Fujian Normal University
Current assignee: Fujian Normal University
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2023-05-23
Anticipated expiration: 2042-07-22
Also published as: CN115277156A

Abstract

The invention relates to a user identity privacy protection method for resisting neighbor attacks in a social network, comprising the following: when the graph data of a social network is subjected to a 1*-neighborhood attack, a graph modification technique is used to protect the privacy of user identities; the 1*-neighborhood graphs in the same cluster are modified according to the graph edit distance so that they become probabilistically indistinguishable; as a result, user identity privacy in the social network is protected while the utility of the graph data is improved.

Description

A user identity privacy protection method against neighbor attacks in social networks

Technical Field

The present invention relates to the field of social network privacy protection, and in particular to a method for protecting user identity privacy in a social network against neighbor attacks.

Background Art

In social networks, users provide information such as their names, occupations, phone numbers, email addresses and ID numbers, which is stored in databases. Beyond personal information, these data also reflect social relationships. Because they contain a large amount of private user information, anonymization must be applied to protect user privacy before social network data are published.

A naive privacy protection method simply removes users' identities and attributes; however, Backstrom et al. pointed out that under a 1*-neighborhood attack such naive anonymization still allows users' identities to be re-identified, so it does not adequately protect user privacy. Graph structure modification can protect user privacy effectively: before the data are published, nodes and edges are added to or deleted from the original graph to change its structure, so that the modified graph (called the anonymized graph) achieves user identity privacy or attribute privacy. Graph modification methods first partition the nodes; the accuracy of this partition directly affects how much graph information is lost and can reduce the utility of the graph data, so a more accurate partitioning criterion is needed.

Summary of the Invention

In view of this, the object of the present invention is to provide a method for protecting user identity privacy in a social network against neighbor attacks, which modifies the graph structure so that the modified anonymized graph achieves k-anonymity and thereby effectively protects users' identity privacy.

To achieve the above object, the present invention adopts the following technical solution:

A method for protecting user identity privacy in a social network against neighbor attacks comprises the following steps:

Step 1) Establish a social network model and represent it as a graph G = (V, E), where the vertex set V represents the users in the social network and the edge set E represents the relationships between users;

The user nodes are initially partitioned into T clusters according to the metrics d(v) and lc(v), where d(v) is the degree of user node v, i.e., the number of users connected to that user in the social network, and lc(v) is the local clustering coefficient of user node v, i.e., how densely the neighbors of v are connected to one another. After partitioning, the clusters are sorted in descending order of the maximum node degree in each cluster;
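
The text names the two metrics of step 1 but not the exact grouping rule. The Python sketch below computes d(v) and lc(v) from an adjacency dictionary and, as a purely hypothetical rule, places nodes with the same degree and the same binned clustering coefficient in one cluster before sorting clusters by their maximum degree. The names `local_clustering`, `initial_partition` and `lc_bins` are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict


def local_clustering(adj, v):
    """lc(v): fraction of pairs of v's neighbours that are themselves adjacent."""
    nbrs = list(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i in range(d) for j in range(i + 1, d) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (d * (d - 1))


def initial_partition(adj, lc_bins=10):
    """Hypothetical coarse partition: equal degree and equal binned lc share a cluster."""
    groups = defaultdict(list)
    for v in adj:
        key = (len(adj[v]), round(local_clustering(adj, v) * lc_bins))
        groups[key].append(v)
    clusters = [sorted(members) for members in groups.values()]
    # Step 1: order clusters by their maximum node degree, largest first.
    clusters.sort(key=lambda c: max(len(adj[v]) for v in c), reverse=True)
    return clusters
```

For a triangle {0, 1, 2} with a pendant node 3 attached to 0, node 0 has lc = 1/3 and the clusters come out as [[0], [1, 2], [3]].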

Step 2) Preset a user privacy requirement threshold k. If the number of users in a cluster C_i is less than k, compute the difference between the average degree of C_i and the average degrees of its two adjacent clusters C_{i-1} and C_{i+1}, and merge C_i into the adjacent cluster with the smaller difference; repeat this process until the number of users in every cluster is greater than k;
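
A minimal sketch of the merge rule, assuming the clusters are given as an ordered list (the order produced in step 1) and node degrees as a dictionary. Following S2-1, a cluster is merged while it has fewer than k nodes; when both neighbors are equally close in average degree, the sketch picks the preceding cluster, a tie-break the patent does not specify.

```python
def merge_small_clusters(clusters, degree, k):
    """Merge each cluster with fewer than k nodes into its closer neighbour by average degree."""
    clusters = [list(c) for c in clusters]

    def avg(c):
        return sum(degree[v] for v in c) / len(c)

    while len(clusters) > 1:
        small = [i for i, c in enumerate(clusters) if len(c) < k]
        if not small:
            break
        i = small[0]
        # Only the preceding and following clusters are candidates (C_{i-1}, C_{i+1}).
        candidates = [j for j in (i - 1, i + 1) if 0 <= j < len(clusters)]
        # min() keeps the first minimum, so ties go to the preceding cluster (assumption).
        target = min(candidates, key=lambda j: abs(avg(clusters[i]) - avg(clusters[j])))
        clusters[target].extend(clusters[i])
        del clusters[i]
    return clusters
```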

Step 3) After cluster merging is complete, each cluster containing more than 2k user nodes is split so that the number of users in every cluster takes a value in [k, 2k). Specifically:

S3-1: Sort the user nodes in each cluster in descending order of degree and construct the 1*-neighborhood graph of each user node;

S3-2: Construct the 1*-neighborhood structural feature matrix of each user node, whose four components represent, respectively, the degree distribution, inner-degree distribution, outer-degree distribution and gap-degree distribution of user node v in the social network;

S3-3: Compute the structural similarity between any two nodes in the same cluster as a weighted combination of four terms, which represent, respectively, the dissimilarity of the two nodes' degree distributions, inner-degree distributions, outer-degree distributions and gap-degree distributions; the weights k₁, k₂, k₃, k₄ give the share of each term and satisfy k₁ + k₂ + k₃ + k₄ = 1;

S3-4: Use the K-means clustering algorithm to divide the nodes into T clusters;
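
S3-4 names K-means but not its initialization or how T is chosen. The sketch below is a plain Lloyd's K-means over per-node feature vectors (for example, the flattened 1*-neighborhood distributions), seeded deterministically by farthest-first selection; the seeding is an assumption, and choosing T (for instance from the cluster size and k) is left to the caller. Plain K-means does not by itself guarantee sub-cluster sizes in [k, 2k), so a size check would still be needed afterwards.

```python
def kmeans(points, T, max_iter=100):
    """Lloyd's K-means; returns a cluster label in range(T) for each point."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-first seeding (an assumption; S3-4 does not specify one).
    centroids = [list(points[0])]
    while len(centroids) < T:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(T), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(T):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```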

Step 4) Based on the 1*-neighborhood graphs of the user nodes in each cluster, compute the similarity between every pair of user nodes, construct a weighted bipartite graph from these similarities, compute the graph edit distance on the bipartite graph, and use it to find the target graph edit path P;

Step 5) According to the graph edit path P found in step 4), modify the 1*-neighborhood graphs of the nodes in the cluster so that they become isomorphic.

2. The user identity privacy protection method resisting 1*-neighborhood attacks according to claim 1, characterized in that the 1*-neighborhood graph is a subgraph of the original graph G, defined as:

G(v) = (V(v), E(v), D(v))

where V(v) is the set consisting of user node v itself and its neighbor nodes; E(v) is the set of edges among the nodes in V(v), i.e., the relationships between the neighbors; and D(v) is the set of the numbers of neighbors that the nodes of V(v) have in the social network, i.e., the set of degrees of all nodes in V(v).
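
The definition of G(v) translates directly into code. This sketch assumes an undirected graph stored as a dictionary of neighbor sets; D(v) is kept as a node-to-degree mapping rather than a bare set so that equal degrees of different neighbors are not collapsed, which is our representation choice rather than the patent's.

```python
def one_star_neighborhood(adj, v):
    """Return (V(v), E(v), D(v)) for node v of an undirected graph."""
    Vv = {v} | set(adj[v])                                        # v and its neighbours
    Ev = {frozenset((a, b)) for a in Vv for b in adj[a] if b in Vv}  # edges inside V(v)
    Dv = {u: len(adj[u]) for u in Vv}                             # degrees in the full graph G
    return Vv, Ev, Dv
```

Note that D(v) records degrees in the whole graph G, so a neighbor's edges leading outside V(v) still count.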

Further, step 2) specifically comprises:

S2-1: Denote a cluster with fewer than k nodes by C_i^1, where the superscript 1 indicates that the cluster results from the first partition, and compute the average degree of its nodes; then compute the average node degrees of its two adjacent clusters, C_{i-1}^1 and C_{i+1}^1;

S2-2: Compare the difference between the average degrees of C_i^1 and C_{i-1}^1 with the difference between the average degrees of C_i^1 and C_{i+1}^1, and add C_i^1 to the adjacent cluster with the smaller difference;

S2-3: Repeat the above steps until the number of nodes in every cluster exceeds k.

Further, step 4) specifically comprises:

S4-1: If the 1*-neighborhood graphs of two user nodes contain different numbers of neighbor nodes, add nodes to the graph with fewer neighbor nodes so that both graphs contain the same number of nodes;

S4-2: Construct the matching cost matrix of the user nodes, and construct a weighted bipartite graph using the matching costs as edge weights;

S4-3: Use the bipartite graph to compute the graph edit distance between the user nodes, obtaining the matched nodes and the graph edit path.

Further, step 5) specifically comprises:

S5-1: Construct the adjacency matrix of graph G, denoted A = (a_ij)_{n×n}, where a_ij = 1 if there is an edge between nodes v_i and v_j, and a_ij = 0 otherwise;

S5-2: Compute A² and A³, derive two further matrices from them whose entries are set by two conditional rules, and then compute the resulting quantity;
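
The conditions applied to A² and A³ in S5-2 cannot be recovered from this text. Because S5-4-1 searches two-hop and three-hop neighbors, one natural reading is that these matrices identify nodes at exactly distance 2 and 3; the sketch below implements that hypothetical reading with plain Python lists. (A²)_ij > 0 means a walk of length 2 exists and (A³)_ij > 0 a walk of length 3; nodes already reachable in fewer hops are excluded.

```python
def matmul(X, Y):
    """Product of two n x n integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def hop_sets(A):
    """Return A^2, A^3 and, per node, the nodes at exactly two and three hops."""
    n = len(A)
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    two_hop, three_hop = [], []
    for i in range(n):
        near = {j for j in range(n) if j == i or A[i][j]}
        two = {j for j in range(n) if A2[i][j] > 0 and j not in near}
        three = {j for j in range(n) if A3[i][j] > 0 and j not in near and j not in two}
        two_hop.append(two)
        three_hop.append(three)
    return A2, A3, two_hop, three_hop
```

On the path 0-1-2-3, node 0 has two-hop set {2} and three-hop set {3}.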

S5-3: For each user node v in the social network, use the matching node u obtained in S4-3 to compute the degree change required for node v; sort the required degree changes of all user nodes in descending order to obtain the degree modification sequence D^M, where d_v denotes the number of neighbors of user node v;
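
The formula for the required degree change is not given here. Assuming, hypothetically, that node v should take the degree of its matched node u, so that the change is d_u − d_v, the sequence D^M is obtained by sorting these changes in descending order, as sketched below; ties are broken by node id purely for determinism.

```python
def degree_modification_sequence(degree, match):
    """degree: node -> d_v; match: node v -> matched node u from S4-3."""
    # Hypothetical rule: v should end up with the degree of its matched node u.
    delta = {v: degree[u] - degree[v] for v, u in match.items()}
    # D^M: nodes ordered by required change, largest first.
    order = sorted(delta, key=lambda v: (-delta[v], v))
    return delta, [(v, delta[v]) for v in order]
```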

S5-4: Modify the graph structure according to D^M.

Further, S3-2 specifically comprises:

S3-2-1: Compute the degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v, where the degree of a neighbor v_i is the number of neighbors of v_i in the original graph G, and N(v) is the set of all neighbors of user node v;

S3-2-2: Compute the inner-degree distribution of the neighbor nodes in G(v), where the inner degree of v_i is the number of neighbors of v_i inside the 1*-neighborhood graph G(v);

S3-2-3: Compute the outer-degree distribution of the neighbor nodes in G(v), where the outer degree of v_i is the number of neighbors of v_i outside the 1*-neighborhood graph G(v);

S3-2-4: Compute the gap-degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v;

S3-2-5: Form the feature matrix of each user node in the social network, where N_v is the number of neighbors of user node v in the social network.
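
A sketch of S3-2-1 to S3-2-3 for one node v, assuming each distribution is a normalized histogram over the neighbors of v, padded to a common length so it can later be compared with the JS divergence. Counting v itself toward a neighbor's inner degree is our assumption (v belongs to V(v)); the gap degree of S3-2-4 is omitted because its defining formula is not given in the text.

```python
from collections import Counter


def neighbor_profiles(adj, v):
    """Degree, inner degree and outer degree of every neighbour v_i of v."""
    Vv = {v} | set(adj[v])  # node set of the 1*-neighborhood graph G(v)
    deg, inner, outer = [], [], []
    for vi in adj[v]:
        d_in = sum(1 for w in adj[vi] if w in Vv)  # neighbours of v_i inside G(v), v included
        deg.append(len(adj[vi]))
        inner.append(d_in)
        outer.append(len(adj[vi]) - d_in)  # neighbours of v_i outside G(v)
    return deg, inner, outer


def to_distribution(values, size):
    """Normalized histogram over 0..size-1 (size must exceed max(values))."""
    if not values:
        return [0.0] * size
    counts = Counter(values)
    return [counts.get(x, 0) / len(values) for x in range(size)]
```

Using one more than the maximum degree in G as `size` gives every node's distributions the same support.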

Further, S3-3 specifically comprises:

S3-3-1: For user nodes v and u in the same cluster, use the Jensen-Shannon (JS) divergence to compute the dissimilarity of their degree distributions, inner-degree distributions, outer-degree distributions and gap-degree distributions, respectively. The JS divergence is defined as:

$$JS(P\|Q)=\frac{1}{2}\sum_{i=1}^{t}p_i\log\frac{2p_i}{p_i+q_i}+\frac{1}{2}\sum_{i=1}^{t}q_i\log\frac{2q_i}{p_i+q_i}$$

where P = {p₁, p₂, …, p_t} and Q = {q₁, q₂, …, q_t} are two probability distributions over the same probability space, and terms with a zero probability contribute 0;

S3-3-2: Compute the similarity vector of user nodes v and u, whose components are combined with the weights k₁, k₂, k₃, k₄ (k₁ + k₂ + k₃ + k₄ = 1) to give the similarity between user nodes u and v.
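
The JS divergence and the weighted combination of S3-3 can be sketched as follows. Natural logarithms are used, so each JS term lies in [0, ln 2]; the final similarity 1 − D / ln 2 is a hypothetical normalization to [0, 1], since the patent's own similarity formula is not reproduced in the text. Equal weights of 0.25 are only a default.

```python
import math


def kl_divergence(p, q):
    # Terms with p_i = 0 contribute nothing (0 * log 0 = 0 convention).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def similarity(feats_v, feats_u, weights=(0.25, 0.25, 0.25, 0.25)):
    """feats_*: the four distributions (degree, inner, outer, gap) of a node."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights k1..k4 must sum to 1")
    dissim = sum(k * js_divergence(p, q) for k, p, q in zip(weights, feats_v, feats_u))
    # Hypothetical normalization: each JS term is at most ln 2, so this lies in [0, 1].
    return 1.0 - dissim / math.log(2)
```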

Further, S4-2 specifically comprises:

S4-2-1: For any pair of vertices v and u in the same cluster, let G(v) = (V₁, E₁) and G(u) = (V₂, E₂) be their 1*-neighborhood graphs; for every node v_i ∈ G(v), compute its matching cost with every node in G(u);

S4-2-2: Construct the cost matrix from these matching costs, with entries c_ij;

S4-2-3: Construct the weighted bipartite graph B, whose two vertex sets V₁ and V₂ contain the same number of nodes, denoted x, and whose edge-weight matrix W = (w_ij) has entries w_ij = c_ij.
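
A sketch of S4-1 and S4-2, assuming each 1*-neighborhood graph is summarized by the degree list of its nodes. Dummy nodes added in S4-1 are given degree 0, and the matching cost c_ij = |d(v_i) − d(u_j)| is a hypothetical stand-in for the patent's cost formula; the complete bipartite graph B then uses w_ij = c_ij as in S4-2-3.

```python
def pad_and_cost(deg_v, deg_u):
    """Pad the smaller graph with dummy nodes (S4-1) and build the x-by-x cost matrix."""
    x = max(len(deg_v), len(deg_u))
    a = list(deg_v) + [0] * (x - len(deg_v))  # dummy nodes get degree 0 (assumption)
    b = list(deg_u) + [0] * (x - len(deg_u))
    # Hypothetical matching cost: absolute degree difference.
    return [[abs(ai - bj) for bj in b] for ai in a]


def bipartite_graph(C):
    """Complete weighted bipartite graph B: edge (i, j) joins v_i in V1 to u_j in V2."""
    return {(i, j): w for i, row in enumerate(C) for j, w in enumerate(row)}
```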

Further, S4-3 specifically comprises:

S4-3-1: Select the nodes of maximum degree as the seed pair for matching;

S4-3-2: Use a Monte Carlo method to find the optimal matching of the bipartite graph B;

S4-3-3: Find the graph edit path corresponding to the optimal matching, P = {v₁→u_t1, v₂→u_t2, …, v_x→u_tx}, where u_t1, u_t2, …, u_tx are the matching nodes of v₁, v₂, …, v_x, respectively.
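
The text does not detail the Monte Carlo procedure. One simple reading, sketched below under that assumption, fixes the seed pair from S4-3-1, samples random assignments of the remaining nodes, and keeps the cheapest. `exact_matching` enumerates all permutations and is included only to check the sampled result on tiny inputs, since it is exponential in x. The matching is returned as the edit path P, a list of pairs (i, t_i) read as v_i → u_ti.

```python
import itertools
import random


def assignment_cost(C, perm):
    """Total cost of matching row i (node v_i) to column perm[i]."""
    return sum(C[i][perm[i]] for i in range(len(perm)))


def monte_carlo_matching(C, seed_pair, samples=1000, seed=0):
    """Random-sampling search for a low-cost matching with a fixed seed pair."""
    rng = random.Random(seed)
    x = len(C)
    sv, su = seed_pair  # S4-3-1: maximum-degree nodes of G(v) and G(u)
    rest_v = [i for i in range(x) if i != sv]
    rest_u = [j for j in range(x) if j != su]
    best, best_cost = None, float("inf")
    for _ in range(samples):
        shuffled = rest_u[:]
        rng.shuffle(shuffled)
        perm = [0] * x
        perm[sv] = su
        for i, j in zip(rest_v, shuffled):
            perm[i] = j
        cost = assignment_cost(C, perm)
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost


def exact_matching(C):
    """Brute-force optimum over all x! permutations; only for checking tiny cases."""
    best = min(itertools.permutations(range(len(C))), key=lambda p: assignment_cost(C, p))
    return list(best), assignment_cost(C, best)


def edit_path(perm):
    """Graph edit path P as (i, t_i) pairs, read as v_i -> u_{t_i}."""
    return [(i, t) for i, t in enumerate(perm)]
```

For larger x, an exact assignment solver (the Hungarian algorithm) would be the usual replacement for sampling; that is a swapped-in technique, not the patent's.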

Further, S5-4 specifically comprises:

S5-4-1: If user node v_i needs to gain edges, search its two-hop and then its three-hop neighbors for nodes that also need additional edges and connect them to v_i; if the number of edges added in this way is smaller than required, add fake nodes and connect them to v_i so that the total number of added edges equals the required number. Specifically:

S5-4-1-1: Search the two-hop neighbors of node v_i for a node v_j whose degree needs to be increased; if one is found, add an edge between v_i and v_j;

S5-4-1-2: If no two-hop neighbor needs a degree increase, search the three-hop neighbors of v_i for such a node v_j; if one is found, add an edge between v_i and v_j;

S5-4-1-3: Repeat the above steps until v_i has gained the required number of edges, or until no two-hop or three-hop node needs a degree increase;

S5-4-1-4: If v_i has gained the required number of edges, terminate;

S5-4-1-5: If v_i still needs edges and no two-hop or three-hop node needs a degree increase, add the corresponding number of fake nodes and connect them to v_i;
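
A greedy sketch of S5-4-1 for a single node v_i, assuming `delta` holds each node's remaining required degree change (positive means the node needs edges) and that the two-hop and three-hop candidate sets come from S5-2. Decrementing the partner's requirement when an edge is added, visiting candidates in ascending id order, and numbering fake nodes from a caller-supplied counter are bookkeeping assumptions.

```python
def add_edges_for(adj, delta, vi, two_hop, three_hop, next_fake_id):
    """Greedy sketch of S5-4-1 for one node vi with delta[vi] > 0.

    adj: node -> set of neighbours (modified in place)
    delta: node -> remaining required degree change (positive = needs edges)
    two_hop, three_hop: node -> candidate nodes at distance 2 / 3
    Returns the next unused id for fake nodes.
    """
    # S5-4-1-1 and S5-4-1-2: two-hop candidates first, then three-hop ones.
    for ring in (two_hop[vi], three_hop[vi]):
        for vj in sorted(ring):
            if delta[vi] <= 0:
                return next_fake_id  # S5-4-1-4: requirement met
            if delta.get(vj, 0) > 0 and vj not in adj[vi]:
                adj[vi].add(vj)
                adj[vj].add(vi)
                delta[vi] -= 1
                delta[vj] -= 1
    # S5-4-1-5: cover any shortfall with fake nodes attached only to vi.
    while delta[vi] > 0:
        adj[next_fake_id] = {vi}
        adj[vi].add(next_fake_id)
        next_fake_id += 1
        delta[vi] -= 1
    return next_fake_id
```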

S5-4-2: If user node v_i needs to lose edges, find those of its neighbors that also need to lose edges and delete the edges between v_i and them, stopping once the required number of edges has been deleted; if too few edges are deleted in this way, delete further edges incident to v_i in ascending order of edge betweenness centrality until the total number of deleted edges equals the required number. Specifically:

S5-4-2-1: Search the neighbors of node v_i for nodes whose degree needs to be reduced, add them to a candidate set CS, and sort CS in descending order of node degree;

S5-4-2-2: Delete the edges between v_i and the nodes in CS one by one;

S5-4-2-3: If the required number of edges has been deleted, terminate;

S5-4-2-4: Otherwise, delete the remaining edges incident to v_i one by one in ascending order of edge betweenness centrality until the required number of edges has been deleted. The edge betweenness centrality of an edge is the ratio of the number of shortest paths between all pairs of users in the social network that pass through the edge to the total number of shortest paths between all pairs of nodes in the network.
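
A sketch of S5-4-2 for a single node v_i. The helper computes edge betweenness exactly as worded above, as the share of all shortest paths (over ordered node pairs) that traverse each edge, rather than the more common sum of per-pair fractions; the two can rank edges differently. It runs a breadth-first search from every source s: the number of shortest s-t paths through a shortest-path-DAG edge a→b equals σ(a), the number of shortest s-a paths, times the number of DAG paths from b to t. Computing betweenness once, after the CS deletions, and updating both endpoints' `delta` on every deletion are assumptions.

```python
from collections import deque


def edge_betweenness_ratio(adj):
    """Share of all shortest paths (ordered node pairs) that traverse each edge."""
    through = {frozenset((a, b)): 0 for a in adj for b in adj[a]}
    total = 0
    for s in adj:
        dist, sigma, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:  # BFS: sigma[b] = number of shortest s-b paths
            a = queue.popleft()
            order.append(a)
            for b in adj[a]:
                if b not in dist:
                    dist[b], sigma[b] = dist[a] + 1, 0
                    queue.append(b)
                if dist[b] == dist[a] + 1:
                    sigma[b] += sigma[a]
        rho = {}  # rho[b]: shortest-path-DAG paths starting at b, including b alone
        for b in reversed(order):
            rho[b] = 1 + sum(rho[w] for w in adj[b] if dist[w] == dist[b] + 1)
        for a in order:
            for b in adj[a]:
                if dist[b] == dist[a] + 1:
                    through[frozenset((a, b))] += sigma[a] * rho[b]
        total += sum(sigma[t] for t in order if t != s)
    return {e: (c / total if total else 0.0) for e, c in through.items()}


def delete_edges_for(adj, delta, vi):
    """Sketch of S5-4-2 for one node vi with delta[vi] < 0; adj is modified in place."""
    def drop(w):
        adj[vi].discard(w)
        adj[w].discard(vi)
        delta[vi] += 1
        delta[w] = delta.get(w, 0) + 1  # w also lost an edge (bookkeeping assumption)

    # S5-4-2-1: neighbours that also need to lose edges, highest degree first.
    cs = sorted((w for w in adj[vi] if delta.get(w, 0) < 0), key=lambda w: (-len(adj[w]), w))
    for w in cs:  # S5-4-2-2
        if delta[vi] >= 0:
            return  # S5-4-2-3
        drop(w)
    if delta[vi] < 0:  # S5-4-2-4: lowest edge betweenness first
        eb = edge_betweenness_ratio(adj)
        for w in sorted(adj[vi], key=lambda w: (eb[frozenset((vi, w))], w)):
            if delta[vi] >= 0:
                break
            drop(w)
```

In the test graph, once the edge to node 1 is removed, the remaining graph is the path 1-2-0-3-4-5, where edge {0, 2} carries 16 of 30 ordered shortest paths and edge {0, 3} carries 18, so {0, 2} is deleted first.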

Compared with the prior art, the present invention has the following beneficial effects:

When the graph data of a social network is subjected to a 1*-neighborhood attack, the present invention uses graph modification to protect users' identity privacy: the 1*-neighborhood graphs in the same cluster are modified according to the graph edit distance so that they become probabilistically indistinguishable, which protects user identity privacy in the social network while improving the utility of the graph data. The method therefore lends itself well to practical application and wider adoption.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a schematic diagram of the original karate graph and of the 1*-neighborhood graph of the node labeled 1 in that graph, according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a bipartite graph in an embodiment of the present invention.

DETAILED DESCRIPTION

The present invention is further described below with reference to the accompanying drawings and embodiments.

Referring to Fig. 1, the present invention provides a method for protecting user identity privacy in a social network against 1*-neighborhood attacks, comprising the following steps:

Step 1) For a given graph G = (V, E), partition the nodes into several clusters according to the metrics d(v) and lc(v), which denote the degree and the local clustering coefficient of node v, respectively. After partitioning, sort the clusters in descending order of the maximum node degree in each cluster;

Step 2) After this coarse partition, some clusters may contain fewer nodes than a given privacy requirement k. Each such cluster is merged into whichever of its two adjacent clusters has the smaller difference in average degree, so that the size of every cluster is greater than k;

The specific method of step 2) is:

S2-1: For a cluster with fewer than k nodes, denoted C_i^1, compute the average node degrees of its two adjacent clusters C_{i-1}^1 and C_{i+1}^1;

S2-2: Compare the differences between the average degree of C_i^1 and those of C_{i-1}^1 and C_{i+1}^1, and add C_i^1 to the adjacent cluster with the smaller difference;

Step 3) After cluster merging, any cluster containing more than 2k nodes is split so that the size of every cluster lies in [k, 2k);

Step 3) specifically comprises:

S3-2: Construct the 1*-neighborhood structural feature matrix of each user node, whose four components represent the degree, inner-degree, outer-degree and gap-degree distributions of user node v in the social network, respectively;

S3-2-1: Compute the degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v, where the degree of v_i is the number of neighbors of v_i in the original graph G, and N(v) is the set of all neighbors of user node v;

S3-2-2: Compute the inner-degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v;

S3-2-5: Form the feature matrix of each user node in the social network;

S3-3: Compute the structural similarity between any two nodes in the same cluster;

S3-3-1: For user nodes v and u in the same cluster, use the JS divergence to compute the dissimilarity of their degree, inner-degree, outer-degree and gap-degree distributions, respectively. The JS divergence is defined as:

$$JS(P\|Q)=\frac{1}{2}\sum_{i=1}^{t}p_i\log\frac{2p_i}{p_i+q_i}+\frac{1}{2}\sum_{i=1}^{t}q_i\log\frac{2q_i}{p_i+q_i}$$

where P = {p₁, p₂, …, p_t} and Q = {q₁, q₂, …, q_t} are two probability distributions over the same probability space;

S3-3-2: Compute the similarity vector of user nodes v and u; the similarity between user nodes u and v is then obtained with weights satisfying k₁ + k₂ + k₃ + k₄ = 1;

S3-4: Use the K-means clustering algorithm to divide the nodes into T clusters.

Step 4) Based on the 1*-neighborhood graphs of the nodes in each cluster, compute the similarity between nodes, construct a weighted bipartite graph, compute the graph edit distance on the bipartite graph, and find the graph edit path P;

The specific method of step 4) is:

S4-2: Construct the node matching cost matrix, and construct a weighted bipartite graph using the node matching costs as edge weights;

S4-2-2: Construct the cost matrix;

S4-2-3: Construct the weighted bipartite graph, whose two vertex sets V₁ and V₂ contain the same number of nodes, denoted x, and whose edge-weight matrix has entries w_ij = c_ij.

S4-3: Use the bipartite graph to compute the graph edit distance between nodes, obtaining the matched nodes and the graph edit path.

S4-3-3: Find the graph edit path corresponding to the optimal matching, P = {v₁→u_t1, v₂→u_t2, …, v_x→u_tx}, where u_t1, u_t2, …, u_tx are the matching nodes of v₁, v₂, …, v_x, respectively.

The method of step 5) is:

S5-2: Compute A² and A³, derive two further matrices from them whose entries are set by two conditional rules, and then compute the resulting quantity;

Here, d_v denotes the number of neighbors of user node v;

S5-4-1: If user node v_i needs to gain edges, add an edge between v_i and each suitable two-hop node v_j that also needs additional edges, and then likewise for three-hop nodes; if this yields too few edges, add fake nodes and connect them to v_i until the total number of added edges equals the required number;

S5-4-1-3: Repeat the above steps until v_i has gained the required number of edges, or until no two-hop or three-hop node needs a degree increase;

S5-4-1-4: If v_i has gained the required number of edges, terminate;

S5-4-1-5: If v_i still needs edges and no two-hop or three-hop node needs a degree increase, add the corresponding number of fake nodes and connect them to v_i.

S5-4-2: If user node v_i needs to lose edges, delete the edges between v_i and those neighbors that also need to lose edges, stopping once the required number has been deleted; if too few edges are deleted in this way, continue as follows;

S5-4-2-3: If the required number of edges has been deleted, terminate;

S5-4-2-4: Otherwise, delete the remaining edges incident to v_i in ascending order of edge betweenness centrality until the required number has been deleted.

The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims

1. A user identity privacy protection method for resisting neighbor attacks in a social network, characterized by comprising the following steps:

step 1) establishing a social network model and representing it as a graph G = (V, E), wherein V is the vertex set of the graph and represents the users in the social network, and E is the edge set and represents the relationships between users in the social network;

partitioning the user nodes into T clusters according to the measures d(v) and lc(v), wherein d(v) represents the degree of user node v, i.e., the number of users connected with that user in the social network, and lc(v) represents the local clustering coefficient of user node v in the network, i.e., how tightly the neighbors of node v are connected to one another; after the partitioning is finished, arranging the clusters in descending order of the maximum node degree of each cluster;

step 2) presetting a user privacy requirement threshold k; if the number of users in a cluster C_i is less than the threshold k, calculating the differences between the average degree of the cluster and the average degrees of the adjacent preceding and following clusters C_{i-1} and C_{i+1}, merging the cluster into the cluster with the smaller difference, and repeating the process until the number of users in every cluster is greater than k;

step 3) after the cluster merging is completed, performing a cluster splitting operation on each cluster with more than 2k user nodes so that the number of users in each cluster takes a value in [k, 2k), specifically comprising:

S3-1, for the user nodes in each cluster, sorting them in descending order of degree and constructing the 1*-neighborhood graph of each user node;

S3-2, constructing the 1*-neighborhood structural feature matrix of the user node, whose components respectively represent the degree distribution, inner-degree distribution, outer-degree distribution and gap-degree distribution of user node v in the social network;

S3-3, calculating the structural similarity between any two nodes in the same cluster as a weighted combination of four terms that respectively represent the dissimilarity of the user nodes' degree distributions, inner-degree distributions, outer-degree distributions and gap-degree distributions, wherein k₁, k₂, k₃, k₄ respectively represent the weight of each term and satisfy k₁ + k₂ + k₃ + k₄ = 1;

wherein S3-3 specifically comprises:

S3-3-1, for user nodes v and u in the same cluster, calculating the dissimilarity of their degree distributions, inner-degree distributions, outer-degree distributions and gap-degree distributions using the JS divergence, the JS divergence being defined as:

$$JS(P\|Q)=\frac{1}{2}\sum_{i=1}^{t}p_i\log\frac{2p_i}{p_i+q_i}+\frac{1}{2}\sum_{i=1}^{t}q_i\log\frac{2q_i}{p_i+q_i}$$

wherein P = {p₁, p₂, …, p_t} and Q = {q₁, q₂, …, q_t} are two probability distributions over the same probability space;

S3-3-2, calculating the similarity vector of user nodes v and u, from which the similarity of user nodes u and v is obtained with weights satisfying k₁ + k₂ + k₃ + k₄ = 1;

S3-4, dividing the nodes into T clusters using a K-means clustering algorithm;

step 4) calculating the similarity between each pair of user nodes according to the 1*-neighborhood graphs of the user nodes in each cluster, constructing a weighted bipartite graph from the similarities, calculating graph edit distances on the bipartite graph, and finding a target graph edit path P from them;

step 5) according to the graph edit path P found in step 4), modifying the 1*-neighborhood graphs of the nodes in the cluster so that they are isomorphic.

2. The method for protecting user identity privacy against neighbor attacks according to claim 1, wherein the 1*-neighborhood graph is a subgraph of the original graph G, defined as:

G(v) = (V(v), E(v), D(v))

wherein V(v) is the set comprising user node v itself and its neighbor nodes, E(v) is the set of edges among the nodes in V(v), i.e., the relationships between the neighbors, and D(v) is the set of the numbers of neighbors of these nodes in the social network, i.e., the set of degrees of all nodes in V(v).

3. The method for protecting user identity privacy against neighbor attacks according to claim 1, wherein step 2) specifically comprises:

S2-1, denoting a cluster with fewer than k nodes by C_i^1, wherein the superscript 1 indicates that the cluster is the result of the first partition, and recording the average degree of the nodes in the cluster; calculating the average node degrees of the two clusters C_{i-1}^1 and C_{i+1}^1 adjacent to it;

S2-2, comparing the differences between the average degree of C_i^1 and those of C_{i-1}^1 and C_{i+1}^1, and adding C_i^1 to the adjacent cluster with the smaller difference;

S2-3, repeating the above steps until the number of nodes in every cluster exceeds k.

4. The method for protecting user identity privacy against neighbor attacks according to claim 1, wherein step 4) specifically comprises:

S4-1, if the numbers of neighbor nodes in the 1*-neighborhood graphs of two user nodes are not equal, adding nodes to the graph with fewer neighbor nodes so that the two graphs contain equal numbers of nodes;

S4-2, constructing a matching cost matrix of the user nodes, and constructing a weighted bipartite graph using the matching costs of the user nodes as edge weights;

S4-3, calculating the graph edit distances between the user nodes using the bipartite graph to obtain the matched nodes and the graph edit paths.

5. The method for protecting user identity privacy against neighbor attacks according to claim 1, wherein step 5) specifically comprises:

S5-1, constructing the adjacency matrix of graph G, denoted A = (a_ij)_{n×n}, wherein a_ij = 1 when an edge exists between nodes v_i and v_j, and a_ij = 0 otherwise;

S5-2, calculating A² and A³, deriving two further matrices from them whose entries are set by two conditional rules, and then calculating the resulting quantity;

S5-3, for each user node v in the social network, calculating the degree change required for node v from the matched node u calculated in S4-3, and arranging the required degree changes of all user nodes in the social network in descending order to obtain the degree modification sequence D^M, wherein d_v represents the number of neighbors of user node v;

S5-4, modifying the graph structure according to D^M.

6. The method for protecting user identity privacy against neighbor attacks according to claim 1, wherein S3-2 specifically comprises:

S3-2-1, calculating the degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v, wherein the degree of user node v_i represents the number of neighbors of v_i in the original graph G, and N(v) is the set of all neighbors of user node v;

S3-2-2, calculating the inner-degree distribution of the neighbor nodes in G(v), wherein the inner degree of user node v_i represents the number of neighbors of v_i in the 1*-neighborhood graph G(v);

S3-2-3, calculating the outer-degree distribution of the neighbor nodes in G(v), wherein the outer degree of user node v_i represents the number of neighbors of v_i outside the 1*-neighborhood graph G(v);

S3-2-4, calculating the gap-degree distribution of the neighbor nodes in the 1*-neighborhood graph G(v) of user node v;

S3-2-5, forming the feature matrix of each user node in the social network, wherein N_v is the number of neighbors of user node v in the social network.

7. The method for protecting user identity privacy against neighbor attacks according to claim 4, wherein S4-2 specifically comprises:

S4-2-1, for any pair of vertices v and u in the same cluster, with G(v) = (V₁, E₁) and G(u) = (V₂, E₂) being their 1*-neighborhood graphs, calculating, for every node v_i ∈ G(v), its matching cost with every node in G(u);

S4-2-2, constructing the cost matrix;

S4-2-3, constructing a weighted bipartite graph whose vertex sets V₁ and V₂ contain equal numbers of nodes, denoted x, wherein W = (w_ij)_{m×m} is the edge-weight matrix and w_ij = c_ij.

8. The method for protecting user identity privacy against neighbor attacks according to claim 4, wherein S4-3 specifically comprises:

S4-3-1, selecting the nodes with the maximum degree as the matching seed node pair;

S4-3-2, obtaining the optimal matching of the bipartite graph B using a Monte Carlo method;

S4-3-3, finding the graph edit path P = {v₁→u_t1, v₂→u_t2, …, v_x→u_tx} corresponding to the optimal matching, wherein u_t1, u_t2, …, u_tx are the matching nodes of v₁, v₂, …, v_x, respectively.

9. The method for protecting user identity privacy against neighbor attacks according to claim 5, wherein S5-4 specifically comprises:

S5-4-1, if user node v_i needs to gain edges, searching the two-hop and three-hop neighbor nodes of v_i for nodes that need additional edges and connecting them to v_i; if the number of edges added is smaller than required, adding fake nodes and connecting them to v_i so that the total number of added edges equals the required number; specifically comprising:

S5-4-1-1, searching the two-hop nodes of node v_i for a node v_j whose degree needs to be increased and, if one is found, adding an edge between v_i and v_j;

S5-4-1-2, if no two-hop node needs a degree increase, searching the three-hop nodes for a node v_j whose degree needs to be increased;

S5-4-1-3, repeating the above steps until v_i has gained the required number of edges or no two-hop or three-hop node needs a degree increase;

S5-4-1-4, terminating if v_i has gained the required number of edges;

S5-4-1-5, if v_i still needs edges and no two-hop or three-hop node needs a degree increase, adding the corresponding number of fake nodes and connecting them to v_i;

S5-4-2, if user node v_i needs to lose edges, finding those of its neighbors that also need to lose edges and deleting the edges between them, stopping when the number of deleted edges equals the required number; if too few edges have been deleted, deleting the edges incident to v_i in ascending order of edge betweenness centrality until the total number of deleted edges equals the required number; specifically comprising:

S5-4-2-1, searching the neighbors of node v_i for nodes whose degree needs to be reduced, adding them to a candidate set CS, and arranging them in descending order of user node degree;

S5-4-2-2, deleting the edges between v_i and the nodes in CS in turn;

S5-4-2-3, terminating if the required number of edges has been deleted;

S5-4-2-4, otherwise, deleting the corresponding edges among the remaining edges incident to v_i in ascending order of edge betweenness centrality until the required number of edges has been deleted;

wherein the edge betweenness centrality is the ratio of the number of shortest paths between all users in the social network that pass through the edge to the total number of shortest paths between all nodes in the network.