CN115878906B

CN115878906B - A social graph generation method and system that protects personal similarity

Info

Publication number: CN115878906B
Application number: CN202211595849.9A
Authority: CN
Inventors: 胡春强; 陈佳俊; 杨正益; 邓绍江; 蔡斌; 向涛; 桑军
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-10-10
Anticipated expiration: 2042-12-13
Also published as: CN115878906A

Abstract

The invention provides a social graph generation method for protecting personal similarity, comprising the following steps: acquiring an initial social graph; calculating the attribute similarity between associated nodes in the initial social graph; forming an initial similarity sequence from all attribute similarities in the initial social graph; applying Laplacian noise to the initial similarity sequence to obtain a noise-added similarity sequence; post-processing the noise-added similarity sequence to obtain a final similarity sequence; and writing the final similarity sequence into the initial social graph to generate a private social graph. A social graph generated by this method effectively protects node attributes in use and prevents attribute inference between associated nodes.

Description

A social graph generation method and system that protects personal similarity

Technical field

The invention specifically relates to a social graph generation method and system that protect personal similarity.

Background art

A social graph is a social structure composed of many nodes. A node usually represents an individual or an organization, and the attributes of a node represent the attributes of that individual; personal attributes include, for example, height, weight and age. An edge connecting two nodes represents the strength of the relationship between them, and nodes connected by an edge are called associated nodes. A social graph links people or organizations whose relationships range from casual acquaintance to closely knit family ties. Data analysis often requires social graphs to be published or shared; in the prior art, node attributes are usually highly sensitive and are vulnerable to various linkage attacks.

To address this problem, the prior art usually protects node attributes with differential privacy. Differential privacy assumes that all information about an individual is supplied by that individual, but the situation changes when similarity is present. Sociological research and psychological theory hold that similarity is ubiquitous as the main characteristic of symmetric relationships. In social networks, direct relationships between users exhibit homophily: when an edge exists between two nodes, similar nodes naturally form stronger connections. Consequently, whenever the attributes of one node are used, some private information about its associated nodes is inevitably disclosed.

There is therefore an urgent need for a social graph that protects personal similarity and effectively protects node attributes while the graph is in use.

Summary of the invention

The present invention aims to solve the technical problems of the prior art by providing a social graph generation method and system that protect personal similarity, so that the generated social graph effectively protects node attributes in use and prevents attribute inference between associated nodes.

To achieve the above object, according to a first aspect, the present invention provides a social graph generation method that protects personal similarity, comprising the following steps: acquiring an initial social graph; calculating the attribute similarity between associated nodes in the initial social graph; forming an initial similarity sequence from all attribute similarities in the initial social graph; applying Laplacian noise to the initial similarity sequence to obtain a noise-added similarity sequence; and post-processing the noise-added similarity sequence to obtain a final similarity sequence. The post-processing proceeds as follows: each attribute similarity in the noise-added similarity sequence is traversed; if an attribute similarity is less than 0 or greater than the maximum attribute similarity in the initial similarity sequence, it is assigned the maximum attribute similarity in the initial similarity sequence; if it is greater than 0 and less than that maximum, it keeps its original value. Finally, the final similarity sequence is written into the initial social graph to generate a private social graph.

Further, the Laplacian noise processing comprises the following steps: the initial similarity sequence is divided into a plurality of sets such that any attribute similarity in the initial similarity sequence appears exactly once across the sets; Laplacian noise is then applied to each set separately.

Further, applying Laplacian noise to each set comprises: setting a first privacy parameter and traversing all sets; if the total number of attribute similarities in a set is greater than the first privacy parameter, a first noise is added to each attribute similarity in that set; if the total number of attribute similarities in a set is less than the first privacy parameter, a second noise is added to each attribute similarity in that set.

Further, the first noise and the second noise are both obtained from a probability density function, in which P represents the probability, x represents the added noise value with x ≤ 0, and λ represents the noise scale.

Further, the noise scale of the first noise is calculated as λ₁ = kp/ε, and the noise scale of the second noise is calculated as λ₂ = rp/ε, where λ₁ represents the noise scale of the first noise, λ₂ represents the noise scale of the second noise, k represents the first privacy parameter, r represents the total number of attribute similarities in the set, p represents the maximum attribute similarity in the initial similarity sequence, and ε represents the second privacy parameter; the second privacy parameter is obtained by setting.

Further, calculating the attribute similarity between associated nodes in the initial social graph comprises: obtaining the attribute vectors of the associated nodes in the social attribute graph; calculating the distance between the associated nodes from their attribute vectors using a distance formula; and generating the attribute similarity of the associated nodes from the associated nodes and the association distance.

Further, in the distance formula, i represents node i, j represents node j, node i and node j are associated nodes, l_ij represents the association distance between associated nodes i and j, x_i represents the attribute vector of node i, x_j represents the attribute vector of node j, and S represents the covariance matrix between x_i and x_j.

Further, an attribute similarity is expressed as (i, j, l_ij), where i represents node i, j represents node j, and l_ij represents the association distance between associated nodes i and j.

Further, writing the final similarity sequence into the initial social graph to generate the private social graph comprises: dividing the initial social graph into a node set, an edge set and a node attribute set; and replacing the node attribute set of the initial social graph with the final similarity sequence to generate the private social graph.

To achieve the above object, according to a second aspect, the present invention provides a social graph generation system that protects personal similarity and that, in operation, carries out the steps of any one of the social graph generation methods described above. The system comprises an acquisition module, a calculation module, a noise-adding module, a post-processing module and a generation module. The acquisition module acquires the initial social graph. The calculation module calculates the attribute similarity of associated nodes in the initial social graph and forms the initial similarity sequence. The noise-adding module adds noise to the initial similarity sequence to obtain the noise-added similarity sequence. The post-processing module post-processes the noise-added similarity sequence to obtain the final similarity sequence. The generation module writes the final similarity sequence into the initial social graph to generate the private social graph.

Basic principle and beneficial effects of the invention: by calculating attribute similarity, this scheme hides the specific attribute values of nodes; it quantifies the similarity between associated nodes in the social graph and retains that similarity. To hide the similarity, the scheme adopts the definition of differential privacy and protects the similarity through noise addition and post-processing, so that node similarity is usable but not visible. This prevents attribute inference between associated nodes via similarity, strengthens the protection of the social graph, and minimizes the loss of data utility.

Description of the drawings

Figure 1 is a schematic diagram of the steps of the social graph generation method for protecting personal similarity according to the present invention;

Figure 2 is a schematic diagram of the steps by which the present invention uses a query model to extract the initial similarity sequence;

Figure 3 is a schematic diagram of the steps for dividing the initial similarity sequence according to the present invention;

Figure 4 is a KS-distance comparison chart for four real social network data sets;

Figure 5 is a Mallows-distance comparison chart for four real social network data sets;

Figure 6 is a schematic structural diagram of the social graph generation system for protecting personal similarity according to the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.

In the description of the present invention, it should be understood that orientations or positional relationships indicated by terms such as "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the present invention.

In the description of the present invention, unless otherwise specified and limited, the terms "mounted", "connected" and "connection" are to be understood broadly. For example, a connection may be mechanical or electrical, may be an internal communication between two elements, and may be direct or indirect through an intermediate medium. A person of ordinary skill in the art can understand the specific meaning of these terms according to the specific situation.

As shown in Figure 1, the present invention provides a social graph generation method that protects personal similarity, comprising the following steps:

acquiring an initial social graph, and calculating the attribute similarity between associated nodes in the initial social graph;

forming an initial similarity sequence from all attribute similarities in the initial social graph, applying Laplacian noise to the initial similarity sequence to obtain a noise-added similarity sequence, and post-processing the noise-added similarity sequence to obtain a final similarity sequence;

the post-processing proceeds as follows: each attribute similarity in the noise-added similarity sequence is traversed; if an attribute similarity is less than 0 or greater than the maximum attribute similarity in the initial similarity sequence, it is assigned the maximum attribute similarity in the initial similarity sequence; if it is greater than 0 and less than that maximum, it keeps its original value;

writing the final similarity sequence into the initial social graph to generate a private social graph.
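The post-processing rule above can be sketched in a few lines of Python. The function name `post_process` and the plain-list representation are illustrative assumptions, not part of the patent. Values exactly equal to 0 or to the maximum, which the text does not address, are left unchanged in this sketch.

```python
def post_process(noisy_values, max_similarity):
    """Apply the post-processing rule to a noise-added similarity sequence.

    Any value below 0 or above the maximum similarity of the ORIGINAL
    sequence is replaced by that maximum; every other value is kept.
    """
    return [
        max_similarity if (v < 0 or v > max_similarity) else v
        for v in noisy_values
    ]


# Hypothetical noisy values; 1.0 is the assumed maximum of the initial sequence.
final = post_process([-0.5, 0.3, 2.5, 1.0], max_similarity=1.0)
```

Note that, as stated in the text, out-of-range values on either side are mapped to the maximum rather than clamped to the nearest bound.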

Specifically, a social graph with node attributes can be represented as a triple G(V, E, X), where V = (v₁, …, v_n) denotes the node set, E = (e₁, …, e_n) denotes the edge set, and X = (x₁, …, x_n) denotes the node attribute set, with x₁ denoting the attribute vector of node v₁.

Specifically, the principle by which this embodiment introduces differential privacy to apply Laplacian noise to the initial social graph is as follows:

If an algorithm whose input is a graph satisfies differential privacy, then modifying one edge of the input graph has only a very limited effect on the output. If modifying k edges of the input graph has only a very limited effect on the output, the algorithm satisfies k-edge differential privacy.

For two given social graphs G₁(V₁, E₁, X₁) and G₂(V₂, E₂, X₂), if V₁ = V₂ and |E₂| = |E₁| − k, then G₁ and G₂ are said to be k-edge neighbors.

Let a randomized algorithm M satisfy differential privacy, and let range(M) be the set of all possible outputs of M. Then for any two k-edge neighboring social graphs G₁ and G₂ and any subset S of range(M), the following holds:

Pr(M(G₁) ∈ S) ≤ e^ε × Pr(M(G₂) ∈ S)

where ε denotes the set second privacy parameter.

Specifically, the attribute similarity between associated nodes in the initial social graph is calculated as follows:

The attribute vectors of the associated nodes in the social attribute graph are obtained; the distance between the associated nodes is calculated from their attribute vectors using a distance formula; and the attribute similarity of the associated nodes is generated from the associated nodes and the association distance.

Further, the distance formula is specifically:

where i represents node i, j represents node j, node i and node j are associated nodes, l_ij represents the association distance between associated nodes i and j, x_i represents the attribute vector of node i, x_j represents the attribute vector of node j, and S represents the covariance matrix between x_i and x_j.
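The distance formula itself is not reproduced in this text. Because it is defined over two attribute vectors and a covariance matrix S, one plausible reading is the Mahalanobis distance, l_ij = sqrt((x_i − x_j)ᵀ S⁻¹ (x_i − x_j)). The sketch below computes that quantity in pure Python under this assumption; the helper names `solve_linear` and `mahalanobis` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def mahalanobis(x_i, x_j, cov):
    """Assumed form of l_ij: sqrt(d^T S^-1 d) with d = x_i - x_j."""
    d = [a - b for a, b in zip(x_i, x_j)]
    y = solve_linear(cov, d)  # y = S^-1 d without forming the inverse
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))


# With an identity covariance this reduces to the Euclidean distance.
l_ij = mahalanobis([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Solving the linear system instead of inverting S is a design choice made here for numerical robustness; it does not change the value being computed.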

Specifically, the distance between the associated nodes of an edge is used as the label of that edge and written into the initial social graph to form a new social graph, denoted G'(V, E, L), where L denotes the sequence of attribute similarities in the initial social graph, i.e. the initial similarity sequence. As shown in Figure 2, the new social graph is fed into a query model to extract the initial similarity sequence L.

Further, an attribute similarity is expressed as (i, j, l_ij), where i represents node i, j represents node j, and l_ij represents the association distance between associated nodes i and j.

Specifically, the Laplacian noise processing comprises the following steps:

The initial similarity sequence is divided into a plurality of sets such that any attribute similarity in the initial similarity sequence appears exactly once across the sets, and Laplacian noise is applied to each set separately. This embodiment proposes a truncated Laplace mechanism T: given a monotonically increasing function F, i.e. F(G₁) ≤ F(G₂) for any neighboring data sets G₁, G₂ ∈ G, the mechanism T(G) = F(G₁) + lap⁻(ΔF/ε) satisfies ε-differential privacy, where lap⁻(ΔF/ε) denotes the added Laplacian noise, which is negative; ΔF/ε determines the magnitude of the added Laplacian noise; ΔF denotes the sensitivity of the query model; and ε denotes the set second privacy parameter.

Specifically, the process of dividing the initial similarity sequence is shown in Figure 3. The initial similarity sequence L is divided into a plurality of sets according to the nodes with which its elements are associated. Each element comprises the associated nodes and the distance between them, i.e. the attribute similarity. For example, for l_ij ∈ L, where l_ij is the distance between associated nodes i and j, once l_ij has been assigned to one set it is not assigned to any other set. The number of sets and the number of elements of L in each set are determined by the number of edges in the initial social graph; in this embodiment they are as shown in Figures 2 and 3, and any element of the initial similarity sequence L appears in the sets only once.
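The text states only that the sequence is divided "according to its associated nodes" and that each element lands in exactly one set; the exact rule is shown in Figure 3, which is not available here. The sketch below uses one simple rule that satisfies both constraints: each triple (i, j, l_ij) is assigned to the set keyed by its smaller node index. This rule and the name `partition_by_node` are assumptions for illustration.

```python
from collections import defaultdict


def partition_by_node(similarities):
    """Split (i, j, l_ij) triples into disjoint sets keyed by one endpoint.

    Assumed rule: an edge belongs to the set of its smaller node index,
    so every triple is placed in exactly one set.
    """
    groups = defaultdict(list)
    for i, j, l_ij in similarities:
        groups[min(i, j)].append((i, j, l_ij))
    return dict(groups)


# Hypothetical initial similarity sequence for a three-node graph.
L = [(1, 2, 0.5), (2, 3, 0.7), (1, 3, 0.2)]
sets = partition_by_node(L)
```

Any other rule works equally well for the later steps, provided the resulting sets are disjoint and together cover the whole sequence.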

Specifically, Laplacian noise is applied to each set as follows: a first privacy parameter is set and all sets are traversed; if the total number of attribute similarities in a set is greater than the first privacy parameter, a first noise is added to each attribute similarity in that set; if the total number of attribute similarities in a set is less than the first privacy parameter, a second noise is added to each attribute similarity in that set.

Further, the first noise and the second noise are both obtained from a probability density function; the probability density function is as follows:

where P represents the probability, x represents the added noise value with x ≤ 0, and λ represents the noise scale. It follows from the probability density function that the truncated Laplacian has mean 0 and variance λ², and that noise is added only on one side of the Laplace distribution while the other side is truncated. Compared with the conventional Laplace mechanism for differential privacy (mean 0, variance 2λ²), the truncated Laplace mechanism proposed in this embodiment has a smaller variance at the same level of privacy protection, so the protected social graph data fluctuates less and retains higher utility in use.
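The density itself is not reproduced in this text. A minimal sampling sketch follows, assuming the one-sided form P(x) = (1/λ)·e^(x/λ) for x ≤ 0, i.e. the negative half of a Laplace distribution renormalised to integrate to 1. This assumed density has variance λ², which agrees with the text, but its mean is −λ rather than 0, so the patent's actual formula may differ. The function names are hypothetical.

```python
import random


def sample_truncated_laplace(scale, rng=random):
    """Draw one non-positive noise value with noise scale `scale` (lambda).

    Assumption: the noise is the negative of an exponential variate with
    mean `scale`. random.expovariate takes the RATE, hence 1 / scale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return -rng.expovariate(1.0 / scale)


def add_noise(similarities, scale, rng=random):
    """Add independent one-sided noise to every (i, j, l_ij) triple."""
    return [
        (i, j, l_ij + sample_truncated_laplace(scale, rng))
        for i, j, l_ij in similarities
    ]


rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = add_noise([(1, 2, 0.5), (2, 3, 0.7)], scale=0.1, rng=rng)
```

Because the noise is never positive, every noisy value is at most its original value; values pushed below 0 are then handled by the post-processing step.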

Further, the noise scale of the first noise is calculated as follows:

λ₁ = kp/ε

The noise scale of the second noise is calculated as follows:

λ₂ = rp/ε

where λ₁ represents the noise scale of the first noise, λ₂ represents the noise scale of the second noise, k represents the first privacy parameter, r represents the total number of attribute similarities in the set, p represents the maximum attribute similarity in the initial similarity sequence, and ε represents the second privacy parameter; the second privacy parameter is obtained by setting.
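As a small worked illustration of the two formulas, the function below selects the noise scale for one set. The name `noise_scale` and the example numbers are assumptions. The text does not cover the case r = k; there both formulas give the same value, so the choice of branch does not matter.

```python
def noise_scale(r, k, p, epsilon):
    """Return the noise scale for a set holding r attribute similarities.

    r > k  -> first noise,  lambda_1 = k * p / epsilon
    r < k  -> second noise, lambda_2 = r * p / epsilon
    r == k -> both formulas coincide.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    count = k if r > k else r
    return count * p / epsilon


# Hypothetical values: k = 15 (the default used in the experiments), p = 2.0, epsilon = 1.0.
large_set_scale = noise_scale(r=20, k=15, p=2.0, epsilon=1.0)  # uses k
small_set_scale = noise_scale(r=5, k=15, p=2.0, epsilon=1.0)   # uses r
```

In effect the scale is min(r, k)·p/ε, which makes explicit why small sets receive less noise than they would without partitioning.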

It follows from the noise scale formulas above that, if the initial similarity sequence is not divided, the noise scale for every attribute similarity is kp/ε. Dividing the initial similarity sequence first and then adding noise yields noise scales of kp/ε and rp/ε (k > r). Dividing the initial similarity sequence into multiple sets therefore avoids adding excessive noise and improves the data utility of the social graph.

Specifically, the noise-added similarity sequence is obtained through the above formulas and is then post-processed to obtain the final similarity sequence; the final similarity sequence replaces the node attribute set X in the initial social graph to generate the private social graph.
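The replacement of the node attribute set by the final similarity sequence can be sketched as below. The dictionary layout with keys "V", "E" and "L" and the name `build_private_graph` are illustrative assumptions; the patent does not prescribe a data structure.

```python
def build_private_graph(nodes, edges, final_similarities):
    """Assemble the published graph from V, E and the final similarity sequence.

    The raw attribute set X is deliberately not an input: only the
    protected edge labels are released alongside the topology.
    """
    labels = {(i, j): l_ij for i, j, l_ij in final_similarities}
    missing = [e for e in edges if e not in labels]
    if missing:
        raise ValueError(f"no final similarity for edges: {missing}")
    return {
        "V": list(nodes),
        "E": list(edges),
        "L": [labels[e] for e in edges],  # one label per edge, in edge order
    }


# Hypothetical three-node example.
private_graph = build_private_graph(
    nodes=[1, 2, 3],
    edges=[(1, 2), (2, 3)],
    final_similarities=[(2, 3, 0.7), (1, 2, 0.4)],
)
```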

In a specific implementation, the social graph generation method for protecting personal similarity described in this scheme was tested on four real social network data sets: Karate, Football, LastFM-Asia and Facebook. To evaluate the privacy effect of the social graphs generated by the method of this embodiment, two indicators are used to assess accuracy and privacy. First, the Kolmogorov-Smirnov distance (KS-distance) is used to verify whether the attribute similarities before and after perturbation follow the same distribution. The distance formula described in this embodiment is then used to describe accurately the difference between the tails of the two distributions. Figures 4 and 5 show how the KS-distance and the distance calculated by the distance formula change as the privacy budget varies, with k = 15 by default. Figures 4 and 5 indicate that the distance calculated with the distance formula of this embodiment is more stable across graphs of different sizes, and that the calculated distance increases with the number of edges in the graph.
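For readers who want to reproduce the first indicator, the two-sample KS-distance is the largest absolute gap between the empirical cumulative distribution functions of the two samples. The sketch below is a plain-Python version; the function name `ks_distance` is an assumption, and the sample values are made up for illustration and are not results from the patent.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(v) - F_b(v)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(sample_a), sorted(sample_b)
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the maximum gap.
    return max(
        abs(bisect_right(sa, v) / len(sa) - bisect_right(sb, v) / len(sb))
        for v in sa + sb
    )


# Illustrative similarities before and after perturbation (invented values).
before = [0.2, 0.5, 0.7, 0.9]
after = [0.1, 0.5, 0.6, 0.9]
gap = ks_distance(before, after)
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.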

As shown in Figure 6, to achieve the above object, according to a second aspect, the present invention provides a social graph generation system that protects personal similarity and that, in operation, carries out the steps of any one of the social graph generation methods described above. The system comprises an acquisition module, a calculation module, a noise-adding module, a post-processing module and a generation module. The acquisition module acquires the initial social graph. The calculation module calculates the attribute similarity of associated nodes in the initial social graph and forms the initial similarity sequence. The noise-adding module adds noise to the initial similarity sequence to obtain the noise-added similarity sequence. The post-processing module post-processes the noise-added similarity sequence to obtain the final similarity sequence. The generation module writes the final similarity sequence into the initial social graph to generate the private social graph.

In this specification, a description that refers to "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A social graph generation method for protecting personal similarity, characterized by comprising the following steps:

acquiring an initial social graph, calculating attribute similarity between associated nodes in the initial social graph, forming an initial similarity sequence according to all attribute similarity in the initial social graph, and carrying out Laplacian noise processing on the initial similarity sequence to obtain a noise-added similarity sequence; post-processing is carried out on the noise-added similarity sequence to obtain a final similarity sequence;

the post-processing comprises the following specific steps: traversing each attribute similarity in the noise-added similarity sequence, and if the attribute similarity is smaller than 0 or larger than the maximum attribute similarity in the initial similarity sequence, assigning the attribute similarity as the maximum attribute similarity in the initial similarity sequence; if the attribute similarity is greater than 0 and less than the maximum attribute similarity in the initial similarity sequence, the attribute similarity keeps the original value;

writing the final similarity sequence into the initial social graph to generate a private social graph;

the specific steps of the Laplacian noise treatment are as follows:

dividing the initial similarity sequence into a plurality of sets; and any attribute similarity in the initial similarity sequence only appears once in the plurality of sets; respectively carrying out Laplacian noise treatment on each set;

the steps of Laplacian noise treatment are specifically as follows:

setting a first privacy parameter, traversing all the sets, and adding first noise to each attribute similarity in the set if the total number of attribute similarity in the set is larger than the first privacy parameter; if the total number of attribute similarity in the set is smaller than the first privacy parameter, adding second noise to each attribute similarity in the set;

the first noise and the second noise are obtained through probability density function calculation; the probability density function is as follows:

wherein P represents the probability, x represents the added noise value, x is less than or equal to 0, and λ represents the noise scale.
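The post-processing step of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `post_process` and its parameters are hypothetical, the sequence is reduced to bare similarity values, and values exactly equal to 0 or to the maximum (which the claim wording does not cover) are assumed to be kept unchanged.

```python
def post_process(noisy_similarities, max_similarity):
    """Post-process a noise-added similarity sequence as worded in claim 1.

    max_similarity is the maximum attribute similarity of the INITIAL
    (noise-free) sequence, not of the noisy one.
    """
    final = []
    for value in noisy_similarities:
        if value < 0 or value > max_similarity:
            # Out-of-range values are assigned the maximum similarity
            # of the initial sequence, as the claim states.
            final.append(max_similarity)
        else:
            # In-range values keep their noisy value. Keeping the boundary
            # values 0 and max_similarity is an assumption of this sketch.
            final.append(value)
    return final
```

For example, with a maximum of 2.0, the values -0.2 and 3.1 are both replaced by 2.0 while 0.5 is left as it is.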

2. The social graph generating method for protecting personal similarity as recited in claim 1, wherein the noise scale of the first noise is calculated as follows:

λ₁ = kp/ε

the noise scale of the second noise is calculated as follows:

λ₂ = rp/ε

wherein λ₁ represents the noise scale of the first noise, λ₂ represents the noise scale of the second noise, k represents the first privacy parameter, r represents the total number of attribute similarities in the set, p represents the maximum attribute similarity in the initial similarity sequence, and ε represents the second privacy parameter; the second privacy parameter is a preset value.
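The per-set noise addition of claims 1 and 2 can be sketched as follows. Several points are assumptions of this sketch rather than statements of the claims: the names `laplace_noise` and `add_noise` are hypothetical, the sets are taken as already partitioned, p is assumed to be positive, and the noise is drawn from the standard zero-centred Laplace distribution because the claimed density function is not reproduced in the text. A set whose size equals the first privacy parameter is given the second noise, which makes no difference because the two scales coincide when r = k.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(sets, k, epsilon, p, rng=None):
    """Add Laplace noise to every similarity value, set by set.

    sets    -- list of lists of similarity values (a partition of the sequence)
    k       -- first privacy parameter
    epsilon -- second privacy parameter
    p       -- maximum attribute similarity in the initial sequence (assumed > 0)
    """
    if rng is None:
        rng = random.Random()
    noisy_sets = []
    for group in sets:
        r = len(group)
        if r > k:
            scale = k * p / epsilon  # first noise scale: kp / epsilon
        else:
            scale = r * p / epsilon  # second noise scale: rp / epsilon
        noisy_sets.append([value + laplace_noise(scale, rng) for value in group])
    return noisy_sets
```

Under these formulas, the noise scale of a large set is capped at kp/ε, while a small set receives noise that grows with its own size r.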

3. The social graph generating method for protecting personal similarity according to claim 1 or 2, wherein the step of calculating attribute similarity between associated nodes in the initial social graph comprises the following steps:

obtaining attribute vectors of the associated nodes in the social attribute graph, calculating the association distance between the associated nodes from their attribute vectors using a distance formula, and generating the attribute similarity of the associated nodes from the associated nodes and the association distance.

4. The social graph generating method for protecting personal similarity as claimed in claim 3, wherein the distance formula is specifically:

wherein i represents node i, j represents node j, node i and node j are associated nodes, l_ij represents the association distance between the associated nodes i and j, x_i represents the attribute vector of node i, x_j represents the attribute vector of node j, and S represents the covariance matrix between x_i and x_j.
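The distance formula itself is not reproduced in the text above. The symbols it defines (two attribute vectors and a covariance matrix S) match the Mahalanobis distance, so the sketch below assumes l_ij = sqrt((x_i - x_j)^T S^-1 (x_i - x_j)). The function names are hypothetical, and the covariance matrix is passed in by the caller rather than estimated here.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[row][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def association_distance(x_i, x_j, cov):
    """Mahalanobis distance between two attribute vectors (assumed formula)."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    y = solve(cov, diff)  # y = S^-1 (x_i - x_j), without forming the inverse
    squared = sum(d * v for d, v in zip(diff, y))
    return math.sqrt(max(0.0, squared))  # guard against tiny negative rounding


def attribute_similarity(i, j, x_i, x_j, cov):
    # The triple (i, j, l_ij) follows the representation given in claim 5.
    return (i, j, association_distance(x_i, x_j, cov))
```

With an identity covariance matrix the result reduces to the Euclidean distance, which gives a quick sanity check.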

5. The social graph generating method for protecting personal similarity as recited in claim 3, wherein the attribute similarity is expressed as (i, j, l_ij), where i represents node i, j represents node j, and l_ij represents the association distance between the associated nodes i and j.

6. The method for generating a social graph protecting personal similarity according to claim 1, 2, 4 or 5, wherein the step of writing the final similarity sequence into the initial social graph to generate the private social graph is specifically as follows:

dividing the initial social graph into a node set, an edge set and a node attribute set; a private social graph is generated using the final similarity sequence in place of the set of node attributes in the initial social graph.
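The generation step of claim 6 can be sketched as follows. The dictionary layout with the keys `nodes`, `edges`, `attributes` and `similarities` is a hypothetical representation chosen for illustration; the claim only specifies that the graph is divided into the three sets and that the final similarity sequence takes the place of the node attribute set.

```python
def generate_private_graph(initial_graph, final_similarities):
    """Build a private social graph from an initial graph.

    initial_graph      -- dict with "nodes", "edges" and "attributes" entries
    final_similarities -- list of (i, j, l_ij) triples after post-processing
    """
    # Divide the initial graph into its node set and edge set.
    nodes = list(initial_graph["nodes"])
    edges = list(initial_graph["edges"])
    # The node attribute set is deliberately not copied: the final
    # similarity sequence is published in its place.
    return {
        "nodes": nodes,
        "edges": edges,
        "similarities": list(final_similarities),
    }
```

The returned graph keeps the structure of the initial graph but carries no raw node attributes, only the perturbed similarities between associated nodes.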

7. A social graph generating system for protecting personal similarity, characterized in that, when running, the system implements the steps of the social graph generating method for protecting personal similarity according to any one of claims 1-6; the system comprises an acquisition module, a calculation module, a noise-adding module, a post-processing module and a generation module;

the acquisition module is used for acquiring an initial social graph;

the calculation module is used for calculating the attribute similarity of the associated nodes in the initial social graph and forming an initial similarity sequence;

the noise-adding module is used for carrying out noise-adding processing on the initial similarity sequence to obtain a noise-added similarity sequence;

the post-processing module is used for carrying out post-processing on the noise-added similarity sequence to obtain a final similarity sequence;

the generation module is used for writing the final similarity sequence into the initial social graph to generate a private social graph.