WO2021114921A1 - Method and device for constructing a relationship network based on privacy protection - Google Patents

Method and device for constructing a relationship network based on privacy protection

Info

Publication number
WO2021114921A1
WO2021114921A1 PCT/CN2020/124282 CN2020124282W WO2021114921A1 WO 2021114921 A1 WO2021114921 A1 WO 2021114921A1 CN 2020124282 W CN2020124282 W CN 2020124282W WO 2021114921 A1 WO2021114921 A1 WO 2021114921A1
Authority
WO
WIPO (PCT)
Prior art keywords
composite
node
relationship network
nodes
candidate
Prior art date
Application number
PCT/CN2020/124282
Other languages
English (en)
French (fr)
Inventor
张屹綮
肖凯
王维强
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021114921A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for constructing a relationship network based on privacy protection.
  • A relationship network is often used to describe the relationships among multiple entities. For example, taking users as entities, each node in the relationship network corresponds to a user and each edge between nodes corresponds to a connection between users, so the network can describe interpersonal relationships.
  • In some scenarios, group activity data may be involved. For example, outputting aggregated account data through the relationship network is an effective means of combating batch attacks and organized attacks by the illicit ("black market") industry. If such group activity data involves private user relationship data, such as friend data, transfer data, and device environment operation data, that private data can easily be reverse-analyzed or even leaked.
  • the privacy protection-based relationship network construction method and device described in one or more embodiments of this specification can be used to solve one or more of the problems mentioned in the background art section.
  • According to a first aspect, a method for constructing a relationship network based on privacy protection is provided, wherein the relationship network based on privacy protection is formed by a plurality of composite nodes whose association relationships are described by connecting edges; a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and a connecting edge between original nodes describes the association relationship between the corresponding users. The method includes: obtaining the candidate relationship network; dividing the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; for the multiple composite nodes, detecting whether there is a connecting edge between each pair; and, based on the detection result, using a differential privacy method to add edges and weights to the multiple composite nodes, thereby constructing the relationship network based on privacy protection.
  • the candidate relationship network is obtained in the following manner: obtaining the user identifiers of multiple candidate users provided by a third-party business party; based on the user identifiers, selecting from an initial relationship network the original nodes corresponding to the multiple candidate users, together with their neighbor nodes within a predetermined order, as candidate nodes; and taking the relationship network formed by the candidate nodes as the candidate relationship network.
  • dividing the original nodes in the candidate relationship network into multiple composite nodes according to the preset composite node capacity includes: determining the number of original nodes in the candidate relationship network; determining a first number from the number of original nodes and the composite node capacity, where the first number is the maximum number of composite nodes that can be formed when every composite node corresponds to exactly as many original nodes as the composite node capacity; randomly selecting the first number of original nodes from the candidate relationship network as the reference nodes of the respective composite nodes; and, for each reference node, determining a second number of original nodes from the candidate relationship network which, together with that reference node, form the corresponding composite node, where the second number is one less than the composite node capacity.
  • the multiple composite nodes include a first composite node and a second composite node, the first composite node corresponds to a first original node, and the second composite node corresponds to a second original node; detecting whether there is a connecting edge between each pair of composite nodes includes: determining that there is a connecting edge between the first composite node and the second composite node when there is a connecting edge between the first original node and the second original node.
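  • The detection rule above can be illustrated with a short sketch. It assumes the original edges are given as pairs of node identifiers and that a dictionary (here the hypothetical `composite_of`) maps each original node to its composite node; neither representation is prescribed by the specification.

```python
def detect_composite_edges(original_edges, composite_of):
    """Return the set of composite-node pairs joined by at least one original edge."""
    composite_edges = set()
    for u, v in original_edges:
        cu, cv = composite_of.get(u), composite_of.get(v)
        # Skip edges touching unassigned nodes and edges inside one composite node.
        if cu is None or cv is None or cu == cv:
            continue
        # frozenset makes the pair unordered, so (C1, C2) and (C2, C1) coincide.
        composite_edges.add(frozenset((cu, cv)))
    return composite_edges


# Hypothetical example: two composite nodes holding five original nodes each.
composite_of = {n: "C1" for n in "ABCDE"}
composite_of.update({n: "C2" for n in "FGHIJ"})
edges = [("A", "B"), ("E", "F"), ("D", "J")]
print(detect_composite_edges(edges, composite_of))
```

  • Both cross edges (E, F) and (D, J) collapse into a single composite edge, which is why the size of the returned set, rather than the number of original edges, is what later receives noise.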
  • the detection result includes a set of connected edges between the composite nodes and the number of connected edges in that set; adding edges and weights to the multiple composite nodes using a differential privacy method based on the detection result includes: adding noise under a first privacy cost to the number of connected edges.
  • the noise under the first privacy cost follows a Laplace distribution whose scale parameter is the reciprocal of the first privacy cost.
  • the noise under the first privacy cost is obtained by generating a first random value through a predetermined random algorithm and taking the value of the dependent variable of the Laplace distribution when its independent variable is the first random value.
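  • As a minimal sketch of adding Laplace noise with scale equal to the reciprocal of the first privacy cost, the code below uses inverse-CDF sampling from a uniform random value. Treating inverse-CDF sampling as the "predetermined random algorithm" is an assumption, and the names `laplace_inverse_cdf`, `noisy_edge_count`, and `epsilon_1` are illustrative only.

```python
import math
import random


def laplace_inverse_cdf(u, scale):
    """Map a uniform value u in (0, 1) to a sample of Laplace(0, scale)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    centered = u - 0.5
    return -scale * math.copysign(1.0, centered) * math.log(1.0 - 2.0 * abs(centered))


def noisy_edge_count(true_count, epsilon_1, rng):
    """Add Laplace noise with scale 1 / epsilon_1 to the number of connected edges."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0.0, which is outside (0, 1)
        u = rng.random()
    return true_count + laplace_inverse_cdf(u, scale=1.0 / epsilon_1)


rng = random.Random(7)  # fixed seed only so the sketch is repeatable
print(noisy_edge_count(120, epsilon_1=0.5, rng=rng))
```

  • A smaller privacy cost gives a larger scale and therefore noisier counts.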
  • adding edges and weights to the multiple composite nodes using a differential privacy method based on the detection result further includes: selecting a third number of connected edges from the set of connected edges; and constructing a fourth number of noise connected edges between the composite nodes, where the noise connected edges are connected edges outside the set of connected edges.
  • a fifth number is obtained by adding the noise under the first privacy cost to the number of connected edges, the maximum number of connected edges between the composite nodes is a sixth number, and the ratio of the third number to the fourth number is consistent with the ratio of the fifth number to the sixth number.
  • the set of connected edges includes a first connected edge, and each connected edge in the set corresponds to a given initial weight; selecting the third number of connected edges from the set of connected edges includes: for the first connected edge, adding to its given initial weight noise whose cumulative probability under a second privacy cost follows a two-sided geometric distribution, to obtain a corresponding first noise weight, where the second privacy cost is the difference between a predetermined overall privacy cost and the first privacy cost; and, when the first noise weight is greater than a first weight threshold, selecting the first connected edge as a connecting edge of the relationship network based on privacy protection and using the first noise weight as the weight of the first connected edge.
  • the given initial weight is 1, and noise is added to the first connected edge in the following manner: a predetermined random algorithm generates a random value within a predetermined interval; the value of the independent variable at which the cumulative probability of the two-sided geometric distribution equals the random value is obtained; and the weight of the first connected edge after adding noise is the sum of the initial weight and that value of the independent variable.
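  • The following sketch shows one way to draw two-sided geometric noise for an edge weight. Instead of inverting the cumulative distribution directly, it uses the equivalent construction of subtracting two independent geometric variables, each sampled by inversion; this substitution, the parameter `alpha = exp(-epsilon_2)`, and all function names are assumptions made for illustration.

```python
import math
import random


def geometric_sample(alpha, rng):
    """Sample G in {0, 1, 2, ...} with P(G >= j) = alpha ** j, by inversion."""
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return math.floor(math.log(u) / math.log(alpha))


def two_sided_geometric_noise(epsilon_2, rng):
    """Integer noise with P(X = k) proportional to exp(-epsilon_2 * |k|)."""
    alpha = math.exp(-epsilon_2)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return geometric_sample(alpha, rng) - geometric_sample(alpha, rng)


def noisy_weight(initial_weight, epsilon_2, rng):
    """Initial edge weight (1 in the text above) plus two-sided geometric noise."""
    return initial_weight + two_sided_geometric_noise(epsilon_2, rng)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
weights = [noisy_weight(1, epsilon_2=1.0, rng=rng) for _ in range(5)]
print(weights)
```

  • An edge would then be kept only if its noisy weight exceeds the first weight threshold described in the text; computing that threshold is outside the scope of this sketch.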
  • the first weight threshold is the threshold of the independent variable at which a first proportion of connected edges is obtained when each connected edge in the set of connected edges is one-sidedly filtered by a high-pass filter under the second privacy cost, where the first proportion is the ratio of a first term to a second term: the first term is the fifth number, obtained by adding the noise under the first privacy cost to the number of connected edges; the second term is the maximum number of connected edges between the composite nodes.
  • the fourth number is determined according to the filtering ratio of the high-pass filter under the second privacy cost, where the second privacy cost is the difference between the predetermined overall privacy cost and the first privacy cost, and the ratio of the fourth number to the difference between the following two items is consistent with the filtering ratio of the high-pass filter under the second privacy cost: the maximum number of connected edges between the composite nodes, and the number of connected edges obtained after adding the noise under the first privacy cost to the number of connected edges.
  • the multiple composite nodes include a third composite node and a fourth composite node, and the set of connected edges contains no connected edge between the third composite node and the fourth composite node; constructing the fourth number of noise connected edges between the composite nodes includes: adding, between the third composite node and the fourth composite node, a second connected edge with an initial weight of 0; generating for the second connected edge a noise weight whose cumulative probability under the second privacy cost follows an exponential distribution; and, when the noise weight generated for the second connected edge is greater than 0, determining the second connected edge as an added connected edge whose weight is the generated noise weight.
  • the noise weight that follows the exponential distribution under the second privacy cost is generated for the second connected edge in the following manner: a predetermined random algorithm generates a random value within a predetermined probability interval; and the value of the independent variable at which the cumulative probability of the exponential distribution under the second privacy cost equals the random value is used as the noise weight generated for the second connected edge.
  • According to a second aspect, a method for determining a user community among a plurality of candidate users is provided, comprising: obtaining a relationship network based on privacy protection generated for the plurality of candidate users by the method of the first aspect; processing the relationship network based on privacy protection with a predetermined group recognition model to obtain multiple composite node sets; and determining at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network determines a corresponding target user community from the plurality of candidate users according to the candidate composite nodes in a single candidate composite node set.
  • processing the relationship network based on privacy protection with the predetermined group recognition model to obtain multiple composite node sets includes: taking the relationship network based on privacy protection as the initial current relationship network, with each composite node in it regarded as a community; performing the following modularity maximization step: moving each composite node to the community of an adjacent composite node, calculating the modularity of the current relationship network with communities as nodes, and choosing the move that maximizes the modularity; merging the composite nodes that fall in the same community after the move, and iterating the modularity maximization step until the modularity of the current relationship network no longer changes; and generating a corresponding composite node set for each community.
  • the modularity of the current relationship network is obtained by summing the node degrees of the communities.
  • the node degree of a first community in the current relationship network is the difference between a first term and a second term: the first term is the ratio of the total number of connected edges within the first community to the total number of connected edges in the current relationship network; the second term is the square of the ratio of the total degree of the composite nodes clustered into the first community to twice the total number of connected edges in the current relationship network.
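  • The modularity described above can be written as a short function. This is a sketch for an unweighted edge list; handling of edge weights, and the names `modularity` and `community_of`, are assumptions not taken from the specification.

```python
def modularity(edges, community_of):
    """Sum over communities of (internal edges / m) - (total degree / 2m) ** 2."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical example: two triangles of composite nodes joined by one edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
merged = {n: "all" for n in range(1, 7)}
print(modularity(edges, split), modularity(edges, merged))
```

  • Splitting the two triangles scores higher than placing every node in one community, which is the signal the modularity maximization step follows.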
  • the modularity maximization step is determined by one of the following methods: greedy algorithm, simulated annealing algorithm, random walk algorithm, statistical principle algorithm, label propagation algorithm, InfoMap algorithm, Louvain algorithm.
  • determining at least one candidate composite node set from the multiple composite node sets includes: determining a composite node set whose number of composite nodes is greater than a predetermined number threshold as a candidate composite node set, so that the data party of the initial relationship network determines the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set in the following way: mapping each candidate composite node to multiple initial users of the initial relationship network according to preset mapping rules; and selecting, from those initial users, the users who belong to the multiple candidate users, and identifying the selected users as the target user community corresponding to that candidate composite node set.
  • the execution subject of the method is the data party of the initial relationship network, the multiple composite node sets include a first composite node set, and determining at least one candidate composite node set from the multiple composite node sets includes: mapping each composite node in the first composite node set to multiple initial users of the initial relationship network according to a preset mapping rule; detecting whether a predetermined number or a predetermined proportion of those initial users have a registration time shorter than a predetermined time threshold; and, if so, determining the first composite node set as a candidate composite node set.
  • According to a third aspect, an apparatus for constructing a relationship network based on privacy protection is provided, wherein the relationship network based on privacy protection is formed by multiple composite nodes whose association relationships are described by connecting edges; a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and a connecting edge between original nodes describes the association relationship between the corresponding users. The apparatus includes: an obtaining unit configured to obtain the candidate relationship network; a node construction unit configured to divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; a detection unit configured to detect, for the multiple composite nodes, whether there is a connecting edge between each pair; and an edge construction unit configured to add edges and weights to the multiple composite nodes using a differential privacy method based on the detection result, thereby constructing the relationship network based on privacy protection.
  • According to a fourth aspect, an apparatus for determining a user community among a plurality of candidate users is provided, comprising: an obtaining unit configured to obtain a relationship network based on privacy protection generated for the plurality of candidate users by the apparatus of the third aspect; a processing unit configured to process the relationship network based on privacy protection with a predetermined group recognition model to obtain multiple composite node sets; and a determining unit configured to determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network determines the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the second aspect.
  • a computing device including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method of the first aspect or the second aspect is implemented.
  • the embodiments of this specification provide a method and device for constructing a relationship network based on privacy protection.
  • various users are pre-aggregated and noise is added to form a relationship network that satisfies differential privacy, thereby effectively protecting user relationship privacy.
  • When a privacy-protected relationship network is used for user community discovery, the processing is not limited to a specific data holder. Any data processor with computing capabilities can identify candidate composite nodes in the relationship network through the community recognition model, and the data holder of the initial relationship network can then look up the user IDs contained in the user community and provide them to the corresponding business party. In this way, group identification becomes more convenient while data security is still ensured.
  • Fig. 1 shows a schematic diagram of an implementation architecture of an embodiment of the present specification.
  • Fig. 2 shows a schematic diagram of an implementation scenario of an embodiment of the present specification.
  • Fig. 3 shows a schematic diagram of a construction process of a relationship network based on privacy protection according to an embodiment.
  • Fig. 4 shows a schematic flow chart of determining a user community among multiple candidate users according to an embodiment.
  • Fig. 5 shows a schematic diagram of an apparatus for constructing a relationship network based on privacy protection according to an embodiment.
  • Fig. 6 shows a schematic block diagram of an apparatus for determining a user community among multiple candidate users according to an embodiment.
  • Figure 1 shows a schematic diagram of the implementation architecture of this specific implementation scenario.
  • the implementation architecture includes a business platform, business parties, and users.
  • The business platform provides a medium for communication among users and for business interaction between business parties and users.
  • For example, the Alipay platform, the WeChat platform, and the like can be platforms that combine social and business services.
  • Users can register as registered users on the service platform, and each service party can provide users with related services in the form of sub-applications or register as a registered service party on the service platform.
  • the business platform can record user behavior information on the business platform (such as payment behavior data, transfer behavior data, consumption behavior data, etc.), which can be used to establish a relationship network.
  • In the relationship network, each node can represent an entity (such as a user, a commodity, or a merchant).
  • The association relationships between entities are represented by connecting edges, and the nodes corresponding to entities with a direct association relationship are connected to each other by a connecting edge.
  • In the figure, each circle represents an entity and each line segment represents a connecting edge.
  • Nodes that have a direct association relationship are first-order neighbor nodes of each other. If two nodes are connected through a path consisting of a connecting edge, a node, and another connecting edge, they can be called second-order neighbor nodes of each other, and so on.
  • The order of a neighbor node equals the minimum number of connecting edges on a path between the two nodes.
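  • Because the neighbor order is the minimum number of connecting edges between two nodes, it can be computed with a breadth-first search. The sketch below assumes the network is stored as an adjacency dictionary, which is an illustrative choice rather than something the specification states.

```python
from collections import deque


def neighbor_orders(adjacency, source):
    """Return {node: order}, where order is the fewest edges from source to node."""
    order = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in order:  # first visit is along a shortest path
                order[neighbor] = order[node] + 1
                queue.append(neighbor)
    return order


# Hypothetical chain a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(neighbor_orders(adjacency, "a"))
```

  • Here b is a first-order neighbor of a, c a second-order neighbor, and d a third-order neighbor.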
  • the entities in the relationship network may be users.
  • The business platform may take the form of a single server or of a server cluster, which is not limited in this specification.
  • the computing platform pre-stores or remotely obtains the original relationship network generated based on the user behavior data recorded by the service platform in FIG. 1, where the user ID registered by the user on the service platform represents the user.
  • Suppose business party a suspects that it has encountered a batch attack or an organized group attack; it can provide the user IDs in its own user data to the computing platform.
  • the computing platform extracts the relationship network related to these users from the original relationship network according to the user ID provided by the business party a, as a candidate relationship network, and further divides multiple nodes in the candidate relationship network to form a composite node.
  • the composite node includes multiple nodes in the original relationship network.
  • each composite node is identified by a circular or elliptical dashed frame, and the connection relationship between the composite nodes is described by the dashed line.
  • the composite node can be regarded as a virtual user, corresponding to multiple users in the initial relationship network.
  • differential privacy can be used to introduce noise to the network structure, so that the processing result of the noise-introduced relationship network is consistent with the processing result of the original relationship network.
  • this relationship network not only effectively reduces the scale, but also provides accurate user aggregation relationships.
  • This relationship network can be called a relationship network based on privacy protection.
  • the computing platform can provide a third-party platform with a relationship network based on privacy protection.
  • the third-party platform uses a pre-trained group identification model to identify groups in the relationship network and feeds the recognition results back to business party a. This helps business party a prevent and combat gang activity such as attacks and illicit-industry behavior, and eliminate risks.
  • the computing platform in Figure 2 can be located on the business platform in Figure 1, or can be located on other trusted platforms that are responsible for confidentiality.
  • the third-party platform can be any platform with certain computing capabilities, and it can belong to the computing platform in Figure 2 or an independent other party platform, which is not limited in this specification.
  • Figure 1 and Figure 2 show only one implementation architecture of the embodiments of this specification.
  • The relationship network based on privacy protection that the computing platform in Figure 2 builds from an initial relationship network can be applied to any scenario involving user relationships, such as mining malicious groups or identifying potential customers; these scenarios are not listed one by one here.
  • Fig. 3 shows a flowchart of a method for constructing a relationship network based on privacy protection according to an embodiment.
  • the execution subject of the method can be any system, equipment, device, platform or server with computing and processing capabilities.
  • the relationship network based on privacy protection is built on the candidate relationship network by merging its original nodes and adding noise under a predetermined privacy cost, so that the true connection relationships between nodes are hidden by the differential privacy method.
  • the method for constructing a relationship network based on privacy protection includes the following steps: step 301, obtain a candidate relationship network; step 302, divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes included in a single composite node does not exceed the composite node capacity; step 303, for the multiple composite nodes, detect whether there is a connecting edge between each pair; step 304, based on the detection result, use a differential privacy method to add connecting edges and weights to the multiple composite nodes, thereby constructing the relationship network based on privacy protection.
  • First, in step 301, a candidate relationship network is obtained. The candidate relationship network is the basic network used to construct the relationship network based on privacy protection.
  • the initial relationship network is usually a relationship network constructed for an application scenario that contains the association relationships between entities and therefore holds a large amount of entity relationship data, such as user relationship data.
  • the initial relationship network can be used to describe the user relationship network.
  • the nodes in the initial relationship network may be referred to as original nodes.
  • the initial relationship network usually includes a network formed by association relationships between all entities in a related scenario.
  • the candidate relationship network can be the initial relationship network itself or a part of the initial relationship network.
  • When the candidate relationship network is a part of the initial relationship network, the relationship network corresponding to the candidate nodes within a predetermined node range can be extracted from the initial relationship network as the candidate relationship network.
  • In one case, the candidate nodes may be given nodes.
  • For example, the nodes corresponding to the users in the user list provided by business party a can be called given nodes. If these users are 26 users in total, from user a and user b through user z, the nodes corresponding to these 26 users are the candidate nodes. The nodes corresponding to user a, user b, through user z, together with the connections among them, can then be extracted from the initial relationship network as the candidate relationship network.
  • In this case, the candidate relationship network does not include the node corresponding to user 11, and therefore does not include the connecting edge between that node and the node corresponding to user a; it does include the nodes corresponding to user a, user b, and user d, together with the connecting edges between the node corresponding to user a and the nodes corresponding to user b and user d.
  • In another case, the candidate nodes may be nodes associated with the given nodes; for example, in addition to the given nodes, they also include the neighbor nodes within a predetermined order of the given nodes.
  • For example, the given nodes can be the nodes corresponding to the users in the user list provided by business party a, and the candidate nodes can be the given nodes together with their neighbor nodes within a predetermined order (such as the second order), that is, their first-order neighbor nodes, second-order neighbor nodes, and so on.
  • the candidate relationship network can then be the relationship network composed of the given nodes and their neighbor nodes within the predetermined order, which will not be repeated here.
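  • The extraction just described can be sketched as a depth-limited search from all given nodes at once, followed by keeping only the edges among the nodes reached. The adjacency-dictionary representation and the name `extract_candidate_network` are assumptions made for illustration.

```python
from collections import deque


def extract_candidate_network(adjacency, given_nodes, max_order):
    """Keep the given nodes, their neighbors within max_order, and edges among them."""
    reached = {node: 0 for node in given_nodes if node in adjacency}
    queue = deque(reached)
    while queue:
        node = queue.popleft()
        if reached[node] == max_order:
            continue  # do not expand beyond the predetermined order
        for neighbor in adjacency.get(node, ()):
            if neighbor not in reached:
                reached[neighbor] = reached[node] + 1
                queue.append(neighbor)
    # Induced sub-network: drop every edge that leaves the set of kept nodes.
    return {n: [nb for nb in adjacency.get(n, ()) if nb in reached] for n in reached}


# Hypothetical initial network; "u11" is not on the business party's list.
initial = {
    "a": ["b", "u11"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c"], "u11": ["a"],
}
print(extract_candidate_network(initial, ["a", "b"], max_order=0))
print(extract_candidate_network(initial, ["a", "b"], max_order=1))
```

  • With `max_order=0` only the given nodes and the edge between them survive, matching the first case above; with `max_order=1` their first-order neighbors are included as well.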
  • the relationship network corresponding to the candidate nodes may also be adjusted and, after further screening, used as the candidate relationship network; the detailed process is described in step 302.
  • Whether the candidate relationship network is the initial relationship network itself or a partial network extracted from it, each node still exists as an independent node, that is, the node itself has not changed, so it can still be called an original node. Only the properties of some original nodes in the candidate relationship network may have changed; for example, the number of connected edges (or the number of neighbor nodes) may be reduced.
  • Step 302: Divide the nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity.
  • the composite node capacity can be a preset value based on experience or the size of the candidate relationship network (including the number of nodes), such as 5, 8, 10, and so on.
  • the number of original nodes corresponding to a composite node does not exceed the capacity of the composite node.
  • the number of original nodes corresponding to a composite node can be the same as the capacity of the composite node.
  • the number of composite nodes can be determined according to the capacity of the composite nodes (hereinafter referred to as k).
  • the number of composite nodes may be the integer part of the ratio of the number of nodes in the candidate relationship network to the composite node capacity k.
  • the number of composite nodes may also be the integer part minus one. In this way, there can be a certain error space in the subsequent differential privacy processing, so that the relationship privacy can be maintained on the basis of ensuring the accuracy of the user relationship.
  • the candidate relationship network can be randomly screened so that the number of nodes it contains equals the product of the number of composite nodes and the composite node capacity k, or equals the product of the number of composite nodes plus 1 and the composite node capacity k, depending on how the number of composite nodes is determined. This is equivalent to filtering out the nodes left over when the number of nodes in the original candidate relationship network is divided by the composite node capacity, and corresponds to the node screening mentioned in step 301.
  • the number of nodes in the candidate relationship network after screening is the number of nodes in the original candidate relationship network minus the remainder of that number divided by the composite node capacity k. That is to say, the number of composite nodes is determined according to the number of original nodes in the candidate relationship network and the capacity of the composite nodes, and then the original nodes in the candidate relationship network are screened according to the number of composite nodes. In this way, the original nodes in the candidate relationship network can be evenly distributed to the composite nodes, that is, each composite node corresponds to k original nodes, and the number of composite nodes is determined accordingly.
  • after the number of composite nodes is determined, the original nodes in the candidate relationship network can be divided into composite nodes.
  • when the number of original nodes corresponding to each composite node equals the composite node capacity, the number of composite nodes that can be divided can be recorded as the first number.
  • the first number of original nodes can be randomly selected from the candidate relationship network as the reference nodes of the respective composite nodes (similar to a "seed" function).
  • then, according to the composite node capacity k, the k-1 (the second number) nodes closest to each reference node, taken from near to far, are added to the corresponding composite node.
  • the distance can be understood as the number of connected edges on the connection path; for example, the distance between the reference node and its first-order neighbor node is 1.
  • when traversing the reference nodes and examining original nodes from near to far, original nodes that have already been added to other composite nodes can be excluded.
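The partitioning in step 302 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the adjacency-dictionary input, the function name `partition_into_composites`, the fixed random seed, and the final fallback that fills composite nodes whose reference node cannot reach enough neighbors are all hypothetical choices made for the example.

```python
import random
from collections import deque


def partition_into_composites(adj, k, seed=0):
    """Group original nodes into composite nodes of exactly k members.

    adj: dict mapping each original node to the set of its neighbours.
    Returns a list of member lists, one per composite node.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    first_number = len(nodes) // k  # composite nodes of full size k
    # Randomly drop the remainder so the node count is a multiple of k.
    kept = set(rng.sample(nodes, first_number * k))
    seeds = rng.sample(sorted(kept), first_number)  # reference nodes
    assigned = set(seeds)
    composites = []
    for s in seeds:
        members = [s]
        # Breadth-first search visits nodes in order of hop distance from s.
        queue, seen = deque([s]), {s}
        while queue and len(members) < k:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v in seen or v not in kept:
                    continue
                seen.add(v)
                queue.append(v)
                # Skip nodes already placed in another composite node.
                if v not in assigned and len(members) < k:
                    members.append(v)
                    assigned.add(v)
        composites.append(members)
    # Assumption: unreachable leftovers are used to top up short composites.
    leftovers = [n for n in sorted(kept) if n not in assigned]
    for members in composites:
        while len(members) < k and leftovers:
            members.append(leftovers.pop())
    return composites
```

With 11 original nodes and k = 3, the sketch keeps 9 nodes and returns 3 composite nodes of 3 members each.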
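  • step 303: for the multiple composite nodes, detect whether there is a connecting edge between each pair.
  • for example, suppose the first composite node includes the original nodes A, B, C, D, and E, and the second composite node includes the original nodes F, G, H, I, and J.
  • if there is a connecting edge between any one of the original nodes A, B, C, D, E (such as node C, which can be called the first original node) and any one of the original nodes F, G, H, I, J (such as node H, which can be called the second original node), it is determined that a connecting edge exists between the first composite node and the second composite node.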
  • a set of connected edges may be determined for storing the detected connected edges.
  • the detection result may also include the number of connected edges in the connected edge set.
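The detection in step 303 can be sketched as below. The function name `detect_composite_edges` and the representation of composite nodes as lists of original node labels are assumptions made for illustration only.

```python
def detect_composite_edges(original_edges, composites):
    """Return the set of composite-level edges and its size.

    original_edges: iterable of (u, v) pairs between original nodes.
    composites: list of member lists; the list index is the composite id.
    """
    owner = {node: idx for idx, members in enumerate(composites) for node in members}
    edges = set()
    for u, v in original_edges:
        cu, cv = owner.get(u), owner.get(v)
        # Ignore filtered-out nodes and edges inside one composite node.
        if cu is None or cv is None or cu == cv:
            continue
        edges.add((min(cu, cv), max(cu, cv)))
    return edges, len(edges)
```

For the example above, a single original edge between C and H is enough to produce the composite edge (0, 1).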
  • Step 304 Based on the detection result, use a differential privacy method to add connection edges and weights to multiple composite nodes, thereby constructing a relationship network based on privacy protection. It can be understood that when using a relational network for business processing, it is often necessary to consider the degree of association between nodes, and the degree of association can be described by the weight of the connecting edge.
  • Differential privacy is a means in cryptography, which aims to provide a way to maximize the accuracy of data query when querying from a statistical database, while minimizing the chance of identifying its records.
  • suppose M is a random algorithm and $P_M$ is the set of all possible outputs of M. If, for any two adjacent data sets D and D' and any subset $S_M$ of $P_M$, the following holds:
  • $\Pr[M(D) \in S_M] \le e^{\varepsilon} \times \Pr[M(D') \in S_M]$
  • then the algorithm M provides $\varepsilon$-differential privacy protection, where the parameter $\varepsilon$ is called the privacy protection budget, which is used to balance the degree of privacy protection and accuracy.
  • $\varepsilon$ can usually be set in advance. The closer $\varepsilon$ is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the processing results of the random algorithm on the two adjacent data sets D and D' are, and the stronger the degree of privacy protection.
  • the differential privacy method can reduce the sensitivity of query results by adding controlled noise.
  • the differential privacy method is usually used in the query field. Under the implementation framework of this specification, it is envisaged to use the differential privacy method to generate a relationship network based on privacy protection.
  • differential privacy is generally composable.
  • if two processing steps satisfy differential privacy with privacy factors $\varepsilon_1$ and $\varepsilon_2$ respectively, the privacy factor of their combination is $\varepsilon_1 + \varepsilon_2$.
  • the purpose of the differential privacy method is to balance privacy and accuracy, that is, to take accuracy into account on the basis of protecting privacy.
  • the purpose of adding noise to the connecting edges is to make the random algorithm obtain the same result when processing the noise-added relationship network as when processing the original relationship network, so as to protect privacy.
  • a part of connected edges may be selected from the connected edges detected in step 303, and a certain number of connected edges may be added between composite nodes that do not have connected edges.
  • the privacy factor $\varepsilon_2$ can be preset based on experience.
  • the first privacy factor $\varepsilon_2$ can be positively correlated with the total number of composite nodes. For example, if the number of composite nodes $n_1$ is 1000, $\varepsilon_2$ can be set to 0.01.
  • the second privacy factor $\varepsilon_1$ can be determined by $\varepsilon - \varepsilon_2$.
  • the differential privacy of the number of connected edges can be performed through the Laplace mechanism.
  • that is, Laplace noise is added to the number of connected edges in the connected edge set.
  • noise that conforms to the Laplace distribution can be expressed by the probability density function $\mathrm{noise}(y) \propto e^{-|y|/\lambda}$, where $\lambda$ is the scale parameter of the distribution.
  • the Laplace mechanism is a noise mechanism suitable for continuous data.
  • the sensitivity indicates at least how many values in the data set must change in order to affect the output result.
  • here the sensitivity can be 1, and the Laplace distribution that satisfies $\varepsilon_2$-differential privacy can be denoted as $\mathrm{Lap}(1/\varepsilon_2)$.
  • for $\mathrm{Lap}(1/\varepsilon_2)$, the expression of the Laplace-distributed noise is $P(x \mid p) = \frac{1}{2p} e^{-|x|/p}$, which is the Laplace distribution when $p$ takes $1/\varepsilon_2$.
  • accordingly, the number of connected edges after adding Laplace noise can be written as $m_1$, namely the number of connected edges in the connected edge set plus the noise.
  • a preselected random algorithm is used to generate a random value for $x$ (which can be called the first random value), and the corresponding $P(x \mid p)$ is $P(1/\varepsilon_2)$.
  • $P(1/\varepsilon_2)$ can be regarded as the increased number of noise edges.
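A minimal sketch of perturbing the edge count follows. It uses textbook inverse-CDF sampling from a Laplace distribution with scale 1/ε2, which is an assumption and may differ from the exact evaluation procedure the text describes; the function names, the rounding, and the clamping at zero are likewise hypothetical.

```python
import math
import random


def second_privacy_cost(eps_total, eps_count):
    """Budget left for edge weights after spending eps_count on the count."""
    return eps_total - eps_count


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) at the boundary
        u = rng.random()
    u -= 0.5  # now uniform in (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def noisy_edge_count(num_edges, eps_count, rng):
    """Edge count plus Lap(1/eps_count) noise, rounded and clamped at 0."""
    m1 = num_edges + laplace_noise(1.0 / eps_count, rng)
    return max(0, round(m1))
```

A smaller ε2 gives a larger scale and therefore noisier counts, which matches the trade-off between privacy and accuracy described above.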
  • the connecting edges between the composite nodes can be further selected and added according to the number of connected edges after the noise is added.
  • suppose a third number of connected edges is selected from the connected edges detected in step 303, the number of noisy connected edges (connected edges that do not exist in the detection result) constructed for the composite nodes is a fourth number, the number of connected edges after adding the noise under the first privacy cost is a fifth number, and the maximum number of connected edges among the composite nodes is a sixth number.
  • then the ratio of the third number to the fourth number is consistent with the ratio of the fifth number to the difference between the sixth number and the fifth number. Since the fifth number adds noise to the number of originally detected connected edges, the proportion of connected edges selected from the detected connected edges can be increased.
  • the sixth number $m_0$ in the above optional embodiment may be determined based on the number $n_1$ of composite nodes.
  • with the fifth number denoted $m_1$, the ratio of the third number to the fourth number is: $\frac{m_1}{m_0 - m_1}$
  • when selecting the third number of connected edges from the connected edge set $E_1$, generally, connected edges with larger weights are retained and connected edges with smaller weights are deleted.
  • any connected edge detected in step 303 (such as a connected edge in the set $E_1$) can be recorded as the first connected edge.
  • for the first connected edge, noise that follows a two-sided geometric distribution based on the second privacy cost is added to the given initial weight, and the corresponding first noise weight is obtained.
  • when the first noise weight is greater than the first weight threshold, the first connected edge is selected as a connecting edge in the privacy-protection-based relationship network, and the first noise weight is used as the weight of the first connected edge.
  • the second privacy cost $\varepsilon_1$ is the difference between the predetermined overall privacy cost $\varepsilon$ and the first privacy cost $\varepsilon_2$.
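  • the cumulative probability of the two-sided geometric distribution takes a value between 0 and 1, which can be determined by random sampling.
  • each sampled value of the cumulative probability corresponds to a unique value of the noise.
  • in this way, the corresponding noise can be determined.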
  • taking the weight as an example, let the initial weight $W_0$ be 1 or 0, where 1 indicates that a real connecting edge exists in the initial state and 0 indicates otherwise; the initial weight of a detected connecting edge $e_1$ is then 1.
  • with the noise denoted $\eta$, its weight after adding noise is expressed as $1 + \eta$.
  • for the connecting edge $e_1$ to satisfy $\varepsilon_1$-differential privacy, its weight after adding noise should be large enough to distinguish it from the node relationship in the original relationship network.
  • the weight $1 + \eta$ after adding noise can be compared with the first weight threshold $\theta$. That is to say, noise $\eta$ is added to $W_0$ to obtain the weight $W_{e_1}$; when $W_{e_1} \ge \theta$ is satisfied, the corresponding connecting edge $e_1$ satisfies $\varepsilon_1$-differential privacy.
  • in this case, $e_1$ can be determined as a connecting edge between composite nodes in the relationship network under differential privacy.
  • the weight of the connecting edge $e_1$ is $W_{e_1}$. It can be understood that this is the weight after noise is added, and therefore the privacy of the user relationship can be guaranteed.
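The selection of real connecting edges can be sketched as follows. The sketch assumes the two-sided geometric distribution has the standard form with parameter α = exp(−ε1) and samples it as the difference of two independent geometric variables; this sampling route, the function names, and the choice of keeping edges whose noisy weight is at least the threshold are illustrative assumptions rather than the procedure fixed by the text.

```python
import math
import random


def two_sided_geometric(eps, rng):
    """Integer noise with Pr[n] proportional to exp(-eps * |n|)."""
    alpha = math.exp(-eps)

    def geometric():
        # Failures before the first success, with Pr[G >= g] = alpha ** g.
        u = rng.random()
        return int(math.floor(math.log(1.0 - u) / math.log(alpha)))

    return geometric() - geometric()


def select_real_edges(edges, eps1, theta, rng):
    """Keep detected edges whose noisy weight reaches the threshold theta."""
    kept = {}
    for edge in edges:
        weight = 1 + two_sided_geometric(eps1, rng)  # initial weight is 1
        if weight >= theta:
            kept[edge] = weight
    return kept
```

Raising the threshold keeps fewer real edges, so the threshold controls how many detected connecting edges survive into the published network.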
  • the first weight threshold $\theta$ can be a preset value, or can be determined by means such as high-pass filtering.
  • taking the high-pass filtering method as an example, according to the principle of high-pass filtering, assuming that the first weight threshold is $\theta$ and using $M'_i$ to denote the weight of the i-th connected edge in $E_1$, then:
  • $\theta$ adopts the rounded-up form of the calculation result:
  • that is, the value of $\theta$ is the integer part of the calculation result plus one. This is because $\theta$ is used as the lower weight threshold after noise is added.
  • when the value of $\theta$ is large, it can ensure that the noise is large enough, which is beneficial to maintaining the privacy of the user relationship.
  • the third number of connected edges can be selected from the connected edges detected in step 303 based on the comparison of the weight of each noise-added connected edge with the first weight threshold.
  • in addition to the connected edges detected in step 303 (for example, the connected edges in the set $E_1$), a fourth number of connecting edges needs to be added as connecting edges between composite nodes in the privacy-protection-based relationship network.
  • These connecting edges are temporarily assumed in the process of adding connecting edges. They can also be regarded as “connected edges with a weight of 0”. If the conditions are met, they will be added as connecting edges in a relational network based on privacy protection. Otherwise, it is still deemed that there are no connected edges.
  • a fourth number (for example, s) of connected edges can be randomly selected from the above-mentioned "connected edges with a weight of 0" as connected edges in the privacy-protection-based relationship network, and a weight within a predetermined value range (such as between 0 and 1) can be randomly generated for each of them. The randomly generated weight may be required to be greater than a predetermined threshold, such as greater than 0.3. Alternatively, the fourth number of connected edges is selected in descending order of the generated weights, and the weight of each selected connected edge is the generated weight.
  • a weight can be generated for each "connected edge with a weight of 0" according to the binomial distribution noise, and s connected edges can be selected according to the principle of a high-pass filter.
  • the fourth number s can be determined by the fifth number $m_1$, the sixth number $m_0$, the aforementioned first weight threshold $\theta$, and the second privacy cost $\varepsilon_1$.
  • the noise weight generated for each connected edge with an initial weight of 0 has a cumulative probability that satisfies the following exponential form:
  • $\Pr[X \le x] = 1 - \alpha^{x - \theta + 1}$
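A sketch of constructing noise connecting edges is given below. It assumes the cumulative probability above reads Pr[X ≤ x] = 1 − α^(x−θ+1) over integers x ≥ θ with α = exp(−ε1), as in the usual high-pass filter construction, and that θ is an integer; the number s of noise edges is taken as an input because its formula is not recoverable from the text. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def sample_noise_weight(eps1, theta, rng):
    """Inverse-CDF sample from Pr[X <= x] = 1 - alpha ** (x - theta + 1)."""
    alpha = math.exp(-eps1)
    u = rng.random()
    return theta + int(math.floor(math.log(1.0 - u) / math.log(alpha)))


def add_noise_edges(n_composites, real_edges, s, eps1, theta, rng):
    """Pick s absent composite pairs at random and give each a noise weight."""
    absent = [p for p in combinations(range(n_composites), 2) if p not in real_edges]
    chosen = rng.sample(absent, min(s, len(absent)))
    return {pair: sample_noise_weight(eps1, theta, rng) for pair in chosen}
```

Every sampled weight is at least θ, so a noise edge cannot be told apart from a real edge that passed the same threshold.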
  • differential privacy processing based on the first privacy factor ⁇ 2 is performed on the number of existing connected edges.
  • Fig. 4 shows a method of determining a user community among multiple candidate users by using a relationship network based on privacy protection.
  • the method can be executed by an execution subject consistent with the method shown in FIG. 3, or by another execution subject (for example, a merchant who provides a user ID in FIG. 1), which is not limited herein.
  • the method for determining a user community among multiple candidate users shown in FIG. 4 includes the following steps: step 401, obtain a privacy-protection-based relationship network generated for multiple candidate users; step 402, use a predetermined community identification model to process the privacy-protection-based relationship network to obtain multiple composite node sets; step 403, determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
  • a privacy protection-based relationship network generated for multiple candidate users is obtained.
  • the candidate users here can be provided by the corresponding business party.
  • the corresponding business party is, for example, the business provider of the consumer platform (such as a merchant).
  • the multiple user IDs provided by the corresponding business parties may be the registration IDs of their counterparts (such as consumers) on a certain business platform on the business platform. Each user ID corresponds to a candidate user.
  • the service platform can generate the initial user relationship network in advance.
  • the data party of the initial relationship network can determine the candidate relationship network from the initial relationship network based on these candidate users, divide the original nodes in the candidate relationship network into multiple composite nodes according to the preset composite node capacity, detect whether there is a connecting edge between each pair of composite nodes, and, based on the detection result, use the differential privacy method to add connecting edges and weights to the multiple composite nodes, thereby constructing a relationship network based on privacy protection.
  • the candidate relationship network may include users provided by the corresponding service party and their neighbor nodes within a predetermined order in the initial relationship network. This process has been described in the embodiment shown in FIG. 3, and will not be repeated here.
  • the relationship network based on privacy protection can be obtained locally.
  • a predetermined group recognition model is used to process the relationship network based on privacy protection to obtain multiple composite node sets.
  • the predetermined group recognition model is, for example, the Louvain algorithm, the maximum connected graph, and so on.
  • taking the Louvain algorithm as an example, each composite node in the privacy-protection-based relationship network can first be regarded as a community; each composite node is then moved into the community of an adjacent composite node, the modularity of the entire relationship network is calculated, and the move that maximizes the modularity is chosen. Then, the composite nodes that are in the same community after the move are merged into a new community, and the above steps are repeated until the modularity no longer increases.
  • Each community can be regarded as a set of composite nodes.
  • the modularity can be determined in the following way: $Q = \sum_{c=1}^{n_c} \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$
  • where $n_c$ is the number of communities in the current relationship network, which initially is the number of communities in the privacy-protection-based relationship network.
  • $l_c$ is the total number of connected edges in community c.
  • $d_c$ is the total degree of the composite nodes clustered into community c.
  • $m$ is the total number of connected edges in the current relationship network, which initially is the total number of connected edges in the privacy-protection-based relationship network.
  • Modularity optimization can be implemented using algorithms such as the greedy algorithm (Newman algorithm), the simulated annealing algorithm, the random walk algorithm, the statistical principle algorithm, the label propagation algorithm, the InfoMap algorithm, the Louvain algorithm, and the like.
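The modularity described above, the sum over communities of l_c/m − (d_c/(2m))², can be computed with a short sketch such as the following. The function name `modularity`, the unweighted edge list, and the `community_of` mapping are assumptions for illustration; a full Louvain loop would call this repeatedly while trying moves.

```python
def modularity(edges, community_of):
    """Sum over communities of l_c / m - (d_c / (2 * m)) ** 2.

    edges: list of (u, v) pairs between composite nodes.
    community_of: dict mapping each composite node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # l_c: edges with both ends inside community c
    degree = {}    # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, placing each triangle in its own community gives a modularity of 5/14, while placing all six nodes in one community gives 0.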
  • step 403 at least one candidate composite node set is determined from the multiple composite node sets.
  • in this way, the data party of the initial relationship network can determine the corresponding target user community from multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
  • a composite node set whose number of composite nodes is greater than a predetermined number threshold (for example, 10) can be determined as a candidate composite node set.
  • the data party of the initial relationship network can determine the corresponding target user community from multiple candidate users according to each candidate composite node in a single candidate composite node set in the following manner:
  • according to preset mapping rules, map each candidate composite node to multiple initial users of the initial relationship network; from the initial users obtained, select those that are among the multiple candidate users, and identify the selected users as the target user community corresponding to the single candidate composite node set.
  • the generator of the initial relationship network may record the corresponding relationship between the composite node and the original node when generating the relationship network based on privacy protection.
  • the mapping rule here can be the corresponding relationship here.
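The mapping from a candidate composite node set back to a target user community can be sketched as follows, assuming the mapping rule is simply the recorded correspondence between composite nodes and original user IDs. The function name and the data shapes are hypothetical.

```python
def target_community(candidate_set, composite_to_users, candidate_user_ids):
    """Users of the candidate composite nodes that are also candidate users.

    candidate_set: iterable of composite node ids in one candidate set.
    composite_to_users: dict recorded when the composite nodes were built.
    candidate_user_ids: user IDs supplied by the business party.
    """
    initial_users = set()
    for composite in candidate_set:
        initial_users.update(composite_to_users[composite])
    # Keep only users the business party actually asked about.
    return sorted(initial_users & set(candidate_user_ids))
```

Only the data party holds `composite_to_users`, so a third party that ran community detection never sees the underlying user IDs.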
  • the execution subject of the method shown in FIG. 4 is the data party of the initial relationship network. At this time, the execution subject may determine the candidate composite node set according to the method in the aforementioned possible design, and may also determine the candidate composite node set by other methods.
  • the above-mentioned execution subject may first map each composite node in the first composite node set to multiple initial users of the initial relationship network according to a preset mapping rule, and then detect whether, among the multiple initial users, there are a predetermined number (such as 20) or a predetermined proportion (such as 60%) of initial users whose registration time is shorter than a predetermined time threshold (such as 1 month). If so, the first composite node set is determined as a candidate composite node set. Otherwise, it can be determined that the first composite node set is not a candidate composite node set.
  • the candidate user IDs may include user IDs other than those provided by the corresponding business party. After comparison, these other user IDs can be filtered out of the candidate user IDs, and the remaining candidate user IDs can be identified as the user group.
  • the corresponding target user community in the candidate composite node set can be provided to the corresponding business party.
  • the user groups here may be individual user IDs of a batch attack or an organized group. After the corresponding business party obtains the corresponding user group information, it can conduct corresponding defense or accountability processing. Optionally, there may be only one or multiple target user groups, which are used to provide references for corresponding business parties.
  • with the privacy-protection-based relationship network construction method, users can be aggregated in advance and noise added when a user relationship network is provided, forming a relationship network that satisfies differential privacy, thereby reducing the amount of data processing and improving the effectiveness of the user relationship network while effectively protecting the privacy of user relationships.
  • when a privacy-protected relationship network is used for user community discovery, it is not limited to a specific data holder. Any data processing party with computing power can identify candidate composite nodes in the relationship network through the community recognition model, and the user IDs contained in the user community can be queried via the data holder of the initial relationship network and provided to the corresponding business party. In this way, the convenience of group identification can be increased on the basis of ensuring data security.
  • a privacy protection-based relationship network construction device is also provided.
  • the relationship network based on privacy protection is composed of multiple composite nodes, and the relationship between multiple composite nodes is described by connecting edges.
  • a single composite node corresponds to multiple original nodes in the candidate relationship network, and each original node corresponds to each user.
  • the connecting edges between the original nodes describe the association relationship between the corresponding users.
  • Fig. 5 shows a schematic block diagram of an apparatus for constructing a relational network based on privacy protection according to an embodiment.
  • as shown in FIG. 5, the device 500 includes: an acquiring unit 51 configured to acquire a candidate relationship network; a node construction unit 52 configured to divide the original nodes in the candidate relationship network into a plurality of composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; a detection unit 53 configured to detect, for the multiple composite nodes, whether there is a connecting edge between each pair; and an edge construction unit 54 configured to add edges and weights to the multiple composite nodes using the differential privacy method based on the detection result, so as to build a relationship network based on privacy protection.
  • FIG. 6 shows an apparatus 600 for determining a user community among a plurality of candidate users.
  • the device 600 at least includes: an acquiring unit 61, configured to acquire a privacy-protection-based relationship network generated by the device 500 for multiple candidate users; a processing unit 62, configured to use a predetermined group recognition model to process the privacy-protection-based relationship network to obtain multiple composite node sets; and a determining unit 63, configured to determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
  • the above device 600 for determining a user community among multiple candidate users shown in FIG. 6 corresponds to the method embodiment shown in FIG. 4, and the corresponding description in the method embodiment corresponding to FIG. 4 is also applicable to the device shown in FIG. 6, so it will not be repeated here.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the correspondingly described method.
  • a computing device including a memory and a processor, the memory stores executable code, and the processor implements the correspondingly described method when the executable code is executed.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

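A method and device for constructing a relationship network based on privacy protection. When a user relationship network is provided, user relationships can be aggregated in advance and noise added to form a relationship network that satisfies differential privacy, thereby reducing the amount of data processing and improving the effectiveness of the user relationship network while effectively protecting the privacy of user relationships. Further, when the privacy-protection-based relationship network is used for user community discovery, it is not limited to a specific data holder: any data processing party with computing power can identify candidate composite nodes in the relationship network through a community identification model, and the user IDs contained in a user community can be queried via the data holder of the initial relationship network and provided to the corresponding business party. In this way, the convenience of community identification can be increased while data security is ensured.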

Description

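Method and Device for Constructing a Relationship Network Based on Privacy Protection. Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for constructing a relationship network based on privacy protection.
Background
With the development of big data, relationship networks are being applied more and more widely. A relationship network is often used to describe the associations among multiple entities. For example, taking users as entities, each node in the relationship network corresponds to a user and the edges between nodes correspond to connections between users, so that a network of interpersonal relationships can be described. The application of a relationship network may involve community activity data; for example, account data exhibiting aggregation can be output through an interpersonal relationship network as an effective means of combating batch attacks and organized black-market attacks. If such community activity data involves relationship data carrying user privacy, such as friend data, transfer data, or data on operations in the same device environment, the private relationship data can very easily be reverse-engineered or even leaked.
Summary
The method and device for constructing a relationship network based on privacy protection described in one or more embodiments of this specification can be used to solve one or more of the problems mentioned in the Background section.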
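According to a first aspect, a method for constructing a relationship network based on privacy protection is provided, wherein the privacy-protection-based relationship network is composed of multiple composite nodes, the associations among the multiple composite nodes are described by connecting edges, a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and the connecting edges between original nodes describe the associations between the corresponding users. The method includes: obtaining the candidate relationship network; dividing the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; for the multiple composite nodes, detecting whether a connecting edge exists between each pair; and, based on the detection result, adding edges and weights to the multiple composite nodes in a differential privacy manner, thereby constructing the privacy-protection-based relationship network.
In one embodiment, the candidate relationship network is obtained as follows: obtaining the user identifiers of multiple candidate users provided by a third business party; based on the user identifiers, screening out from the initial relationship network the original nodes corresponding to the multiple candidate users, together with their neighbor nodes within a predetermined order, as candidate nodes; and taking the relationship network formed by the candidate nodes as the candidate relationship network.
In one embodiment, dividing the original nodes in the candidate relationship network into multiple composite nodes according to the preset composite node capacity includes: determining the number of original nodes in the candidate relationship network; determining a first number according to the number of original nodes and the composite node capacity, the first number being the maximum number of composite nodes that can be divided when the number of original nodes corresponding to each composite node equals the composite node capacity; randomly selecting the first number of original nodes from the original nodes in the candidate relationship network as the reference nodes of the respective composite nodes; and, for each reference node, determining a second number of original nodes from the candidate relationship network which, together with that reference node, form the corresponding composite node, the second number being one unit smaller than the first number.
In one embodiment, the multiple composite nodes include a first composite node and a second composite node, the first composite node corresponds to a first original node, and the second composite node corresponds to a second original node. Detecting, for the multiple composite nodes, whether a connecting edge exists between each pair includes: when a connecting edge exists between the first original node and the second original node, determining that a connecting edge exists between the first composite node and the second composite node.
In one embodiment, the detection result includes a set of connecting edges among the composite nodes and the number of connecting edges in that set, and adding edges and weights to the multiple composite nodes in a differential privacy manner based on the detection result includes: adding noise under a first privacy cost to the number of connecting edges.
In one embodiment, the noise under the first privacy cost follows a Laplace distribution whose scale parameter is the reciprocal of the first privacy cost.
In one embodiment, the noise under the first privacy cost is obtained by generating a first random value through a predetermined random algorithm and taking the value of the dependent variable of the Laplace distribution when its independent variable is the first random value.
In one embodiment, adding edges and weights to the multiple composite nodes in a differential privacy manner based on the detection result further includes: selecting a third number of connecting edges from the connecting edge set; and constructing a fourth number of noise connecting edges for the composite nodes, the noise connecting edges being connecting edges outside the connecting edge set.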
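In one embodiment, a fifth number is obtained after the noise under the first privacy cost is added to the number of connecting edges, the maximum number of connecting edges among the composite nodes is a sixth number, and the ratio of the third number to the fourth number is consistent with the ratio of the fifth number to the sixth number.
In one embodiment, the connecting edge set includes a first connecting edge, and the connecting edges in the set are each given the same initial weight. Selecting a third number of connecting edges from the connecting edge set includes: for the first connecting edge, adding to the given initial weight noise whose cumulative probability, based on a second privacy cost, follows a two-sided geometric distribution, to obtain a corresponding first noise weight, the second privacy cost being the difference between a predetermined overall privacy cost and the first privacy cost; and, when the first noise weight is greater than a first weight threshold, selecting the first connecting edge as a connecting edge in the privacy-protection-based relationship network and using the first noise weight as the weight of the first connecting edge.
In one embodiment, the given initial weight is 1, and noise is added to the first connecting edge as follows: generating, through a predetermined random algorithm, a random value within a predetermined interval for the two-sided geometric distribution; determining the value of the independent variable of the two-sided geometric distribution at which the distribution yields the random value; and taking the sum of the initial weight and that value of the independent variable as the weight of the first connecting edge after noise is added.
In one embodiment, the first weight threshold is the independent-variable threshold at which a first proportion of connecting edges is obtained when each connecting edge in the connecting edge set is subjected to one-sided filtering by a high-pass filter under the second privacy cost, where the first proportion is the ratio of the following first term to the second term: the first term is the fifth number obtained after the noise under the first privacy cost is added to the number of connecting edges; the second term is the maximum number of connecting edges among the composite nodes.
In one embodiment, the fourth number is determined according to the filtering ratio of the high-pass filter under the second privacy cost, the second privacy cost being the difference between the predetermined overall privacy cost and the first privacy cost. The ratio of the fourth number to the difference of the following terms is consistent with the filtering ratio of the high-pass filter under the second privacy cost: the maximum number of connecting edges among the composite nodes, and the number of connecting edges obtained after the noise under the first privacy cost is added to the number of connecting edges.
In one embodiment, the multiple composite nodes include a third composite node and a fourth composite node that are not connected by any connecting edge in the connecting edge set. Constructing a fourth number of noise connecting edges for the composite nodes includes: adding a second connecting edge with an initial weight of 0 between the third composite node and the fourth composite node; generating for the second connecting edge a noise weight whose cumulative probability under the second privacy cost follows an exponential distribution; and, when the noise weight generated for the second connecting edge is greater than 0, determining the second connecting edge as an added connecting edge, with the generated noise weight as the weight of the second connecting edge.
In one embodiment, the noise weight that follows the exponential distribution under the second privacy cost is generated for the second connecting edge as follows: generating a random value in a predetermined probability interval through a predetermined random algorithm; and taking, as the noise weight generated for the second connecting edge, the value of the independent variable at which the exponential distribution under the second privacy cost yields the random value.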
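According to a second aspect, a method for determining a user community among multiple candidate users is provided. The method includes: obtaining a privacy-protection-based relationship network generated for the multiple candidate users by the method of the first aspect; processing the privacy-protection-based relationship network with a predetermined community identification model to obtain multiple composite node sets; and determining at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
In one embodiment, processing the privacy-protection-based relationship network with the predetermined community identification model to obtain multiple composite node sets includes: taking the privacy-protection-based relationship network as the initial current relationship network, in which each composite node is a community; performing the following modularity maximization step: moving each composite node into the community of an adjacent composite node, computing the modularity of the current relationship network whose nodes are communities, and choosing the move that maximizes the modularity; merging the composite nodes that are in the same community after the move into one community, and iterating the modularity maximization step until the modularity of the current relationship network no longer changes; and generating a corresponding composite node set for each community.
In one embodiment, the modularity of the current relationship network is obtained by summing the node degrees of the communities, where the node degree of a first community in the current relationship network is the difference between the following first term and second term: the first term is the ratio of the total number of connecting edges in the first community to the total number of connecting edges in the current relationship network; the second term is the square of the ratio of the total degree of the composite nodes clustered into the first community to twice the total number of connecting edges in the current relationship network.
In one embodiment, the modularity maximization step is determined by one of the following: the greedy algorithm, the simulated annealing algorithm, the random walk algorithm, the statistical principle algorithm, the label propagation algorithm, the InfoMap algorithm, the Louvain algorithm.
In one embodiment, determining at least one candidate composite node set from the multiple composite node sets includes: determining a composite node set whose number of composite nodes is greater than a predetermined number threshold as a candidate composite node set; so that the data party of the initial relationship network determines the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set as follows: mapping each candidate composite node to multiple initial users of the initial relationship network according to a preset mapping rule; and selecting, from the multiple initial users, the users that are among the multiple candidate users, and identifying the selected users as the target user community corresponding to the single candidate composite node set.
In one embodiment, the execution subject of the method is the data party of the initial relationship network, the multiple composite node sets include a first composite node set, and determining at least one candidate composite node set from the multiple composite node sets includes: mapping each composite node in the first composite node set to multiple initial users of the initial relationship network according to a preset mapping rule; detecting whether, among the multiple initial users, there are a predetermined number or a predetermined proportion of initial users whose registration time is shorter than a predetermined time threshold; and, if so, determining the first composite node set as a candidate composite node set.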
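According to a third aspect, a device for constructing a relationship network based on privacy protection is provided, wherein the privacy-protection-based relationship network is composed of multiple composite nodes, the associations among the multiple composite nodes are described by connecting edges, a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and the connecting edges between original nodes describe the associations between the corresponding users. The device includes: an acquiring unit configured to acquire the candidate relationship network; a node construction unit configured to divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; a detection unit configured to detect, for the multiple composite nodes, whether a connecting edge exists between each pair; and an edge construction unit configured to add edges and weights to the multiple composite nodes in a differential privacy manner based on the detection result, thereby constructing the privacy-protection-based relationship network.
According to a fourth aspect, a device for determining a user community among multiple candidate users is provided. The device includes: an acquiring unit configured to acquire a privacy-protection-based relationship network generated for the multiple candidate users by the device of the third aspect; a processing unit configured to process the privacy-protection-based relationship network with a predetermined community identification model to obtain multiple composite node sets; and a determining unit configured to determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
According to a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the second aspect.
According to a sixth aspect, a computing device is provided, including a memory and a processor, characterized in that the memory stores executable code and the processor, when executing the executable code, implements the method of the first aspect or the second aspect.
The embodiments of this specification provide a method and device for constructing a relationship network based on privacy protection. When a user relationship network is provided, users can be aggregated in advance and noise added to form a relationship network that satisfies differential privacy, thereby reducing the amount of data processing and improving the effectiveness of the user relationship network while effectively protecting the privacy of user relationships. Further, when the privacy-protection-based relationship network is used for user community discovery, it is not limited to a specific data holder: any data processing party with computing power can identify candidate composite nodes in the relationship network through a community identification model, and the user IDs contained in a user community can be queried via the data holder of the initial relationship network and provided to the corresponding business party. In this way, the convenience of community identification can be increased while data security is ensured.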
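Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a schematic diagram of an implementation architecture of an embodiment of this specification.
FIG. 2 shows a schematic diagram of an implementation scenario of an embodiment of this specification.
FIG. 3 shows a schematic flowchart of constructing a relationship network based on privacy protection according to an embodiment.
FIG. 4 shows a schematic flowchart of determining a user community among multiple candidate users according to an embodiment.
FIG. 5 shows a schematic diagram of a device for constructing a relationship network based on privacy protection according to an embodiment.
FIG. 6 shows a schematic block diagram of a device for determining a user community among multiple candidate users according to an embodiment.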
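Detailed Description
The solutions provided in this specification are described below with reference to the accompanying drawings.
First, a specific implementation scenario is described with reference to FIG. 1 and FIG. 2.
FIG. 1 shows a schematic diagram of the implementation architecture of this scenario. As shown in FIG. 1, the architecture includes a business platform, business parties, and users. The business platform serves as a medium for communication among users and for business interaction between business parties and users. Examples include the Alipay platform and the WeChat platform, which can be platforms that combine social and commercial services. A user can register on the business platform to become a registered user, and each business party can provide related services to users as a sub-application, or by registering on the business platform as a registered business party, and so on.
The business platform can record users' behavior information on the platform (such as payment behavior data, transfer behavior data, and consumption behavior data), and this information can be used to build a relationship network. In the relationship network, each node can represent an entity (such as a user, a commodity, or a merchant), associations between entities are represented by connecting edges, and the nodes corresponding to directly associated entities are connected to each other by connecting edges. As shown in FIG. 1, each circle represents an entity and each line segment represents a connecting edge. Nodes with a direct association are first-order neighbor nodes of each other. If two nodes are connected by a path consisting of a connecting edge, a node, and another connecting edge, they can be called second-order neighbor nodes of each other, and so on. In general, the order of a neighbor node equals the minimum number of connecting edges between the two nodes. Under the implementation architecture of this specification, the entities in the relationship network can be users.
It can be understood that the business parties and users in FIG. 1 are only examples; in practice there can be any number of each, and the server of the business platform may also take the form of a server cluster. This specification places no limitation on these.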
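Referring to FIG. 2, a schematic diagram of a specific implementation scenario under the implementation architecture of FIG. 1 is given. In this scenario, a computing platform stores in advance, or obtains remotely, the original relationship network generated from the user behavior data recorded by the business platform in FIG. 1, in which each user is represented by the user ID registered on the business platform. Business party a suspects that it has encountered a batch attack or an organized gang attack, and can provide the computing platform with the user IDs in its own user data. Based on the user IDs provided by business party a, the computing platform extracts from the original relationship network the relationship network related to these users as the candidate relationship network, and further divides the nodes in the candidate relationship network to form composite nodes, each of which includes multiple nodes of the original relationship network. As shown in FIG. 2, each composite node is marked by a circular or elliptical dashed frame, and the connections between composite nodes are depicted by dashed lines. A composite node can be regarded as a virtual user corresponding to multiple users in the initial relationship network. The relationship network of composite nodes can be built in a differential privacy manner, introducing noise into the network structure such that the result of processing the noise-added relationship network is consistent with the result of processing the original relationship network. In this way, while effectively protecting the private relationship data between users, the relationship network is not only effectively reduced in scale but can also provide accurate information on user aggregation. This relationship network can be called a privacy-protection-based relationship network.
When the privacy-protection-based relationship network is provided to any third-party platform, the users' private relationship data is not leaked. Therefore, the computing platform can provide the privacy-protection-based relationship network to a third-party platform, which identifies gangs in the relationship network through a pre-trained community identification model and feeds the identification result back to business party a. This helps business party a prevent and combat gang activity such as attacks and black-market behavior, and eliminate risks.
It should be noted that the computing platform in FIG. 2 can be located on the business platform in FIG. 1, or on another trusted platform that bears confidentiality obligations. The third-party platform can be any platform with a certain computing capability; it can belong to the computing platform in FIG. 2 or be an independent platform of another party, which is not limited in this specification.
FIG. 1 and FIG. 2 show only one implementation architecture of the embodiments of this specification. In practice, the process by which the computing platform in FIG. 2 builds a privacy-protection-based relationship network on the basis of the initial relationship network can be applied to any scenario involving user relationships, such as uncovering malicious gangs or identifying potential customers, which are not enumerated one by one here.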
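The construction of the privacy-preserving relationship network is described in detail first.
Figure 3 is a flowchart of a privacy-preserving relationship network construction method according to one embodiment. The method can be executed by any system, device, apparatus, platform, or server with computing and processing capability, for example the business platform shown in Figure 1. Starting from a candidate relationship network, the method groups the original nodes of the candidate network and adds noise under a predetermined privacy cost, so that the true connections between nodes are hidden by differential privacy.
As shown in Figure 3, the method includes the following steps: step 301, obtain a candidate relationship network; step 302, partition the original nodes of the candidate relationship network into multiple composite nodes according to a preset composite-node capacity, where the number of original nodes in any single composite node does not exceed the capacity; step 303, for the composite nodes, detect whether an edge exists between each pair; step 304, based on the detection result, add edges and weights to the composite nodes in a differentially private manner, thereby constructing the privacy-preserving relationship network.
First, in step 301, the candidate relationship network is obtained. The candidate relationship network is the base network from which the privacy-preserving relationship network is built.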
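The initial relationship network is usually built for a particular application scenario and contains the associations between entities; it holds a large amount of entity relationship data, such as user relationship data. In the scenario of Figures 1 and 2, for example, the initial relationship network describes user relationships. In the embodiments of this specification, the nodes of the initial relationship network are called original nodes. The initial relationship network usually covers the associations among all entities in the relevant scenario. The candidate relationship network can be the initial relationship network itself or a part of it.
According to one implementation, given a predetermined set of nodes, the relationship network corresponding to the candidate nodes can be extracted from the initial relationship network as the candidate relationship network.
In one embodiment, the candidate nodes are the given nodes themselves. In the scenario of Figure 2, these are the nodes of the users in the user list provided by business party a. Suppose the list contains 26 users, user a, user b, through user z; the 26 corresponding nodes are the candidate nodes. The nodes of user a through user z and the edges among them are then extracted from the initial relationship network as the candidate relationship network. For example, if the node of user a is connected to the nodes of user b and user d and also to the node of user 11, then, because the candidate relationship network does not contain the node of user 11, it does not contain the edge between user 11 and user a either. It does contain the nodes of users a, b, and d, together with the edges from user a's node to the nodes of user b and user d.
In another embodiment, the candidate nodes are the nodes associated with the given nodes: in addition to the given nodes, they include the neighbors of the given nodes within a predetermined order. In the scenario of Figure 2, the given nodes are the nodes of the users in the list provided by business party a, and the candidate nodes are the given nodes plus their neighbors within a predetermined order (for example second order), that is, their first-order and second-order neighbors. The candidate relationship network is then the relationship network formed by the given nodes and their neighbors within the predetermined order.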
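The two extraction variants (given nodes only, or given nodes plus neighbors within a predetermined order) can be sketched as a multi-source breadth-first search. This is a minimal illustration rather than the patent's implementation: the function name `candidate_network`, the adjacency-dictionary representation, and the sample users `u11` and `u12` are assumptions made for the example.

```python
from collections import deque

def candidate_network(adj, given, max_order):
    """Return (nodes, edges) induced by `given` plus neighbours within max_order hops.

    adj maps every node to the set of its neighbours (undirected graph).
    """
    dist = {u: 0 for u in given if u in adj}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == max_order:
            continue  # do not expand beyond the predetermined order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # keep only edges whose two endpoints are both candidate nodes
    edges = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges

# Hypothetical initial network: user a is linked to b, d and to an outside user u11.
adj = {
    "a": {"b", "d", "u11"},
    "b": {"a"},
    "d": {"a"},
    "u11": {"a", "u12"},
    "u12": {"u11"},
}
nodes0, edges0 = candidate_network(adj, {"a", "b", "d"}, 0)  # given nodes only
nodes1, edges1 = candidate_network(adj, {"a", "b", "d"}, 1)  # plus first-order neighbours
```

With order 0 the edge to `u11` is dropped, matching the user 11 example in the text; with order 1 the node `u11` and its edge to `a` are kept while `u12` stays outside.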
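Because the candidate relationship network can contain any number of nodes, in some optional embodiments the relationship network corresponding to the candidate nodes is filtered further before being used as the candidate relationship network, so that the composite nodes are of balanced size. The details are described in step 302.
Because the candidate relationship network is the initial relationship network or a part extracted from it, each node still exists as an independent, unchanged node and can therefore still be called an original node. Only some attributes of some original nodes change in the candidate relationship network; for example, the number of incident edges (or neighbors) may decrease.
In step 302, the nodes of the candidate relationship network are partitioned into multiple composite nodes according to a preset composite-node capacity, where the number of original nodes in each composite node does not exceed the capacity. The capacity can be a value preset from experience or from the size (node count) of the candidate relationship network, for example 5, 8, or 10. Typically, the number of original nodes in a composite node equals the capacity.
In one embodiment, the number of composite nodes is determined from the composite-node capacity (denoted k below). For example, the number of composite nodes can be the integer part of the ratio of the number of nodes in the candidate relationship network to k. In an optional implementation, the number of composite nodes can be that integer part minus 1. This leaves some room for error in the subsequent differential privacy processing, so that relationship privacy is maintained while the accuracy of user relationships is preserved.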
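In an optional implementation, after the number of composite nodes is determined, the candidate relationship network is randomly filtered so that its node count equals the product of the number of composite nodes and k (or, when the number of composite nodes was taken as the integer part minus 1, the product of the number of composite nodes plus 1 and k). This removes the nodes that make up the remainder when the original node count is divided by the capacity, and corresponds to the node filtering mentioned in step 301. In other words, the node count after filtering is the original node count minus its remainder modulo k. That is, the number of composite nodes is determined from the number of original nodes and the capacity, and the original nodes are then filtered according to that number. The original nodes of the candidate relationship network can then be distributed evenly, with each composite node containing exactly k original nodes.
Once the number of composite nodes is determined, the original nodes of the candidate relationship network are assigned to composite nodes. The number of composite nodes that can be formed when every composite node holds exactly as many original nodes as the capacity is called the first number. In one embodiment, the first number of original nodes are selected at random from the candidate relationship network as the base nodes of the composite nodes (they play the role of seeds). Then, for each base node, the k-1 nodes (the second number) nearest to it, taken in order of increasing distance, are added to the corresponding composite node. Distance here means the number of edges on the connecting path; for example, the distance between a base node and a first-order neighbor is 1. Optionally, when the base nodes are traversed and original nodes are examined from near to far, original nodes that have already joined another composite node are skipped.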
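A minimal sketch of the seed-based partition described above, assuming an undirected graph stored as an adjacency dictionary. The function name `partition`, the fixed random seed, and the decision to leave unreached nodes unassigned are illustrative assumptions, not details given in the text.

```python
import random
from collections import deque

def partition(adj, k, seed=0):
    """Split the nodes of adj into composite nodes of at most k original nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    first_number = len(nodes) // k            # number of composite nodes
    seeds = rng.sample(nodes, first_number)   # base nodes chosen at random
    assigned = set(seeds)
    groups = []
    for s in seeds:
        group, seen, queue = [s], {s}, deque([s])
        # breadth-first search visits nodes in order of increasing distance
        while queue and len(group) < k:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v in seen:
                    continue
                seen.add(v)
                queue.append(v)
                # skip nodes that already belong to another composite node
                if v not in assigned and len(group) < k:
                    assigned.add(v)
                    group.append(v)
        groups.append(group)
    return groups

# Hypothetical path graph 0-1-2-3-4-5 with capacity k = 3.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
groups = partition(path, 3)
```

A composite node can end up with fewer than k members when its seed's neighborhood is exhausted, which still respects the capacity bound.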
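The candidate relationship network of original nodes has now become a set of composite nodes. To turn the composite nodes into a relationship network, step 303 detects, for the composite nodes, whether an edge exists between each pair.
Specifically, it is checked whether any edge exists between the original nodes of the two composite nodes. If so, an edge is determined to exist between the two composite nodes. For example, suppose a first composite node contains original nodes A, B, C, D, E and a second composite node contains original nodes F, G, H, I, J. If any of A, B, C, D, E (say node C, called the first original node) has an edge to any of F, G, H, I, J (say node H, called the second original node), an edge is determined to exist between the first and second composite nodes. If no original node of the first composite node has an edge to any original node of the second, there is no edge between the two composite nodes.
According to one embodiment, the detection result of step 303 yields an edge set that stores the detected edges. Optionally, the detection result also includes the number of edges in the edge set.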
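The pairwise detection of step 303 can be sketched as follows. The function name `composite_edges` and the list-of-lists representation of composite nodes are assumptions of this example.

```python
def composite_edges(groups, edges):
    """Return the set of composite-node pairs joined by at least one original edge.

    groups: list of lists of original nodes; edges: iterable of (u, v) pairs.
    """
    owner = {u: i for i, group in enumerate(groups) for u in group}
    found = set()
    for u, v in edges:
        if u in owner and v in owner and owner[u] != owner[v]:
            found.add(frozenset((owner[u], owner[v])))
    return found

# The example from the text: C (first composite node) is linked to H (second).
groups_demo = [list("ABCDE"), list("FGHIJ")]
detected = composite_edges(groups_demo, [("C", "H"), ("A", "B")])
edge_count = len(detected)
```

Edges inside a single composite node, such as A to B here, do not produce a composite edge.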
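In step 304, based on the detection result, edges and weights are added to the composite nodes in a differentially private manner, thereby constructing the privacy-preserving relationship network. When a relationship network is used for business processing, the degree of association between nodes often has to be considered as well; it can be described by edge weights.
Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Let M be a randomized algorithm and $P_M$ the set of all possible outputs of M. If, for any two neighboring datasets D and D' and any subset $S_M$ of $P_M$, M satisfies $\Pr[M(D) \in S_M] \le e^{\varepsilon} \times \Pr[M(D') \in S_M]$, then M is said to provide ε-differential privacy. The parameter ε is called the privacy budget and balances the degree of privacy protection against accuracy; it is usually set in advance. The closer ε is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the outputs of the algorithm on the neighboring datasets D and D', and the stronger the privacy protection.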
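Differential privacy methods add controlled noise to reduce the sensitivity of query results. They are usually applied to queries; under the implementation architecture of this specification, differential privacy is instead used to generate the privacy-preserving relationship network.
A person skilled in the art will understand that differential privacy is composable. Combining two differentially private mechanisms with privacy factors $\varepsilon_1$ and $\varepsilon_2$ gives a result whose privacy factor is $\varepsilon_1 + \varepsilon_2$. With ε denoting the overall privacy cost, $\varepsilon = \varepsilon_1 + \varepsilon_2$. The larger ε is, the weaker the privacy protection, so a maximum value of ε can be set in advance as the maximum privacy cost, for example ε = 1.
The purpose of differential privacy is to balance privacy and accuracy, that is, to preserve accuracy while protecting privacy. Noise is added to the edges so that a randomized algorithm produces nearly the same result on the noised relationship network as on the original relationship network, which is what protects privacy. To generate the privacy-preserving relationship network, a subset of the edges detected in step 303 is selected, and a certain number of edges are added between composite nodes that have no edge.
In one possible design of this specification, the number of edges satisfies differential privacy with a first privacy factor $\varepsilon_2$, and the edge weights satisfy differential privacy with a second privacy factor $\varepsilon_1$. In differential privacy, the smaller the privacy factor, the smaller the influence of any individual on the overall result and the better the privacy protection, but the lower the accuracy. The privacy factor $\varepsilon_2$ can therefore be preset from experience. Optionally, $\varepsilon_2$ is positively correlated with the total number of composite nodes; for example, when the number of composite nodes $n_1$ is 1000, $\varepsilon_2$ can be set to 0.01. Once the overall privacy factor ε and the first privacy factor $\varepsilon_2$ are set, the second privacy factor is $\varepsilon_1 = \varepsilon - \varepsilon_2$.
On this basis, differential privacy is first applied to the edges. Denote the set of edges between composite nodes by $E_1$ and the number of edges by $|E_1|$. To keep the privacy-preserving relationship network accurate, noise is added to $|E_1|$, which increases the proportion of edges selected from the edge set (the principle is described in detail below).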
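In an optional implementation, differential privacy for the number of edges is achieved with the Laplace mechanism, that is, Laplace noise is added to the number of edges in the edge set. Noise following a Laplace distribution has probability density $\mathrm{noise}(y) \propto e^{-|y|/\lambda}$, with mean 0 and standard deviation $\sqrt{2}\,\lambda$.
The Laplace mechanism is a noise mechanism suited to continuous data. For a given dataset D, the randomized algorithm is M(D) = f(D) + Y; for M to provide ε-differential privacy, Y follows a Laplace distribution with parameter sensitivity/ε, that is, Lap(sensitivity/ε). The sensitivity is the largest change in the output f(D) that can result from changing a single record of the dataset. In a relationship network built from user relationship data, the sensitivity can be taken as 1, so the Laplace distribution for $\varepsilon_2$-differential privacy is Lap(1/$\varepsilon_2$). Suppose the Laplace noise distribution is expressed as:
$$P(x \mid p) = \frac{1}{2p}\, e^{-|x|/p}$$
Substituting the first privacy factor $\varepsilon_2$ and the sensitivity 1 for the edge-count noise, Y is the Laplace distribution with $p = 1/\varepsilon_2$. From M(D) = f(D) + Y, when the dataset is the set $E_1$ of edges that truly exist between composite nodes, f(D) is the number of edges, $f(D) = |E_1|$, and the number of edges after Laplace noise is added is $m_1 = |E_1| + P(1/\varepsilon_2)$. Here a preselected random algorithm generates a random value for x (the first random value), and the value of the Laplace function $P(x \mid p)$ at that x is $P(1/\varepsilon_2)$, which can be regarded as the number of noise edges added. After noise has been added to the edge count, edges between composite nodes are selected and added according to the noised count. In one possible embodiment, let the third number be the number of edges selected from those detected in step 303, the fourth number the number of noise edges constructed for the composite nodes (edges absent from the detection result), the fifth number the edge count obtained after adding noise under the first privacy cost, and the sixth number the maximum possible number of edges between the composite nodes. Then the ratio of the third number to the fourth number equals the ratio of the fifth number to the difference between the sixth number and the fifth number. Because the fifth number includes noise on top of the originally detected edge count, the proportion of edges selected from the detected edges can increase.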
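A minimal sketch of perturbing the edge count with Laplace noise of scale $1/\varepsilon_2$. Several points are assumptions of this example and not taken from the text: it draws a Laplace sample in the standard way (as the difference of two exponential draws) instead of evaluating the Laplace function at a random point, it rounds the noised count to an integer, and it clamps the result to the range of possible edge counts. The name `noisy_edge_count` is hypothetical.

```python
import random

def noisy_edge_count(true_count, eps2, max_edges, rng):
    """Return the edge count with Laplace(0, 1/eps2) noise added (sensitivity 1)."""
    # The difference of two independent Exp(rate=eps2) draws is Laplace with scale 1/eps2.
    noise = rng.expovariate(eps2) - rng.expovariate(eps2)
    m1 = round(true_count + noise)
    # Assumption: keep the result within [0, max_edges].
    return min(max(m1, 0), max_edges)

rng = random.Random(7)
n1 = 10                          # hypothetical number of composite nodes
m0 = n1 * (n1 - 1) // 2          # maximum number of edges without self-loops
m1 = noisy_edge_count(12, 0.5, m0, rng)
```

A smaller `eps2` gives a larger noise scale and therefore stronger protection of the true edge count at the cost of accuracy.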
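Let $n_1$ be the number of composite nodes. Ignoring edges from a composite node to itself, the maximum number of edges is $m_0 = n_1(n_1-1)/2$. That is, the sixth number $m_0$ in the optional embodiment above is determined by the number of composite nodes $n_1$. The fifth number is the $m_1 = |E_1| + P(1/\varepsilon_2)$ given earlier. The ratio of the third number to the fourth number is:
$$\frac{m_1}{m_0 - m_1}$$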
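The selection of the third number of edges and the addition of the fourth number of edges are described in detail below.
On the one hand, the third number of edges is selected from $E_1$. In general, edges with larger weights are kept and edges with smaller weights are deleted.
According to one implementation, any edge detected in step 303 (an edge in the set $E_1$) is called a first edge. Noise following a two-sided geometric distribution based on the second privacy cost is added to the given initial weight of the first edge to obtain a first noised weight. If the first noised weight is greater than a first weight threshold, the first edge is selected as an edge of the privacy-preserving relationship network, with the first noised weight as its weight. The second privacy cost $\varepsilon_1$ is the difference between the predetermined overall privacy cost ε and the first privacy cost $\varepsilon_2$.
As an example, under the second privacy cost $\varepsilon_1$, let
$$\alpha = e^{-\varepsilon_1}$$
Then the noise δ follows the two-sided geometric distribution:
$$\Pr(\Delta = \delta \mid \alpha) = \frac{1-\alpha}{1+\alpha}\,\alpha^{|\delta|}$$
The probabilities over all δ sum to 1, so the cumulative probability takes values between 0 and 1 and can be determined by random sampling. Each cumulative probability value corresponds to a unique δ, so a randomly generated probability value determines the noise δ.
For an edge $e_1$ in the detected edge set $E_1$, let the initial weight $W_0$ be 1 or 0, where 1 means that an edge truly exists in the initial state and 0 means otherwise. The initial weight of $e_1$ is therefore 1, and its weight after noise is added is 1 + δ.
For edge $e_1$ to satisfy $\varepsilon_1$-differential privacy, its noised weight should be large enough to be distinguished from the node relationships in the original relationship network. To ensure this, the noised weight 1 + δ is compared with the first weight threshold θ. That is, noise δ is added to $W_0$ to obtain the weight $W_{e_1}$; when $W_{e_1} \ge \theta$, edge $e_1$ satisfies $\varepsilon_1$-differential privacy and is taken as an edge between composite nodes in the differentially private relationship network, with weight $W_{e_1}$. Because this weight includes noise, the privacy of user relationships is protected.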
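The selection of detected edges can be sketched as below. The sketch assumes the two-sided geometric distribution with parameter $\alpha = e^{-\varepsilon_1}$, that is $\Pr(\delta) = \frac{1-\alpha}{1+\alpha}\alpha^{|\delta|}$, and samples it as the difference of two one-sided geometric draws; this sampling method and the names `two_sided_geometric` and `select_true_edges` are assumptions of the example.

```python
import math
import random

def two_sided_geometric(eps1, rng):
    """Draw integer noise with Pr(d) proportional to alpha**abs(d), alpha = exp(-eps1)."""
    log_alpha = -eps1  # log(alpha)

    def one_sided():
        # Inverse-CDF draw with Pr[G >= j] = alpha**j; 1 - random() lies in (0, 1].
        return int(math.log(1.0 - rng.random()) / log_alpha)

    return one_sided() - one_sided()

def select_true_edges(edges, eps1, theta, rng):
    """Keep each detected edge whose noised weight 1 + delta reaches theta."""
    kept = {}
    for edge in edges:
        weight = 1 + two_sided_geometric(eps1, rng)
        if weight >= theta:
            kept[edge] = weight
    return kept

detected_edges = [frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))]
kept = select_true_edges(detected_edges, eps1=1.0, theta=1, rng=random.Random(3))
```

The kept edges carry their noised weights, so a downstream consumer never sees the unperturbed initial weight.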
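The first weight threshold θ can be set from experience, or determined by a method such as high-pass filtering. Taking high-pass filtering as an example, let the first weight threshold be θ, let $M'_i$ denote the weight of the i-th edge in $E_1$, and let
$$p_\theta = \Pr[M'_i \ge \theta]$$
Then:
$$p_\theta = \sum_{j \ge \theta} \frac{1-\alpha}{1+\alpha}\,\alpha^{j} = \frac{\alpha^{\theta}}{1+\alpha}$$
The embodiments of this specification use one-sided filtering (negative noise is excluded), that is:
$$\frac{\alpha^{\theta}}{1+\alpha} = \frac{m_1}{m_0 - m_1}$$
Therefore:
$$\theta = \log_\alpha \frac{(1+\alpha)\,m_1}{m_0 - m_1}$$
Optionally, θ takes the ceiling of the computed result:
$$\theta = \left\lceil \log_\alpha \frac{(1+\alpha)\,m_1}{m_0 - m_1} \right\rceil$$
When the computed result is not an integer, θ is its integer part plus 1. θ serves as the lower-bound weight threshold for noised edges; a larger θ ensures that the noise is large enough, which helps protect the privacy of user relationships.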
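A small sketch of computing the threshold. It assumes, following the first proportion defined in claim 12, that the one-sided pass probability of the filter, $\alpha^{\theta}/(1+\alpha)$ with $\alpha = e^{-\varepsilon_1}$, is set equal to $m_1/(m_0 - m_1)$, and that θ is rounded up. The function name `weight_threshold` and the sample values are hypothetical.

```python
import math

def weight_threshold(eps1, m1, m0):
    """Return ceil(log_alpha((1 + alpha) * m1 / (m0 - m1))) with alpha = exp(-eps1).

    Requires 0 < m1 < m0, where m1 is the noised edge count and m0 the maximum edge count.
    """
    if not 0 < m1 < m0:
        raise ValueError("expected 0 < m1 < m0")
    alpha = math.exp(-eps1)
    ratio = (1 + alpha) * m1 / (m0 - m1)
    return math.ceil(math.log(ratio) / math.log(alpha))

# Hypothetical values: alpha = 0.5, 10 noised edges out of at most 110.
theta = weight_threshold(math.log(2), m1=10, m0=110)
```

Here the ratio is 1.5 × 10 / 100 = 0.15, and $\log_{0.5} 0.15 \approx 2.74$, so the threshold rounds up to 3. A sparser network (smaller $m_1$ relative to $m_0$) yields a higher threshold.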
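With the first weight threshold θ, the third number of edges is selected from the edges detected in step 303 by comparing each edge's noised weight with θ.
On the other hand, the fourth number of edges must be added beyond the edges detected in step 303 (the edges in $E_1$) as edges between composite nodes in the privacy-preserving relationship network. These are edges that are provisionally assumed during the addition process and can be regarded as edges of weight 0. An assumed edge that meets the condition is added to the privacy-preserving relationship network; otherwise the edge is still treated as absent.
According to one possible embodiment, the fourth number of edges (denoted s) are chosen at random from these weight-0 edges as edges of the privacy-preserving relationship network, and each is given a randomly generated weight in a predetermined range (for example between 0 and 1). The randomly generated weights can be required to exceed a predetermined threshold, such as 0.3. The fourth number of edges are then selected in descending order of generated weight, each keeping its generated weight.
In an optional implementation, weights are generated for the weight-0 edges according to binomially distributed noise, and s edges are selected according to the high-pass filter principle.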
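By the same high-pass filtering principle, with one-sided filtering:
$$\frac{s}{m_0 - m_1} = \frac{\alpha^{\theta}}{1+\alpha}$$
Hence:
$$s = (m_0 - m_1)\,\frac{\alpha^{\theta}}{1+\alpha}$$
That is, the fourth number s is determined by the sixth number $m_0$, the fifth number $m_1$, the first weight threshold θ, and the second privacy cost $\varepsilon_1$. The noise weight generated for each edge of initial weight 0 follows the exponential distribution:
$$\Pr[X \le x] = 1 - \alpha^{x-\theta+1}$$
This is because, with $M'_i$ denoting the weight of the i-th edge, passing the high-pass filter requires:
$$\Pr[M'_i \ge \theta] = \sum_{j \ge \theta} \frac{1-\alpha}{1+\alpha}\,\alpha^{j} = \frac{\alpha^{\theta}}{1+\alpha}$$
Further, for all edges whose weight is at least θ, the cumulative distribution is:
$$\Pr[X \le x \mid X \ge \theta] = \frac{\sum_{j=\theta}^{x} \frac{1-\alpha}{1+\alpha}\,\alpha^{j}}{\alpha^{\theta}/(1+\alpha)} = 1 - \alpha^{x-\theta+1}$$
Therefore, if a random value between 0 and 1 is generated as the cumulative probability $\Pr[X \le x]$, it corresponds to a unique value of x, which is the noise weight ω randomly assigned to the current edge.
Because x can be positive or negative, and only edges with positive weight are meaningful in the embodiments of this specification, an edge whose generated weight satisfies ω > 0 is taken as a noise edge, with ω as its noise weight. This continues until s noise edges have been determined.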
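The construction of noise edges can be sketched as follows. The sketch assumes $s = (m_0 - m_1)\,\alpha^{\theta}/(1+\alpha)$ with $\alpha = e^{-\varepsilon_1}$, consistent with the filter ratio of claim 13, and draws each noise weight by inverting $\Pr[X \le x] = 1 - \alpha^{x-\theta+1}$. Rounding s, capping it at the number of absent edges, choosing the absent edges uniformly at random, and the name `noise_edges` are assumptions of this example.

```python
import math
import random
from itertools import combinations

def noise_edges(n1, true_edges, eps1, theta, m1, rng):
    """Return {edge: weight} for s noise edges between n1 composite nodes."""
    alpha = math.exp(-eps1)
    m0 = n1 * (n1 - 1) // 2
    s = round((m0 - m1) * alpha ** theta / (1 + alpha))
    # Candidate noise edges are the pairs with no detected edge ("weight 0" edges).
    absent = [frozenset(p) for p in combinations(range(n1), 2)
              if frozenset(p) not in true_edges]
    s = max(0, min(s, len(absent)))
    result = {}
    for edge in rng.sample(absent, s):
        weight = 0
        while weight <= 0:  # only positive weights are meaningful
            u = 1.0 - rng.random()  # uniform in (0, 1]
            # Smallest x >= theta with Pr[X <= x] = 1 - alpha**(x - theta + 1).
            weight = theta + int(math.log(u) / math.log(alpha))
        result[edge] = weight
    return result

true_edges = {frozenset((0, 1))}
added = noise_edges(n1=5, true_edges=true_edges, eps1=math.log(2),
                    theta=1, m1=2, rng=random.Random(11))
```

In this hypothetical run, $m_0 = 10$ and $\alpha = 0.5$, so s is round(8 × 0.5 / 1.5) = 3 noise edges, each with weight at least θ.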
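In the process above, when the edge-count noise follows a Laplace distribution, the probability of any outcome of an arbitrary randomized algorithm on the relationship network with the true edge count $|E_1|$ is at most $e^{\varepsilon_2}$ times the corresponding probability on the relationship network with edge count $m_1 = |E_1| + P(1/\varepsilon_2)$, so $\varepsilon_2$-differential privacy is satisfied. For the edge weights, two-sided geometric noise or exponential noise is added, so that the probability of any outcome of an arbitrary randomized algorithm on the relationship network containing the edge set $E_1$ is at most $e^{\varepsilon_1}$ times the corresponding probability on the relationship network to which edge-count noise and weight noise have been added, so $\varepsilon_1$-differential privacy is satisfied.
In this way, the number of existing edges is processed with differential privacy under the first privacy factor $\varepsilon_2$, and the edge weights are processed with differential privacy under the second privacy factor $\varepsilon_1$ when edges are selected, producing a relationship network that satisfies ε-differential privacy with $\varepsilon = \varepsilon_2 + \varepsilon_1$.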
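A relationship network that satisfies ε-differential privacy has a simpler structure and contains noise that masks the original user relationships, so relationships among users can be mined while user privacy is protected. For example, in the implementation scenario of Figure 2, gang relationships among users are uncovered from the user IDs provided by a merchant. Even when the privacy-preserving relationship network is provided to a third-party platform, the users' relationship privacy is not disclosed.
Figure 4 shows a method for determining a user group among multiple candidate users using the privacy-preserving relationship network. The method can be executed by the same entity that executes the method of Figure 3 or by another entity (for example the merchant in Figure 1 that provides the user IDs); no limitation is imposed here.
The method of Figure 4 includes the following steps: step 401, obtain the privacy-preserving relationship network generated for the multiple candidate users; step 402, process the privacy-preserving relationship network with a predetermined group identification model to obtain multiple composite-node sets; step 403, determine at least one candidate composite-node set from the multiple composite-node sets, so that the data party of the initial relationship network can determine a target user group among the candidate users according to the candidate composite nodes in a single candidate composite-node set.
First, in step 401, the privacy-preserving relationship network generated for the multiple candidate users is obtained. The candidate users can be provided by the relevant business party, for example a service provider (such as a merchant) on a consumer platform. The user IDs it provides can be the IDs under which its counterparties (such as consumers) are registered on a business platform. Each user ID corresponds to one candidate user. That business platform, as the data party of the initial relationship network, can generate the initial user relationship network in advance.
The data party of the initial relationship network determines the candidate relationship network from the initial relationship network according to these candidate users, partitions the original nodes of the candidate relationship network into multiple composite nodes according to a preset composite-node capacity, detects whether an edge exists between each pair of composite nodes, and, based on the detection result, adds edges and weights to the composite nodes in a differentially private manner, thereby constructing the privacy-preserving relationship network. Optionally, the candidate relationship network includes the users provided by the business party and their neighbors within a predetermined order in the initial relationship network. This process was described in the embodiment of Figure 3 and is not repeated here.
When the entity executing the process of Figure 4 is the data party of the initial relationship network, the privacy-preserving relationship network can be obtained locally.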
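Then, in step 402, the privacy-preserving relationship network is processed with a predetermined group identification model to obtain multiple composite-node sets. The predetermined group identification model is, for example, the Louvain algorithm or the maximal connected subgraph method.
Taking the Louvain algorithm as an example, each composite node of the privacy-preserving relationship network initially forms its own community. Each composite node is then tentatively moved into the community of each adjacent composite node, the modularity of the whole relationship network is computed, and the move that maximizes modularity is chosen. Next, the composite nodes that end up in the same community are merged into a new community, and these steps are repeated until modularity no longer increases. Each community is regarded as one composite-node set.
According to one implementation, modularity is determined as follows:
$$Q = \sum_{c=1}^{n_c} \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^{2} \right]$$
Here $n_c$ is the number of communities in the current relationship network (initially the number of communities in the privacy-preserving relationship network), $l_c$ is the total number of edges within community c, $d_c$ is the total degree of the composite nodes clustered into community c, and m is the total number of edges in the current relationship network (initially that of the privacy-preserving relationship network). Modularity optimization can be implemented with algorithms such as the greedy algorithm (Newman algorithm), simulated annealing, random walk algorithms, statistical algorithms, label propagation, InfoMap, or the Louvain algorithm.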
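A minimal sketch of the modularity computation, assuming the form stated in claim 18: for each community, the share of edges inside it minus the square of its total degree divided by twice the total edge count, summed over communities. Edge weights are ignored for simplicity, and the function name `modularity` and the sample graph are assumptions of this example.

```python
def modularity(edges, community):
    """Sum over communities of l_c / m - (d_c / (2 * m)) ** 2 for an unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Two triangles of composite nodes joined by a single edge (hypothetical example).
demo_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
merged = {n: "x" for n in range(6)}
q_split = modularity(demo_edges, split)
q_merged = modularity(demo_edges, merged)
```

For the split assignment each community has 3 internal edges and total degree 7 out of m = 7 edges, giving 2 × (3/7 − 1/4) = 5/14; putting every node in one community gives 0, so the split is preferred.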
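Then, in step 403, at least one candidate composite-node set is determined from the multiple composite-node sets. When these candidate composite-node sets are provided to the data party of the initial relationship network, the data party can determine the corresponding target user group among the candidate users according to the candidate composite nodes in each candidate composite-node set.
According to one possible design, a composite-node set containing more composite nodes than a predetermined threshold (for example 10) is determined to be a candidate composite-node set. The data party of the initial relationship network can then determine the target user group for a single candidate composite-node set as follows:
Each candidate composite node is mapped to multiple initial users of the initial relationship network according to a preset mapping rule; among these initial users, those that belong to the candidate users are selected and identified as the target user group of that candidate composite-node set. In other words, after the original users are looked up, non-candidate users are filtered out and the remaining users form the target user group. Optionally, the party that generated the initial relationship network records the correspondence between composite nodes and original nodes when generating the privacy-preserving relationship network, and this correspondence serves as the mapping rule.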
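The size-based selection and the mapping back to users can be sketched together. The function name `target_user_groups`, the dictionary used as the mapping rule, and the sample data are assumptions of this example.

```python
def target_user_groups(node_sets, composite_to_users, candidate_ids, min_size=10):
    """Map each sufficiently large composite-node set back to candidate user IDs.

    node_sets: iterable of sets of composite-node IDs (one set per community);
    composite_to_users: recorded correspondence composite node -> original user IDs.
    """
    groups = []
    for nodes in node_sets:
        if len(nodes) <= min_size:
            continue  # not a candidate composite-node set
        users = {u for c in nodes for u in composite_to_users[c]}
        groups.append(users & candidate_ids)  # drop users the business party did not provide
    return groups

mapping = {0: {"a", "x"}, 1: {"b"}, 2: {"c"}, 3: {"d"}}
result = target_user_groups([{0, 1, 2}, {3}], mapping,
                            candidate_ids={"a", "b", "c", "d"}, min_size=2)
```

User `x` was pulled in when the network was expanded, so it is filtered out of the returned group.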
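According to another possible design, the method of Figure 4 is executed by the data party of the initial relationship network. In this case the executing entity can determine candidate composite-node sets as in the preceding design, or by other methods.
For example, suppose the composite-node sets obtained in step 402 include a first composite-node set. The executing entity first maps each composite node of the first composite-node set to multiple initial users of the initial relationship network according to the preset mapping rule. It then checks whether a predetermined number (for example 20) or a predetermined proportion (for example 60%) of these initial users have been registered for less than a predetermined time threshold (for example 1 month). If so, the first composite-node set is determined to be a candidate composite-node set; otherwise it is not.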
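The registration-time check can be sketched as follows, using the example thresholds from the text as defaults. The function name `is_candidate_set`, the 30-day reading of one month, and the sample dates are assumptions of this example.

```python
from datetime import date, timedelta

def is_candidate_set(users, registered_on, today,
                     max_age=timedelta(days=30), min_count=20, min_ratio=0.6):
    """True if enough of the mapped users registered less than max_age ago."""
    recent = sum(1 for u in users if today - registered_on[u] < max_age)
    if recent >= min_count:
        return True
    return len(users) > 0 and recent / len(users) >= min_ratio

today = date(2020, 10, 28)
registered = {
    "u1": date(2020, 10, 20), "u2": date(2020, 10, 15), "u3": date(2020, 10, 1),
    "u4": date(2020, 10, 27), "u5": date(2019, 1, 1),
}
mostly_new = is_candidate_set(list(registered), registered, today)
mostly_old = is_candidate_set(["u5", "u1"], {"u5": date(2019, 1, 1), "u1": date(2018, 5, 5)}, today)
```

Four of the five sample users registered within 30 days, so the first set passes the proportion test even though it is far below the count threshold.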
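Because the privacy-preserving relationship network used in step 401 may have been expanded beyond the user IDs provided by the business party and/or had noise added, the candidate user IDs may include user IDs that the business party did not provide. After these are removed by comparison, the remaining candidate user IDs are identified as the user group.
The target user group corresponding to a candidate composite-node set can be provided to the business party. Such a user group may consist of the user IDs taking part in a batch attack or an organized gang; after receiving the group information, the business party can take defensive measures or pursue accountability. There may be one target user group or several, provided to the business party for reference.
In summary, with the privacy-preserving relationship network construction method provided by the embodiments of this specification, when a user relationship network is to be provided, the users are first aggregated and noise is added, yielding a relationship network that satisfies differential privacy. This effectively protects the privacy of user relationships while reducing the amount of data to be processed and improving the usefulness of the user relationship network. Furthermore, when the privacy-preserving relationship network is used to discover user groups, the work is not restricted to a particular data holder: any data processor with computing capability can use a group identification model to identify candidate composite nodes in the relationship network, and the data holder of the initial relationship network then looks up the user IDs contained in each user group and provides them to the relevant business party. Group identification thus becomes more convenient while data security is maintained.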
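According to an embodiment of another aspect, a privacy-preserving relationship network construction apparatus is also provided. The privacy-preserving relationship network consists of multiple composite nodes whose associations are described by connecting edges; a single composite node corresponds to multiple original nodes of the candidate relationship network, each original node corresponds to a user, and edges between original nodes describe the associations between the corresponding users. Figure 5 is a schematic block diagram of such an apparatus according to one embodiment. As shown in Figure 5, apparatus 500 includes: an obtaining unit 51, configured to obtain the candidate relationship network; a node construction unit 52, configured to partition the original nodes of the candidate relationship network into multiple composite nodes according to a preset composite-node capacity, where the number of original nodes in a single composite node does not exceed the capacity; a detection unit 53, configured to detect, for the composite nodes, whether an edge exists between each pair; and an edge construction unit 54, configured to add edges and weights to the composite nodes in a differentially private manner based on the detection result, thereby constructing the privacy-preserving relationship network.
Note that apparatus 500 shown in Figure 5 corresponds to the method embodiment of Figure 3; the descriptions given for that method embodiment also apply to apparatus 500 and are not repeated here.
According to an embodiment of another aspect, an apparatus for determining a user group among multiple candidate users is also provided. Figure 6 shows such an apparatus 600. Apparatus 600 includes at least: an obtaining unit 61, configured to obtain the privacy-preserving relationship network generated by apparatus 500 for the multiple candidate users; a processing unit 62, configured to process the privacy-preserving relationship network with a predetermined group identification model to obtain multiple composite-node sets; and a determining unit 63, configured to determine at least one candidate composite-node set from the multiple composite-node sets, so that the data party of the initial relationship network can determine the corresponding target user group among the candidate users according to the candidate composite nodes in a single candidate composite-node set.
Note that apparatus 600 shown in Figure 6 corresponds to the method embodiment of Figure 4; the descriptions given for that method embodiment also apply to apparatus 600 and are not repeated here.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, storing a computer program which, when executed on a computer, causes the computer to perform the methods described above.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where the memory stores executable code and the processor implements the methods described above when executing the code.
A person skilled in the art will appreciate that, in one or more of the examples above, the functions described in the embodiments of this specification can be implemented in hardware, software, firmware, or any combination of these. When implemented in software, the functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above explain the purpose, technical solutions, and beneficial effects of the technical concept of this specification in further detail. It should be understood that they are only specific implementations of that technical concept and do not limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of these technical solutions falls within that scope.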

Claims (25)

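  1. A privacy-preserving relationship network construction method, wherein the privacy-preserving relationship network consists of multiple composite nodes, associations between the composite nodes are described by connecting edges, a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and connecting edges between original nodes describe associations between the corresponding users; the method comprising:
    obtaining the candidate relationship network;
    partitioning the original nodes of the candidate relationship network into multiple composite nodes according to a preset composite-node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite-node capacity;
    for the multiple composite nodes, detecting whether a connecting edge exists between each pair;
    based on the detection result, adding connecting edges and weights to the multiple composite nodes in a differentially private manner, thereby constructing the privacy-preserving relationship network.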
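  2. The method according to claim 1, wherein the candidate relationship network is obtained by:
    obtaining user identifiers of multiple candidate users provided by a third business party;
    based on the user identifiers, selecting from an initial relationship network the original nodes corresponding to the candidate users, together with their neighbor nodes within a predetermined order, as candidate nodes;
    using the relationship network formed by the candidate nodes as the candidate relationship network.
  3. The method according to claim 1, wherein partitioning the original nodes of the candidate relationship network into multiple composite nodes according to the preset composite-node capacity comprises:
    determining the number of original nodes in the candidate relationship network;
    determining a first number from the number of original nodes and the composite-node capacity, the first number being the maximum number of composite nodes that can be formed when the number of original nodes corresponding to each composite node equals the composite-node capacity;
    randomly selecting the first number of original nodes from the original nodes of the candidate relationship network as the base nodes of the composite nodes;
    for each base node, determining a second number of original nodes from the candidate relationship network which, together with the base node, form the corresponding composite node, the second number being one less than the composite-node capacity.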
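  4. The method according to claim 1, wherein the multiple composite nodes include a first composite node and a second composite node, the first composite node corresponds to a first original node, the second composite node corresponds to a second original node, and detecting whether a connecting edge exists between each pair of composite nodes comprises:
    when a connecting edge exists between the first original node and the second original node, determining that a connecting edge exists between the first composite node and the second composite node.
  5. The method according to claim 1, wherein the detection result includes a set of connecting edges between the composite nodes and the number of connecting edges in the set, and adding edges and weights to the multiple composite nodes in a differentially private manner based on the detection result comprises:
    adding noise under a first privacy cost to the number of connecting edges.
  6. The method according to claim 5, wherein the noise under the first privacy cost follows a Laplace distribution whose scale parameter is the reciprocal of the first privacy cost.
  7. The method according to claim 6, wherein the noise under the first privacy cost is the value of the dependent variable of the Laplace distribution when its independent variable is a first random value generated by a predetermined random algorithm.
  8. The method according to claim 5, wherein adding edges and weights to the multiple composite nodes in a differentially private manner based on the detection result further comprises:
    selecting a third number of connecting edges from the set of connecting edges;
    constructing a fourth number of noise edges for the composite nodes, the noise edges being connecting edges outside the set of connecting edges.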
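  9. The method according to claim 8, wherein a fifth number is obtained by adding the noise under the first privacy cost to the number of connecting edges, a sixth number is the maximum number of connecting edges between the composite nodes, and the ratio of the third number to the fourth number equals the ratio of the fifth number to the difference between the sixth number and the fifth number.
  10. The method according to claim 8, wherein the set of connecting edges includes a first connecting edge, the connecting edges in the set are given the same initial weight, and selecting a third number of connecting edges from the set comprises:
    for the first connecting edge, adding to the given initial weight noise whose cumulative probability follows a two-sided geometric distribution based on a second privacy cost, to obtain a first noised weight, the second privacy cost being the difference between a predetermined overall privacy cost and the first privacy cost;
    when the first noised weight is greater than a first weight threshold, selecting the first connecting edge as a connecting edge of the privacy-preserving relationship network, with the first noised weight as the weight of the first connecting edge.
  11. The method according to claim 10, wherein the given initial weight is 1, and noise is added to the first connecting edge by:
    generating, by a predetermined random algorithm, a second random value within a predetermined interval for the two-sided geometric distribution;
    determining the value of the independent variable of the two-sided geometric distribution that yields the second random value;
    taking the sum of the initial weight and that value of the independent variable as the noised weight of the first connecting edge.
  12. The method according to claim 10, wherein the first weight threshold is the threshold on the independent variable at which one-sided filtering of the connecting edges in the set by a high-pass filter under the second privacy cost passes a first proportion of connecting edges, the first proportion being the ratio of the following first term to the following second term:
    the first term is the fifth number obtained by adding the noise under the first privacy cost to the number of connecting edges;
    the second term is the difference between the maximum number of connecting edges between the composite nodes and the fifth number.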
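  13. The method according to claim 8, wherein the fourth number is determined by the pass ratio of a high-pass filter under a second privacy cost, the second privacy cost being the difference between a predetermined overall privacy cost and the first privacy cost, and the ratio of the fourth number to the difference between the following two items equals that pass ratio: the maximum number of connecting edges between the composite nodes, and the number of connecting edges obtained by adding the noise under the first privacy cost to the number of connecting edges.
  14. The method according to claim 13, wherein the multiple composite nodes include a third composite node and a fourth composite node that are not joined by any connecting edge in the set of connecting edges, and constructing a fourth number of noise edges for the composite nodes comprises:
    adding a second connecting edge with initial weight 0 between the third composite node and the fourth composite node;
    generating for the second connecting edge a noise weight whose cumulative probability follows an exponential distribution under the second privacy cost;
    when the noise weight generated for the second connecting edge is greater than 0, determining the second connecting edge to be an added connecting edge, with the generated noise weight as the weight of the second connecting edge.
  15. The method according to claim 14, wherein the noise weight following the exponential distribution under the second privacy cost is generated for the second connecting edge by:
    generating, by a predetermined random algorithm, a random value within a predetermined probability interval;
    taking the value of the independent variable at which the exponential distribution under the second privacy cost equals the random value as the noise weight generated for the second connecting edge.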
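  16. A method for determining a user group among multiple candidate users, the method comprising:
    obtaining the privacy-preserving relationship network generated for the multiple candidate users by the method of claim 1;
    processing the privacy-preserving relationship network with a predetermined group identification model to obtain multiple composite-node sets;
    determining at least one candidate composite-node set from the multiple composite-node sets, so that the data party of the initial relationship network determines the corresponding target user group among the multiple candidate users according to the candidate composite nodes in a single candidate composite-node set.
  17. The method according to claim 16, wherein processing the privacy-preserving relationship network with the predetermined group identification model to obtain multiple composite-node sets comprises:
    taking the privacy-preserving relationship network as the initial current relationship network, in which each composite node forms one community;
    performing the following modularity maximization step: moving each composite node into the community of a composite node adjacent to it, computing the modularity of the current relationship network with communities as nodes, and choosing the move that maximizes the modularity;
    merging the composite nodes that are in the same community after the move into one community, and iterating the modularity maximization step until the modularity of the current relationship network no longer changes;
    generating a composite-node set for each community.
  18. The method according to claim 17, wherein the modularity of the current relationship network is obtained by summing the node degree of each community, the node degree of a first community in the current relationship network being the difference between the following first term and second term:
    the first term is the ratio of the total number of connecting edges within the first community to the total number of connecting edges in the current relationship network;
    the second term is the square of the ratio of the total degree of the composite nodes clustered into the first community to twice the total number of connecting edges in the current relationship network.
  19. The method according to any one of claims 16-18, wherein the modularity maximization step is performed using one of: a greedy algorithm, simulated annealing, a random walk algorithm, a statistical algorithm, a label propagation algorithm, the InfoMap algorithm, or the Louvain algorithm.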
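  20. The method according to claim 16, wherein determining at least one candidate composite-node set from the multiple composite-node sets comprises:
    determining a composite-node set whose number of composite nodes is greater than a predetermined number threshold to be a candidate composite-node set;
    so that the data party of the initial relationship network determines the corresponding target user group among the multiple candidate users according to the candidate composite nodes in a single candidate composite-node set by:
    mapping each candidate composite node to multiple initial users of the initial relationship network according to a preset mapping rule;
    selecting, from the multiple initial users, the users that are among the multiple candidate users, and identifying the selected users as the target user group corresponding to the single candidate composite-node set.
  21. The method according to claim 16, wherein the method is executed by the data party of the initial relationship network, the multiple composite-node sets include a first composite-node set, and determining at least one candidate composite-node set from the multiple composite-node sets comprises:
    mapping each composite node in the first composite-node set to multiple initial users of the initial relationship network according to a preset mapping rule;
    detecting whether a predetermined number or a predetermined proportion of the multiple initial users have been registered for less than a predetermined time threshold;
    if so, determining the first composite-node set to be a candidate composite-node set.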
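  22. A privacy-preserving relationship network construction apparatus, wherein the privacy-preserving relationship network consists of multiple composite nodes, associations between the composite nodes are described by connecting edges, a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and connecting edges between original nodes describe associations between the corresponding users; the apparatus comprising:
    an obtaining unit, configured to obtain the candidate relationship network;
    a node construction unit, configured to partition the original nodes of the candidate relationship network into multiple composite nodes according to a preset composite-node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite-node capacity;
    a detection unit, configured to detect, for the multiple composite nodes, whether a connecting edge exists between each pair;
    an edge construction unit, configured to add edges and weights to the multiple composite nodes in a differentially private manner based on the detection result, thereby constructing the privacy-preserving relationship network.
  23. An apparatus for determining a user group among multiple candidate users, the apparatus comprising:
    an obtaining unit, configured to obtain the privacy-preserving relationship network generated for the multiple candidate users by the apparatus of claim 22;
    a processing unit, configured to process the privacy-preserving relationship network with a predetermined group identification model to obtain multiple composite-node sets;
    a determining unit, configured to determine at least one candidate composite-node set from the multiple composite-node sets, so that the data party of the initial relationship network determines the corresponding target user group among the multiple candidate users according to the candidate composite nodes in a single candidate composite-node set.
  24. A computer-readable storage medium storing a computer program which, when executed on a computer, causes the computer to perform the method of any one of claims 1-21.
  25. A computing device comprising a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements the method of any one of claims 1-21.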
PCT/CN2020/124282 2019-12-13 2020-10-28 Privacy-preserving relationship network construction method and apparatus WO2021114921A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911284478.0 2019-12-13
CN201911284478.0A CN111046429B (zh) 2019-12-13 2019-12-13 Privacy-preserving relationship network construction method and apparatus

Publications (1)

Publication Number Publication Date
WO2021114921A1 true WO2021114921A1 (zh) 2021-06-17

Family

ID=70236206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124282 WO2021114921A1 (zh) 2019-12-13 2020-10-28 Privacy-preserving relationship network construction method and apparatus

Country Status (3)

Country Link
CN (1) CN111046429B (zh)
TW (1) TWI724896B (zh)
WO (1) WO2021114921A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564752A (zh) * 2022-04-28 2022-05-31 蓝象智联(杭州)科技有限公司 一种基于图联邦的黑名单传播方法
CN115828312A (zh) * 2023-02-17 2023-03-21 浙江浙能数字科技有限公司 一种面向电力用户社交网络的隐私保护方法及系统
CN115114664B (zh) * 2022-06-24 2023-05-23 浙江大学 一种面向图数据的差分隐私保护发布方法及系统

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046429B (zh) * 2019-12-13 2021-06-04 支付宝(杭州)信息技术有限公司 基于隐私保护的关系网络构建方法及装置
CN111626890B (zh) * 2020-06-03 2023-08-01 四川大学 一种基于销售信息网络的显著社团发现方法
CN111783996B (zh) * 2020-06-18 2023-08-25 杭州海康威视数字技术股份有限公司 一种数据处理方法、装置及设备
CN111737751B (zh) * 2020-07-17 2020-11-17 支付宝(杭州)信息技术有限公司 实现隐私保护的分布式数据处理的方法及装置
CN112528166A (zh) * 2020-12-16 2021-03-19 平安养老保险股份有限公司 用户关系分析方法、装置、计算机设备及存储介质
CN113361055B (zh) * 2021-07-02 2024-03-08 京东城市(北京)数字科技有限公司 扩展社交网络中的隐私处理方法、装置、电子设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110105143A1 (en) * 2009-11-03 2011-05-05 Geosolutions B.V. Proximal relevancy ranking in a layered linked node database
CN104866781A (zh) * 2015-05-27 2015-08-26 广西师范大学 面向社区检测应用的社会网络数据发布隐私保护方法
CN106650487A (zh) * 2016-09-29 2017-05-10 广西师范大学 基于多维敏感数据发布的多部图隐私保护方法
CN107918664A (zh) * 2017-11-22 2018-04-17 广西师范大学 基于不确定图的社会网络数据差分隐私保护方法
CN110032603A (zh) * 2019-01-22 2019-07-19 阿里巴巴集团控股有限公司 一种对关系网络图中的节点进行聚类的方法及装置
CN111046429A (zh) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 基于隐私保护的关系网络构建方法及装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8468244B2 (en) * 2007-01-05 2013-06-18 Digital Doors, Inc. Digital information infrastructure and method for security designated data and with granular data stores
CN105376243B (zh) * 2015-11-27 2018-08-21 中国人民解放军国防科学技术大学 基于分层随机图的在线社会网络差分隐私保护方法
CN107689950B (zh) * 2017-06-23 2019-01-29 平安科技(深圳)有限公司 数据发布方法、装置、服务器和存储介质
CN109299615B (zh) * 2017-08-07 2022-05-17 南京邮电大学 一种面向社交网络数据的差分隐私处理发布方法
CN109639747B (zh) * 2017-10-09 2020-06-26 阿里巴巴集团控股有限公司 数据请求处理、询问消息处理方法、装置以及设备
KR102175167B1 (ko) * 2018-05-09 2020-11-05 서강대학교 산학협력단 K-평균 클러스터링 기반의 데이터 마이닝 시스템 및 이를 이용한 k-평균 클러스터링 방법
CN109344643B (zh) * 2018-09-03 2022-03-29 华中科技大学 一种面向图中三角形数据发布的隐私保护方法及系统
CN109829337B (zh) * 2019-03-07 2023-07-25 广东工业大学 一种社会网络隐私保护的方法、系统及设备
CN110147996A (zh) * 2019-05-21 2019-08-20 中央财经大学 一种基于区块链的数据交易本地化差分隐私保护方法及装置
CN110288358A (zh) * 2019-06-20 2019-09-27 武汉斗鱼网络科技有限公司 一种设备团体确定方法、装置、设备及介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564752A (zh) * 2022-04-28 2022-05-31 蓝象智联(杭州)科技有限公司 一种基于图联邦的黑名单传播方法
CN114564752B (zh) * 2022-04-28 2022-07-26 蓝象智联(杭州)科技有限公司 一种基于图联邦的黑名单传播方法
CN115114664B (zh) * 2022-06-24 2023-05-23 浙江大学 一种面向图数据的差分隐私保护发布方法及系统
CN115828312A (zh) * 2023-02-17 2023-03-21 浙江浙能数字科技有限公司 一种面向电力用户社交网络的隐私保护方法及系统

Also Published As

Publication number Publication date
TWI724896B (zh) 2021-04-11
CN111046429A (zh) 2020-04-21
TW202123118A (zh) 2021-06-16
CN111046429B (zh) 2021-06-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900377

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900377

Country of ref document: EP

Kind code of ref document: A1