WO2024109454A1

WO2024109454A1 - Label propagation method and apparatus for associated network, and computer readable storage medium

Info

Publication number: WO2024109454A1
Application number: PCT/CN2023/127581
Authority: WO
Inventors: 刘红宝; 何朔; 高鹏飞; 郑建宾; 汤韬; 邱震尧
Original assignee: 中国银联股份有限公司
Priority date: 2022-11-25
Filing date: 2023-10-30
Publication date: 2024-05-30
Also published as: CN115733763A

Abstract

The present invention provides a label propagation method and apparatus for an associated network, and a computer readable storage medium. The method comprises: constructing a first associated network on the basis of first-party data, and constructing a second associated network on the basis of second-party data; associating the first associated network and the second associated network on the basis of a secure intersection protocol to obtain a federated associated network; and iteratively performing multiple rounds of label propagation on nodes of the federated associated network, wherein each round of label propagation comprises: determining the label propagation probabilities between adjacent nodes in the federated associated network; and, for each node, determining the label of the node in the present round according to the present-round labels of its neighboring nodes and the label propagation probabilities of those neighboring nodes for the node. With this method, label propagation across platform networks can be achieved while protecting private data.

Description

A method, device and computer-readable storage medium for label propagation of an associated network

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to a Chinese patent application filed with the Chinese Patent Office on November 25, 2022, with application number 202211492068.7 and application name “A method, device and computer-readable storage medium for label propagation of an associated network”, the entire contents of which are incorporated by reference in this application.

Technical Field

The present invention belongs to the field of computers, and in particular relates to a label propagation method and device for an associated network, and a computer-readable storage medium.

Background

This section is intended to provide a background or context to embodiments of the invention that are recited in the claims. No description herein is admitted to be prior art by inclusion in this section.

As privacy protection laws become stricter, data cooperation between institutions increasingly needs to take data privacy protection into account. Currently, privacy computing technology mainly focuses on scenarios such as federated learning, secure intersection, and anonymous query, all of which are joint calculations over single-point data. By contrast, the data sources of current label propagation algorithms for association networks are all local, so cross-institutional joint use of data under the premise of privacy protection cannot be achieved: label updates cannot use the association network data of both parties at the same time, and the value of the data is not efficiently utilized.

Therefore, how to achieve federated network label propagation under the premise of privacy protection is an urgent problem to be solved.

Summary of the invention

In view of the problems existing in the above-mentioned prior art, a label propagation method, device and computer-readable storage medium for an associated network are proposed. By using this method, device and computer-readable storage medium, the above-mentioned problems can be solved.

The present invention provides the following solutions.

In a first aspect, a label propagation method for an association network is provided, comprising: constructing a first association network based on first-party data, and constructing a second association network based on second-party data; associating the first association network and the second association network based on a secure intersection protocol to obtain a federated association network; iteratively performing multiple rounds of label propagation on the nodes of the federated association network; wherein each round of label propagation comprises: determining a label propagation probability between adjacent nodes in a federated association graph; for each node, determining a current round label for each node based on the current round label of the neighboring node and the label propagation probability of the neighboring node for the node.

In one embodiment, associating the first associated network with the second associated network based on a secure intersection protocol to obtain a federated associated network further includes: performing encrypted intersection on the first-party data and the second-party data, determining the common nodes in the first associated network and the second associated network, and associating the first associated network with the second associated network according to the common nodes to obtain a federated associated network.

In one embodiment, determining the label propagation probability between adjacent nodes in a federated association graph further includes: determining the edge weight w_ij of the edge ij between node i and its neighbor node j in the federated association network; determining the edge weight sum ∑_j w_ij between node i and all of its neighbor nodes J; and determining the label propagation probability P_ij of neighbor node j for node i based on the ratio of the edge weight w_ij to the edge weight sum ∑_j w_ij.

In one implementation, if node i is a non-shared node, all neighbor nodes J represent all neighbor nodes in the graph where node i is located.

In one embodiment, if node i is a common node, all neighbor nodes J represent the set of all neighbor nodes a of node i in the first associated network and all neighbor nodes b of node i in the second associated network.

In one embodiment, if node i is a common node, the first party and the second party exchange the sum of the edge weights between node i and all neighbor nodes a in the first associated network and the sum of the edge weights between node i and all neighbor nodes b in the second associated network.

In one embodiment, the method further includes: if node i is a common node, using the following formula to determine the edge weight sum ∑_j w_ij: ∑_j w_ij = ∑_a w_ia + ∑_b w_ib; wherein ∑_a w_ia is the sum of the edge weights between node i and all neighbor nodes a of the first associated network, and ∑_b w_ib is the sum of the edge weights between node i and all neighbor nodes b of the second associated network.

In one embodiment, iteratively performing multiple rounds of label propagation on the nodes of the federated association network further includes: determining the labeled nodes and unlabeled nodes of the federated association network; updating the labels of the unlabeled nodes round by round until the labels of the unlabeled nodes no longer change and/or the number of rounds exceeds an update round threshold; and keeping the labels of the labeled nodes unchanged.

In one embodiment, the current round label of each node is determined based on the current round label of the neighboring nodes and the label propagation probability of the neighboring nodes to the node, including: for each node, determining the current round label of each neighboring node of the node, and the label propagation probability of each neighboring node to the node; among all the neighboring nodes of the node, calculating the sum of the label propagation probabilities corresponding to each label to obtain the label propagation aggregation probability corresponding to each label; updating the current round label of the node according to the label with the maximum label propagation aggregation probability.

In one implementation, if the node is a non-shared node, the method further includes: the party where the node is located calculates the label propagation aggregation probability corresponding to each neighbor node label of the node.

In one embodiment, if the node is a shared node, the method further includes: the first party calculates the first label propagation aggregation probability corresponding to all neighbor node labels of the node in the first associated network; the second party calculates the second label propagation aggregation probability corresponding to all neighbor node labels of the node in the second associated network; the first party and the second party exchange the first label propagation aggregation probability and the second label propagation aggregation probability; the first party and the second party each perform label propagation probability aggregation again based on the exchanged information to obtain the label propagation aggregation probability corresponding to each label.

In one embodiment, the method further includes: determining the graph weights of the first association network and the second association network according to the closeness of the node relationships between the first association network and the second association network; and introducing the graph weights during the interaction between the first association network and the second association network.

In one embodiment, introducing the graph weights during the interaction between the first association network and the second association network includes: if node i is a common node, determining the edge weight sum ∑_j w_ij using the following formula:

∑_j w_ij = θ_a ∑_a w_ia + θ_b ∑_b w_ib; wherein ∑_a w_ia is the sum of the edge weights between node i and all neighbor nodes a of the first associated network, ∑_b w_ib is the sum of the edge weights between node i and all neighbor nodes b of the second associated network, θ_a is the graph weight of the first associated network, and θ_b is the graph weight of the second associated network.

In one embodiment, introducing the graph weights during the interaction between the first association network and the second association network further includes: after the first party and the second party exchange the first label propagation aggregation probability and the second label propagation aggregation probability, aggregating the label propagation probabilities again based on the graph weights of the first association network and the second association network to obtain the label propagation aggregation probability corresponding to each label.

In one implementation, it further includes: if the first associated network and the second associated network are directed graph networks, only the incoming neighbor nodes of each node are used as neighbor nodes.

In a second aspect, a label propagation device for an association network is provided, comprising: a graph construction module, used to construct a first association network based on first-party data, and to construct a second association network based on second-party data; a federated network module, used to associate the first association network with the second association network based on a secure intersection protocol to obtain a federated association network; a label propagation module, used to iteratively perform multiple rounds of label propagation on the nodes of the federated association network; wherein each round of label propagation includes: determining the label propagation probability between adjacent nodes in the federated association graph; for each node, determining the current round label of the node according to the current round labels of its neighbor nodes and the label propagation probability of each neighbor node for the node.

According to a third aspect, a label propagation device for an associated network is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute: the method according to the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores a program, and when the program is executed by a multi-core processor, the multi-core processor executes the method of the first aspect.

One of the advantages of the above implementation is that label propagation across platform networks can be achieved while protecting private data.

Other advantages of the present invention will be explained in more detail with reference to the following description and accompanying drawings.

It should be understood that the above description is only an overview of the technical solution of the present invention, so that the technical means of the present invention can be more clearly understood and implemented according to the contents of the specification. In order to make the above and other purposes, features and advantages of the present invention more obvious and easy to understand, the specific implementation methods of the present invention are described below by way of example.

BRIEF DESCRIPTION OF THE DRAWINGS

The advantages and benefits described herein and other advantages and benefits will be apparent to those of ordinary skill in the art upon reading the detailed description of the exemplary embodiments below. The accompanying drawings are only for the purpose of illustrating exemplary embodiments and are not to be considered as limiting the present invention. Also, the same reference numerals are used throughout the accompanying drawings to represent the same components. In the accompanying drawings:

FIG1 is a schematic diagram of the structure of a label propagation device for an associated network according to an embodiment of the present invention;

FIG2 is a schematic diagram of a flow chart of a label propagation method for an association network according to an embodiment of the present invention;

FIG3 is a schematic diagram of a first association network and a second association network according to an embodiment of the present invention;

FIG4 is a schematic diagram of a federated association network according to an embodiment of the present invention;

FIG5 is a schematic diagram of determining the label propagation probability of the first association network and the second association network according to an embodiment of the present invention;

FIG6 is a schematic diagram of determining a label propagation probability of a federated association network according to an embodiment of the present invention;

FIG7 is a schematic diagram of label propagation of an association network according to an embodiment of the present invention;

FIG8 is a schematic diagram of label propagation of an association network according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a label propagation device for an associated network according to an embodiment of the present invention.

In the drawings, the same or corresponding reference numerals represent the same or corresponding parts.

Detailed Description

The exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the exemplary embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments described herein. On the contrary, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

In the description of the embodiments of the present application, it should be understood that terms such as "including" or "having" are intended to indicate the presence of features, numbers, steps, behaviors, components, parts, or a combination thereof disclosed in this specification, and are not intended to exclude the possibility of the presence of one or more other features, numbers, steps, behaviors, components, parts, or a combination thereof.

Unless otherwise specified, “/” means or. For example, A/B can mean A or B. The “and/or” in this article is merely a way to describe the association relationship of associated objects, indicating that three relationships can exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone.

The terms "first", "second", etc. are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first", "second", etc. may explicitly or implicitly include one or more of the feature. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.

In order to clearly explain the embodiments of the present application, some concepts that may appear in subsequent embodiments will be introduced first.

The present invention will be described in detail below with reference to the accompanying drawings and in conjunction with embodiments.

FIG1 shows a schematic diagram of an example of a computing device 100 according to an embodiment of the present disclosure. It should be noted that FIG1 is a schematic diagram of the structure of the hardware operating environment of the label propagation method of the associated network. The label propagation device of the associated network in the embodiment of the present invention can be a terminal device such as a PC or a portable computer.

As shown in FIG1, the label propagation device of the associated network may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to realize the connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and optionally the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art will understand that the structure shown in FIG1 does not constitute a limitation on the label propagation device of the associated network, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.

As shown in Figure 1, the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a label propagation method program for an associated network. The operating system is a program for managing and controlling the hardware and software resources of the label propagation device for the associated network, and supports the operation of the label propagation program for the associated network and other software or programs.

In the label propagation device of the associated network shown in FIG1 , the user interface 1003 is mainly used to receive requests, data, etc. sent by the first terminal, the second terminal, and the supervision terminal; the network interface 1004 is mainly used to connect the backend server to communicate data with the backend server; and the processor 1001 can be used to call the label propagation program of the associated network stored in the memory 1005, and perform the following operations:

A first association network is constructed based on first-party data, and a second association network is constructed based on second-party data; the first association network and the second association network are associated based on a secure intersection protocol to obtain a federated association network; multiple rounds of label propagation are iteratively performed on the nodes of the federated association network; wherein each round of label propagation includes: determining the label propagation probability between adjacent nodes in the federated association graph; for each node, determining the current round label of each node according to the current round label of the neighboring node and the label propagation probability of the neighboring node for the node.

Therefore, the two parties only need to exchange non-private data, such as label propagation probabilities, to perform joint computation over a cross-institutional association network, without disclosing the original data of either party.

FIG2 shows a flow chart of a label propagation method 200 of an association network according to an embodiment of the present disclosure. The method may be performed, for example, by the computing device 100 shown in FIG1. It should be understood that the method 200 may also include additional blocks not shown and/or may omit the blocks shown, and the scope of the present disclosure is not limited in this respect.

Step 210, constructing a first association network based on the first-party data, and constructing a second association network based on the second-party data;

For example, see Figure 3. Party A and Party B each form the nodes and edges of an association network based on their own data. Assuming Party A is a bank, the Party A association network formed from the transfer data between users is a transfer association network, in which each user's mobile phone number is a node, nodes with a transfer relationship are connected, and the transfer amount is the edge weight between nodes. Assuming Party B is a telecom operator, the Party B association network formed from the call data between users is a call association network, in which each user's mobile phone number is a node, nodes with call records are connected, and the number of calls is the edge weight between nodes. Optionally, the edge weight values of each association network can be normalized.
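The construction of one party's association network described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(node_u, node_v, amount)`, the function names, the phone numbers, and the choice of max-scaling for the optional normalization are all assumptions, since the text does not specify them.

```python
from collections import defaultdict


def build_association_network(records):
    """Build an undirected weighted graph {node: {neighbor: weight}}.

    `records` is an iterable of (node_u, node_v, amount) tuples (assumed
    layout); repeated pairs are accumulated into a single edge weight.
    """
    graph = defaultdict(dict)
    for u, v, amount in records:
        graph[u][v] = graph[u].get(v, 0.0) + amount
        graph[v][u] = graph[v].get(u, 0.0) + amount
    return dict(graph)


def normalize_edge_weights(graph):
    """Scale edge weights into (0, 1] by dividing by the largest weight.

    Max-scaling is an assumption; the text only says weights may be normalized.
    """
    max_w = max(w for nbrs in graph.values() for w in nbrs.values())
    return {u: {v: w / max_w for v, w in nbrs.items()} for u, nbrs in graph.items()}


# Hypothetical transfer records of Party A (phone numbers are made up).
transfers = [
    ("13800000001", "13800000002", 500.0),
    ("13800000002", "13800000003", 200.0),
    ("13800000001", "13800000002", 500.0),
]
bank_graph = normalize_edge_weights(build_association_network(transfers))
```

The same functions would apply to Party B's call records, with the call count taking the place of the transfer amount.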

Step 220, associating the first associated network and the second associated network based on a secure intersection protocol to obtain a federated associated network;

In one embodiment, the above step 220 further includes: encrypting and intersecting the first-party data and the second-party data, determining common nodes in the first association network and the second association network, and associating the first association network and the second association network according to the common nodes to obtain a federated association network.

For example, referring to FIG4, a secure intersection algorithm (such as a privacy intersection algorithm based on RSA+HASH) can be used to securely intersect the node data of the two parties, so that the common nodes can be found without exposing the original data, thereby forming a virtual federated association network. As shown in FIG4, the first association network and the second association network in FIG3 can be associated to obtain the federated association network. Here, Va represents a node of party A, Vb represents a node of party B, and Vab represents a common node of both parties; Vab1, Vab2, and Vab3 are the common nodes of both parties. Taking node Vab1 as an example, from the perspective of party A's association network alone, Vab1 has 2 neighbor nodes, whereas from the perspective of the global data, Vab1 has 4 neighbor nodes.
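One common reading of an "RSA+HASH" private set intersection is the RSA blind-signature scheme, sketched below. This is a toy illustration under stated assumptions, not the protocol the patent necessarily uses: the key is built from two small Mersenne primes and is far too weak for real use, all function names are hypothetical, and the sketch requires Python 3.8 or later for modular inverses via `pow`.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key held by party B (assumption: illustrative size only, insecure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to party B


def hash_to_int(value: str) -> int:
    """Hash an identifier into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest(), "big") % N


def outer_hash(x: int) -> str:
    """Second hash applied to RSA signatures before they are compared."""
    return hashlib.sha256(str(x).encode()).hexdigest()


def party_b_sign_own(ids):
    """Party B signs its own hashed identifiers and shares only outer hashes."""
    return {outer_hash(pow(hash_to_int(i), D, N)) for i in ids}


def party_a_blind(ids):
    """Party A multiplies each hashed identifier by r^E for a random r."""
    blinded, factors = [], []
    for i in ids:
        r = secrets.randbelow(N - 2) + 2
        while gcd(r, N) != 1:
            r = secrets.randbelow(N - 2) + 2
        blinded.append(hash_to_int(i) * pow(r, E, N) % N)
        factors.append(r)
    return blinded, factors


def party_b_sign_blinded(blinded):
    """Party B signs blinded values without seeing the underlying identifiers."""
    return [pow(y, D, N) for y in blinded]


def party_a_intersect(ids, factors, signed_blinded, b_tokens):
    """Party A removes the blinding factor and keeps identifiers B also holds."""
    common = set()
    for i, r, s in zip(ids, factors, signed_blinded):
        signature = s * pow(r, -1, N) % N  # equals hash_to_int(i)^D mod N
        if outer_hash(signature) in b_tokens:
            common.add(i)
    return common


# Hypothetical node identifiers following the naming in FIG4.
a_ids = ["Vab1", "Vab2", "Vab3", "Va1"]
b_ids = ["Vab1", "Vab2", "Vab3", "Vb1", "Vb2"]
b_tokens = party_b_sign_own(b_ids)
blinded, factors = party_a_blind(a_ids)
signed = party_b_sign_blinded(blinded)
common_nodes = party_a_intersect(a_ids, factors, signed, b_tokens)
```

In this sketch only party A learns the intersection; how the result is shared with party B afterwards is left open here, as the text does not describe it.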

Step 230, iteratively performing multiple rounds of label propagation on the nodes of the federated association network;

Specifically, the nodes of the federated association network may include labeled nodes and unlabeled nodes. For example, there may be some unlabeled nodes in the first association network and the second association network. For another example, all nodes in the first association network are labeled nodes, and all nodes in the second association network are unlabeled nodes. And so on.

In one embodiment, for the case where the above-mentioned federated association network includes both labeled nodes and unlabeled nodes, the above-mentioned step 230 may further include the following steps: first, determining the labeled nodes and unlabeled nodes of the federated association network; updating the labels of the unlabeled nodes round by round until the labels of the unlabeled nodes no longer change and/or the number of rounds exceeds the update round threshold; and keeping the labels of the labeled nodes unchanged. In this way, the labels of the original samples can be kept unchanged, and the accuracy of label propagation can be guaranteed.

Optionally, the labels of the labeled nodes can also be dynamically updated round by round, that is, the labels of all nodes in the federated association network are updated round by round until the labels of the nodes no longer change and/or the number of rounds exceeds the update round threshold. In this way, the original labels can be corrected and hidden risk labels can be mined.
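The two stopping conditions and the option of keeping labeled nodes fixed can be sketched as a generic round loop. The function name `propagate`, the synchronous update order, and the toy update rule used in the demonstration are assumptions made for illustration; the actual per-node update rule is the one defined in steps 231 and 232.

```python
def propagate(labels, labeled_nodes, update_fn, max_rounds=10):
    """Update labels round by round.

    Stops when no label changes in a round or when `max_rounds` (the update
    round threshold) is reached. Nodes in `labeled_nodes` keep their labels;
    pass an empty set to let every node be updated.
    `update_fn(node, labels)` returns the new label of `node` (assumed interface).
    """
    current = dict(labels)
    for round_no in range(1, max_rounds + 1):
        updated = {
            node: label if node in labeled_nodes else update_fn(node, current)
            for node, label in current.items()
        }
        if updated == current:  # no label changed in this round
            return current, round_no
        current = updated
    return current, max_rounds


# Toy chain a - b - c with a placeholder rule: take the largest neighbor label.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}


def toy_update(node, labels):
    return max(labels[n] for n in neighbors[node])


final_labels, rounds = propagate({"a": "1", "b": "0", "c": "0"}, {"a"}, toy_update)
```

Here node "a" plays the role of a labeled node whose label stays "1", and the loop ends in the round in which nothing changes.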

In the above step 230, each round of label propagation specifically includes the following steps 231-232:

Step 231, determining the label propagation probability between adjacent nodes in the federated association graph;

In one implementation, the above step 231 may specifically include:

(1) Determine the edge weight w_ij of the edge ij between node i and its neighbor node j in the federated association network;

(2) Determine the sum of edge weights ∑_j w_ij between node i and all its neighboring nodes J;

In one implementation, if node i is a common node, all neighbor nodes J represent a set of all neighbor nodes a of node i in the first associated network and all neighbor nodes b of node i in the second associated network.

Further, if node i is a common node, the first party and the second party exchange the sum of the edge weights ∑_a w_ia between node i and all neighbor nodes a of the first associated network, and the sum of the edge weights ∑_b w_ib between node i and all neighbor nodes b of the second associated network. In this way, the first party and the second party can each calculate the sum of the edge weights ∑_j w_ij between node i and all its neighbor nodes J based on the exchanged sums.

Further, in one implementation, if node i is a shared node, based on the edge weight sums exchanged by both parties, both the first party and the second party may determine the sum of the edge weights ∑_j w_ij using the following formula:

∑_j w_ij = ∑_a w_ia + ∑_b w_ib;

where ∑_a w_ia is the sum of the edge weights between node i and all neighboring nodes a in the first associated network, and ∑_b w_ib is the sum of the edge weights between node i and all neighboring nodes b in the second associated network.

Optionally, in another implementation, the impact of business scenarios on the closeness of node relationships can be further considered. For example, in a financial scenario, a transfer relationship is a strong relationship, and a call relationship is a weak relationship. Therefore, when calculating the label propagation probability of an edge, the strength of the edge relationship under different business scenarios of both parties can be considered for weighted aggregation.

In this case, both the first party and the second party can determine the weighted edge weight sum ∑_j w_ij using the following formula:

∑_j w_ij = θ_a ∑_a w_ia + θ_b ∑_b w_ib;

where θ_a is the graph weight of the first association network, and θ_b is the graph weight of the second association network.

(3) According to the ratio of the edge weight w_ij to the sum of edge weights ∑_j w_ij, determine the label propagation probability P_ij of neighbor node j for node i.

Specifically, the label propagation probability of each edge of the federated association network is P_ij = w_ij / ∑_j w_ij, where w_ij represents the weight value of edge ij. Here, for non-shared nodes, J represents the neighbor nodes of node i in its own network; for shared nodes, J represents all neighbor nodes of node i on both sides A and B.

The calculation logic of ∑_j w_ij is as follows: Party A calculates the sum of the edge weights between its own node i and that node's neighbors as ∑_a w_ia; Party B calculates the sum of the edge weights between its own node i and that node's neighbors as ∑_b w_ib. The two parties exchange ∑_a w_ia and ∑_b w_ib, and the final denominator of the probability calculation is ∑_j w_ij = ∑_a w_ia + ∑_b w_ib.
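The exchange of local sums, including the optional graph weights θ_a and θ_b, can be sketched as follows. The `Party` class, its attribute names, and the example value θ_a = 2 are hypothetical; the sketch covers only the denominator ∑_j w_ij, because the text does not state whether the numerator w_ij is also scaled by the graph weight.

```python
from fractions import Fraction


class Party:
    """One party's view of a common node: raw edges stay local, only sums leave."""

    def __init__(self, local_edges, graph_weight=Fraction(1)):
        self.local_edges = local_edges    # {neighbor_id: edge weight}, kept private
        self.graph_weight = graph_weight  # theta of this party's network

    def weighted_local_sum(self):
        """Value sent to the other party: theta times the local edge weight sum."""
        return self.graph_weight * sum(self.local_edges.values(), Fraction(0))


def federated_weight_sum(own, received_sum):
    """Denominator computed by one party from its own sum and the received sum."""
    return own.weighted_local_sum() + received_sum


# Edge weights of common node Vab2 taken from the FIG5 example; neighbor ids are made up.
party_a = Party({"a1": Fraction(1, 10)})
party_b = Party({"b1": Fraction(2, 10), "b2": Fraction(4, 10), "b3": Fraction(8, 10)})

# Each side combines its own sum with the single number received from the other side.
total_at_a = federated_weight_sum(party_a, party_b.weighted_local_sum())
total_at_b = federated_weight_sum(party_b, party_a.weighted_local_sum())

# Same data with a hypothetical graph weight theta_a = 2 for the transfer network.
weighted_a = Party(party_a.local_edges, graph_weight=Fraction(2))
weighted_total = federated_weight_sum(weighted_a, party_b.weighted_local_sum())
```

Both parties arrive at the same denominator while exchanging one aggregate value per common node rather than individual edges.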

Referring to Figure 5, node Vab2 is taken as an example. In the local network of party A, Vab2 has one neighbor node, and the label propagation probability calculated separately is P = 0.1/0.1 = 1; in the local network of party B, Vab2 has three neighbor nodes, and the label propagation probabilities calculated separately are P = 0.2/(0.2+0.4+0.8) = 1/7, P = 0.4/(0.2+0.4+0.8) = 2/7, and P = 0.8/(0.2+0.4+0.8) = 4/7. Further, the two parties exchange the sums of the edge weights between the target node Vab2 and its neighbor nodes, which are 0.1 for party A and 0.2+0.4+0.8 = 1.4 for party B. The label propagation probabilities of the target node are then updated based on the federated association network.

Referring to Figure 6, after the above calculation, on side A, the label propagation probability of its neighbor node for node i is 0.1/(0.1+1.4) = 1/15. On side B, similarly, the label propagation probabilities of its neighboring nodes for node i are 2/15, 4/15, and 8/15 respectively.
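The arithmetic of the Vab2 example can be checked with exact fractions. The neighbor identifiers `a1`, `b1`, `b2`, `b3` and the function names are placeholders introduced for this sketch; the edge weights are the ones given for FIG5.

```python
from fractions import Fraction

# Local edge weights of common node Vab2 on each side (values from the FIG5 example).
weights_a = {"a1": Fraction(1, 10)}
weights_b = {"b1": Fraction(2, 10), "b2": Fraction(4, 10), "b3": Fraction(8, 10)}


def local_weight_sum(weights):
    """Sum of edge weights between the node and its neighbors on one side."""
    return sum(weights.values(), Fraction(0))


def propagation_probabilities(local_weights, total):
    """P_ij = w_ij / total for every neighbor j held by one party."""
    return {j: w / total for j, w in local_weights.items()}


# Before the exchange, each party can only normalize by its own local sum.
local_probs_b = propagation_probabilities(weights_b, local_weight_sum(weights_b))

# After exchanging the two sums, both parties use the federated denominator.
total = local_weight_sum(weights_a) + local_weight_sum(weights_b)
probs_a = propagation_probabilities(weights_a, total)
probs_b = propagation_probabilities(weights_b, total)
```

With the federated denominator the probabilities over all four neighbors of Vab2 add up to one, which is not the case if each party's local probabilities are simply combined.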

Step 232 , for each node, determine the current round label of each node according to the current round labels of neighboring nodes and the label propagation probability of neighboring nodes for the node.

Referring to Figure 7, we continue to take the federated association network formed around Vab2 as an example. Node 5 is a risk node, displayed as a white node, and its label is set to "1"; the remaining nodes are unknown nodes, displayed as gray nodes, and their labels are set to "0". During the label propagation process, the label of node 5 is always "1", and the labels of the remaining nodes are updated round by round until the labels of all nodes no longer change or the number of rounds exceeds the update round threshold.

In one embodiment, the above step 232 further includes the following steps:

Step 2321, for node i, determine the current round label of each neighbor node of node i, and the label propagation probability of each neighbor node for node i;

Step 2322, among all neighboring nodes of node i, calculate the sum of the label propagation probabilities corresponding to each label to obtain the label propagation aggregation probability corresponding to each label;

Specifically, if node i is a non-shared node, only the party where node i is located calculates the label propagation aggregation probability corresponding to each neighbor node label of node i.

Specifically, if node i is a shared node, the following steps are performed: first, the first party calculates the first label propagation aggregation probability corresponding to all neighbor node labels of node i in the first associated network, and the second party calculates the second label propagation aggregation probability corresponding to all neighbor node labels of node i in the second associated network; secondly, the first party and the second party exchange the first label propagation aggregation probability and the second label propagation aggregation probability; finally, the first party and the second party each aggregate the label propagation probabilities again based on the exchanged information to obtain the label propagation aggregation probability corresponding to each label.

Step 2323: Update the current round label of node i according to the label with the maximum label propagation aggregation probability.

The node label update rules shown in the above steps 2321 to 2323 may include the following specific steps:

First, for the T-th round update of node i, let its neighbor node set be $J = \{(J_1, L_1, P_{i1}), (J_2, L_2, P_{i2}), \ldots, (J_j, L_j, P_{ij}), \ldots, (J_n, L_n, P_{in})\}$, where $J_j$ is the identifier of neighbor node j, $L_j$ is the label of neighbor node j, and $P_{ij}$ is the propagation probability of edge $\langle i, J_j \rangle$.

Secondly, calculate the aggregate propagation probability of each label in the neighbor node set. Specifically, $P(L_j) = \sum P_{ij}$, where the sum runs over the neighbor nodes whose label is $L_j$ and $P_{ij}$ is the label propagation probability of such a neighbor node for the target node i. If node i is a non-shared node, only the party where node i is located needs to calculate the propagation probability of its own neighbor node labels. If node i is a shared node, party A calculates the probability corresponding to all of its neighbor node labels, $P(L_{aj}) = \sum P_{iaj}$, and party B calculates the probability corresponding to all of its neighbor node labels, $P(L_{bj}) = \sum P_{ibj}$. Party A and party B exchange $P(L_{aj})$ and $P(L_{bj})$, and each party performs label propagation probability aggregation again on its own side, finally obtaining the label propagation aggregation probability $P(L_j) = P(L_{aj}) + P(L_{bj})$, which combines the associated network information of both parties.

Finally, select the label $L_j$ corresponding to the largest $P(L_j)$ as the label of node i in this round. Repeat the above steps until the labels of all nodes no longer change.
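The update rule above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names (`aggregate_by_label`, `merge_party_aggregates`, `pick_label`) and the sample neighbor triples are hypothetical, exact fractions are used for readability, and the tie-breaking behavior (first label encountered wins) is an assumption because the text does not specify one.

```python
from collections import defaultdict
from fractions import Fraction


def aggregate_by_label(neighbors):
    """Sum propagation probabilities per label.

    neighbors: iterable of (neighbor_id, label, probability) triples,
    mirroring the (J_j, L_j, P_ij) triples in the text.
    """
    totals = defaultdict(Fraction)  # Fraction() is 0
    for _neighbor_id, label, prob in neighbors:
        totals[label] += prob
    return dict(totals)


def merge_party_aggregates(agg_a, agg_b):
    """For a shared node: add the per-label sums exchanged by the two parties."""
    merged = dict(agg_a)
    for label, prob in agg_b.items():
        merged[label] = merged.get(label, Fraction(0)) + prob
    return merged


def pick_label(aggregate):
    """Return the label with the largest aggregate probability.

    Assumption: on a tie, the label inserted first is returned.
    """
    return max(aggregate, key=aggregate.get)


# Hypothetical neighbor triples for a non-shared node.
neighbors = [("n1", "0", Fraction(1, 4)),
             ("n2", "1", Fraction(1, 2)),
             ("n3", "1", Fraction(1, 4))]
local = aggregate_by_label(neighbors)
chosen = pick_label(local)
print(local, chosen)
```

For a non-shared node only `aggregate_by_label` and `pick_label` are needed; for a shared node each party would call `aggregate_by_label` on its own neighbors and then `merge_party_aggregates` on the exchanged results.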

In a specific example, a calculation example of label update is given with reference to FIG. 7 and FIG. 8.

Referring to FIG. 7, for the first round of propagation, the following calculation is performed for node Vab2:

(1) The label propagation aggregation probability of the neighbor nodes of party A is calculated as <“0”, 1/15>, where “0” represents the risk-free label and 1/15 represents the label propagation aggregation probability corresponding to label “0”. It can be understood that since party A’s Vab2 has only one neighbor node 1, and its initial label value is “0”, the label propagation probability of node 1 to node Vab2 has been calculated as 1/15 in the above text. Therefore, for party A’s Vab2 node, it has only one propagable label “0”, and the label propagation aggregation probability corresponding to the propagable label “0” is 1/15.

(2) Calculate the label propagation aggregation probability of the neighboring nodes of party B as <“0”, 6/15>, <“1”, 8/15>, where “0” represents a risk-free label, and 6/15 represents the label propagation aggregation probability corresponding to label “0”; “1” represents a risky label, and 8/15 represents the label propagation aggregation probability corresponding to label “1”. Party B’s Vab2 has three neighboring nodes (3, 4, 5); the initial label values of nodes 3 and 4 are “0”, and the initial label value of node 5 is “1”. It has been calculated above that the label propagation probability of node 3 to node Vab2 is 2/15, the label propagation probability of node 4 to node Vab2 is 4/15, and the label propagation probability of node 5 to node Vab2 is 8/15. Therefore, party B’s Vab2 node has two propagable labels, “0” and “1”; the label propagation aggregation probability corresponding to the propagable label “0” is 2/15 + 4/15 = 6/15, and the label propagation aggregation probability corresponding to the propagable label “1” is 8/15.

(3) The two parties exchange label propagation aggregation probabilities and add up the label propagation aggregation probabilities corresponding to the same label. They can calculate the label propagation aggregation probabilities of node Vab2 as <“0”, 7/15>, <“1”, 8/15>, that is, the label propagation aggregation probability corresponding to the propagable label “0” is 7/15, and the label propagation aggregation probability corresponding to the propagable label “1” is 8/15.

(4) Select the label "1" corresponding to the maximum label propagation aggregation probability <"1", 8/15> as the label of node Vab2 in this round. The above steps are similar for other nodes.
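The arithmetic of steps (1) to (4) for node Vab2 can be checked with a short Python sketch. The probabilities are the ones stated in the text; the variable names and the use of `fractions.Fraction` are illustrative assumptions.

```python
from fractions import Fraction

# Per-label sums computed locally by each party for shared node Vab2
# (values taken from the worked example in the text).
party_a = {"0": Fraction(1, 15)}                     # neighbor node 1
party_b = {"0": Fraction(2, 15) + Fraction(4, 15),   # neighbor nodes 3 and 4
           "1": Fraction(8, 15)}                     # neighbor node 5

# After the exchange, each party adds the sums that share a label.
combined = {}
for partial in (party_a, party_b):
    for label, prob in partial.items():
        combined[label] = combined.get(label, Fraction(0)) + prob

# The label with the largest aggregate probability becomes this round's label.
new_label = max(combined, key=combined.get)
print(combined["0"], combined["1"], new_label)  # 7/15 8/15 1
```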

After the first round of propagation, the updated node label distribution diagram of the federated association network is shown in Figure 8, where nodes Vab1 and Vab2 are both updated to label “1”. The next round of label propagation continues until the node label no longer changes or the number of propagation rounds exceeds a certain threshold.
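The round-by-round iteration with a stopping condition can be sketched as below. This is a simplified single-process sketch under stated assumptions: it ignores the cross-party exchange, it updates all nodes synchronously from the previous round's labels (the text does not state whether updates are synchronous), and the function name `propagate`, the parameter `max_rounds`, and the three-node toy graph are hypothetical.

```python
def propagate(neighbors, labels, fixed, max_rounds=10):
    """Iterate label propagation until labels stop changing or max_rounds is hit.

    neighbors: {node: [(neighbor, prob), ...]}, where prob is the label
               propagation probability of that neighbor for the node.
    labels:    {node: initial label}.
    fixed:     set of labeled nodes whose label never changes.
    Returns (final_labels, rounds_executed).
    """
    labels = dict(labels)
    for round_no in range(1, max_rounds + 1):
        updated = dict(labels)
        for node, nbrs in neighbors.items():
            if node in fixed or not nbrs:
                continue
            totals = {}
            for nbr, prob in nbrs:
                totals[labels[nbr]] = totals.get(labels[nbr], 0) + prob
            updated[node] = max(totals, key=totals.get)
        if updated == labels:          # no label changed in this round
            return labels, round_no
        labels = updated
    return labels, max_rounds          # update round threshold reached


# Hypothetical toy graph: "x" is a labeled risk node, "y" and "z" are unknown.
toy_neighbors = {"x": [], "y": [("x", 0.6), ("z", 0.4)], "z": [("y", 1.0)]}
toy_labels = {"x": "1", "y": "0", "z": "0"}
final_labels, rounds = propagate(toy_neighbors, toy_labels, fixed={"x"})
print(final_labels)
```

In the toy graph the risk label reaches "y" in the first round and "z" in the second; the third round changes nothing, so the loop stops before the threshold.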

In one embodiment, the graph weights of the first association network and the second association network are determined according to the closeness of the node relationships between the first association network and the second association network; and the graph weights are introduced during the interaction between the first association network and the second association network.

For example, the first association network and the second association network can be determined to be strongly or weakly associated according to the business scenario, and then the graph weights of the first association network and the second association network can be introduced when calculating the label propagation probability of the edge. Of course, the graph weights of the first association network and the second association network can also be introduced when calculating the label propagation aggregation probability of each label, and this application does not impose specific restrictions on this.

In one embodiment, graph weights are introduced into the interaction process between the first association network and the second association network, including at least the following two introduction methods:

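(1) In the above step 231, if node i is a common node, the edge weight sum $\sum_j w_{ij}$ is determined using the following formula:

$\sum_j w_{ij} = \theta_a \sum_a w_{ia} + \theta_b \sum_b w_{ib}$,

where $\sum_a w_{ia}$ is the sum of the edge weights between node i and all of its neighboring nodes a in the first association network, $\sum_b w_{ib}$ is the sum of the edge weights between node i and all of its neighboring nodes b in the second association network, $\theta_a$ is the graph weight of the first association network, and $\theta_b$ is the graph weight of the second association network.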

(2) In the above step 232, after the first party and the second party exchange the first-party label propagation aggregation probability and the second-party label propagation aggregation probability, the label propagation probability is aggregated again based on the graph weights of the first association network and the second association network to obtain the label propagation aggregation probability corresponding to each label. For example, if node i is a common node, party A calculates the probability $P(L_{aj}) = \sum P_{iaj}$ corresponding to the labels of all of its neighboring nodes, and party B calculates the probability $P(L_{bj}) = \sum P_{ibj}$ corresponding to the labels of all of its neighboring nodes. Party A and party B exchange $P(L_{aj})$ and $P(L_{bj})$, and each party performs label propagation probability aggregation again on its own side to obtain the final label propagation aggregation probability $P(L_j) = \theta_a P(L_{aj}) + \theta_b P(L_{bj})$, which combines the association network information of both parties; here $\theta_a$ and $\theta_b$ are the graph weights of the first association network and the second association network, respectively.
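The two ways of introducing graph weights can be sketched in Python as follows. The function names and the graph weight values (`theta_a = 3`, `theta_b = 1`) are hypothetical and chosen only to show that graph weights can change the selected label; the per-label sums reuse the Vab2 example, and the edge weight sums passed to `weighted_edge_weight_sum` are made-up numbers.

```python
from fractions import Fraction


def weighted_edge_weight_sum(sum_a, sum_b, theta_a, theta_b):
    """Method (1): graph-weighted edge weight sum for a common node,
    theta_a * sum_a(w_ia) + theta_b * sum_b(w_ib)."""
    return theta_a * sum_a + theta_b * sum_b


def weighted_label_aggregate(agg_a, agg_b, theta_a, theta_b):
    """Method (2): P(L_j) = theta_a * P(L_aj) + theta_b * P(L_bj) per label."""
    merged = {}
    for theta, partial in ((theta_a, agg_a), (theta_b, agg_b)):
        for label, prob in partial.items():
            merged[label] = merged.get(label, 0) + theta * prob
    return merged


# Per-label sums from the Vab2 example; the graph weights are assumptions.
agg_a = {"0": Fraction(1, 15)}
agg_b = {"0": Fraction(6, 15), "1": Fraction(8, 15)}
weighted = weighted_label_aggregate(agg_a, agg_b, theta_a=3, theta_b=1)
label = max(weighted, key=weighted.get)
print(weighted["0"], weighted["1"], label)  # 3/5 8/15 0
```

With these assumed weights, party A's network counts three times as much as party B's, so label "0" (9/15) overtakes label "1" (8/15), whereas the unweighted aggregation selects "1".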

In one embodiment, the method further includes: if the first associated network and the second associated network are directed graph networks, only the incoming neighbor nodes of each node are used as neighbor nodes. For example, for a directed graph, when calculating the propagation probability of a node, only the incoming neighbor nodes of the target node may be considered. The specific judgment may be made in combination with the business scenario.
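For the directed case, a minimal Python sketch of restricting the neighbor set to incoming neighbors is given below. It assumes edges are stored as hypothetical `(source, target, weight)` triples with integer weights and at most one edge per ordered node pair, and it uses the ratio of an edge weight to the edge weight sum as the label propagation probability.

```python
from fractions import Fraction


def incoming_neighbors(edges, node):
    """Return [(source, weight)] for directed edges that point at `node`."""
    return [(src, w) for src, dst, w in edges if dst == node]


def propagation_probabilities(edges, node):
    """P_ij = w_ij / sum_j w_ij, computed over incoming neighbors only."""
    incoming = incoming_neighbors(edges, node)
    total = sum(w for _src, w in incoming)
    if total == 0:
        return {}  # no incoming neighbors: nothing propagates to this node
    return {src: Fraction(w, total) for src, w in incoming}


# Hypothetical directed edges: u -> t, v -> t, t -> w.
edges = [("u", "t", 1), ("v", "t", 3), ("t", "w", 5)]
probs_t = propagation_probabilities(edges, "t")
print(probs_t["u"], probs_t["v"])  # 1/4 3/4
```

The outgoing edge from "t" to "w" does not affect the probabilities for "t"; it only matters when "w" is the target node.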

In the description of this specification, the description with reference to the terms "some possible embodiments", "some embodiments", "examples", "specific examples", or "some examples" means that the specific features, structures, materials or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, the schematic representation of the above terms does not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any one or more embodiments or examples in a suitable manner. In addition, those skilled in the art may combine different embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict each other.

Regarding the method flowcharts of the embodiments of the present application, some operations are described as distinct steps performed in a certain order. Such flowcharts are illustrative and non-restrictive. Some steps described herein can be grouped together and performed in a single operation, some steps can be divided into multiple sub-steps, and some steps can be performed in an order different from that shown herein. Each step shown in the flowcharts can be implemented in any manner by any circuit structure and/or tangible mechanism (for example, by software running on computer equipment, by hardware such as logic functions implemented by a processor or chip, and/or by any combination thereof).

Based on the same technical concept, the embodiment of the present invention further provides a label propagation device for an associated network, which is used to execute the label propagation method for an associated network provided by any of the above embodiments. FIG. 9 is a schematic diagram of the structure of a label propagation device for an associated network provided by an embodiment of the present invention.

As shown in FIG. 9, the apparatus 900 includes:

A graph construction module 910, configured to construct a first association network based on first-party data and a second association network based on second-party data;

A federated network module 920, configured to associate the first associated network with the second associated network based on a secure intersection protocol to obtain a federated associated network;

A label propagation module 930, configured to iteratively perform multiple rounds of label propagation on the nodes of the federated association network, wherein each round of label propagation includes: determining the label propagation probability between adjacent nodes in the federated association graph; and, for each node, determining the label of the node in this round according to the label of each neighboring node in this round and the label propagation probability of the neighboring node for the node.

It should be noted that the device in the implementation mode of the present application can implement each process of the implementation mode of the aforementioned method and achieve the same effects and functions, which will not be repeated here.

According to some embodiments of the present application, a non-volatile computer storage medium of a label propagation method for an association network is provided, on which computer executable instructions are stored, and the computer executable instructions are configured to execute the method described in the above embodiments when executed by a processor.

Each embodiment in this application is described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the device, equipment, and computer-readable storage medium embodiments, since they are basically similar to the method embodiments, their descriptions are simplified, and the relevant parts can be referred to the partial description of the method embodiments.

The apparatus, equipment and computer-readable storage medium provided in the embodiments of the present application correspond one-to-one to the method. Therefore, the apparatus, equipment and computer-readable storage medium also have similar beneficial technical effects as the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the apparatus, equipment and computer-readable storage medium will not be repeated here.

It will be appreciated by those skilled in the art that the embodiments of the present invention may be provided as methods, devices (equipment or system), or computer-readable storage media. Therefore, the present invention may be implemented in the form of a complete hardware implementation, a complete software implementation, or an implementation combining software and hardware. Moreover, the present invention may be implemented in the form of a computer-readable storage medium implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

The present invention is described with reference to the flowchart and/or block diagram of the method, device (equipment or system) and computer-readable storage medium according to the embodiment of the present invention. It should be understood that each process and/or box in the flowchart and/or block diagram, and the combination of the process and/or box in the flowchart and/or block diagram can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the function specified in one process or multiple processes in the flowchart and/or one box or multiple boxes in the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.

In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.

Memory may include non-permanent memory among computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media that can be used to store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information that can be accessed by a computing device.

In addition, although the operations of the method of the present invention are described in a particular order in the accompanying drawings, this does not require or imply that these operations must be performed in this particular order, or that all the operations shown must be performed to achieve the desired results. Additionally or alternatively, some steps can be omitted, multiple steps can be combined into one step, and/or one step can be decomposed into multiple steps.

Although the spirit and principle of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the disclosed specific embodiments, and the division of various aspects does not mean that the features in these aspects cannot be combined to benefit, and such division is only for the convenience of expression. The present invention is intended to cover various modifications and equivalent arrangements included in the spirit and scope of the attached claims.

Claims

A label propagation method for an association network, characterized by comprising:

Building a first association network based on the first-party data, and building a second association network based on the second-party data;

Associating the first association network and the second association network based on a secure intersection protocol to obtain a federated association network;

Iteratively performing multiple rounds of label propagation on nodes of the federated association network;

wherein each round of label propagation includes: determining the label propagation probability between adjacent nodes in the federated association graph; and, for each node, determining the current round label of the node according to the current round label of each neighboring node and the label propagation probability of the neighboring node for the node.
The method according to claim 1, characterized in that the step of associating the first association network and the second association network based on a secure intersection protocol to obtain a federated association network further comprises:

The first party data and the second party data are encrypted and intersected to determine common nodes in the first association network and the second association network, and the first association network and the second association network are associated according to the common nodes to obtain a federated association network.
The method according to claim 1, characterized in that determining the label propagation probability between adjacent nodes in the federated association graph further comprises:

Determine the edge weight $w_{ij}$ of the edge ij between the node i and its neighbor node j in the federated association network;

Determine the sum of edge weights $\sum_j w_{ij}$ between the node i and all its neighbor nodes J;

According to the ratio of the edge weight $w_{ij}$ to the edge weight sum $\sum_j w_{ij}$, the label propagation probability $P_{ij}$ of the neighbor node j for the node i is determined.
The method according to claim 3, characterized in that

If the node i is a non-shared node, the all neighbor nodes J are all neighbor nodes of the node i in the graph where the node i is located.
The method according to claim 3, characterized in that

If the node i is a common node, the all neighbor nodes J are the set of all neighbor nodes a of the node i in the first associated network and all neighbor nodes b of the node i in the second associated network.
The method according to claim 3, characterized in that

If the node i is a common node, the first party and the second party exchange the sum of edge weights between the node i and all neighboring nodes a of the first associated network and the sum of edge weights between the node i and all neighboring nodes b of the second associated network.
The method according to claim 3, further comprising:

If the node i is a common node, the edge weight sum $\sum_j w_{ij}$ is determined using the following formula:
$\sum_j w_{ij} = \sum_a w_{ia} + \sum_b w_{ib}$;

where $\sum_a w_{ia}$ is the sum of the edge weights between the node i and all neighboring nodes a of the first associated network, and $\sum_b w_{ib}$ is the sum of the edge weights between the node i and all neighboring nodes b of the second associated network.
The method according to claim 1, characterized in that the multiple rounds of label propagation are iteratively performed on the nodes of the federated association network, and further comprising:

Determining labeled nodes and unlabeled nodes of the federated association network;

Updating the labels of the unlabeled nodes in rounds until the labels of the unlabeled nodes no longer change and/or exceed an update round threshold; and,

The label of the annotated node remains unchanged.
The method according to claim 1 is characterized in that determining the current round label of each node according to the current round label of the neighbor node and the label propagation probability of the neighbor node to the node comprises:

For each node, determine the current round label of each neighbor node of the node, and the label propagation probability of each neighbor node to the node;

Among all neighboring nodes of the node, the sum of the label propagation probabilities corresponding to each label is calculated to obtain the label propagation aggregation probability corresponding to each label;

The current round label of the node is updated according to the label with the maximum label propagation aggregation probability.
The method according to claim 9, characterized in that if the node is a non-shared node, the method further comprises:

The party where the node is located calculates the label propagation aggregation probability corresponding to each neighbor node label of the node.
The method according to claim 9, characterized in that if the node is a shared node, the method further comprises:

The first party calculates a first-party label propagation aggregation probability corresponding to all neighbor node labels of the node in the first associated network;

The second party calculates a second-party label propagation aggregation probability corresponding to all neighbor node labels of the node in the second associated network;

The first party and the second party exchange the first-party label propagation aggregation probability and the second-party label propagation aggregation probability;

The first party and the second party each perform label propagation probability aggregation again based on the exchanged information to obtain a label propagation aggregation probability corresponding to each label.
The method according to claim 1, further comprising:

Determining graph weights of the first association network and the second association network according to the closeness of the node relationships between the first association network and the second association network; and,

The graph weight is introduced in the interaction process between the first association network and the second association network.
The method according to claim 12, characterized in that introducing the graph weight in the interaction process between the first association network and the second association network comprises:

If the node i is a common node, the edge weight sum $\sum_j w_{ij}$ is determined using the following formula:
$\sum_j w_{ij} = \theta_a \sum_a w_{ia} + \theta_b \sum_b w_{ib}$;

where $\sum_a w_{ia}$ is the sum of the edge weights between the node i and all neighboring nodes a of the first associated network, $\sum_b w_{ib}$ is the sum of the edge weights between the node i and all neighboring nodes b of the second associated network, $\theta_a$ is the graph weight of the first associated network, and $\theta_b$ is the graph weight of the second associated network.
The method according to claim 12, characterized in that introducing the graph weight in the interaction process between the first association network and the second association network comprises:

After the first party and the second party exchange the first label propagation aggregation probability and the second label propagation aggregation probability, the label propagation probability is aggregated again based on the graph weights of the first association network and the second association network to obtain the label propagation aggregation probability corresponding to each label.
The method according to claim 1, further comprising:

If the first associated network and the second associated network are directed graph networks, only the incoming neighbor nodes of each node are used as the neighbor nodes.
A label propagation device for an associated network, characterized by comprising:

A graph construction module, configured to construct a first association network based on the first-party data and a second association network based on the second-party data;

A federated network module, configured to associate the first associated network with the second associated network based on a secure intersection protocol to obtain a federated associated network;

A label propagation module is used to iteratively perform multiple rounds of label propagation on the nodes of the federated association network; wherein each round of label propagation includes: determining the label propagation probability between adjacent nodes in the federated association graph; for each node, determining the current round label of each node based on the current round label of the neighboring node and the label propagation probability of the neighboring node for the node.
A label propagation device for an associated network, characterized by comprising:

At least one processor; and, a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so as to enable the at least one processor to execute: a method as described in any one of claims 1-15.
A computer-readable storage medium stores a program, and when the program is executed by a multi-core processor, the multi-core processor executes the method according to any one of claims 1 to 15.