WO2023142490A1 - Federated graph clustering method, apparatus, and readable storage medium based on distributed graph embedding - Google Patents


Info

Publication number
WO2023142490A1
WO2023142490A1 · PCT/CN2022/117418 · CN2022117418W
Authority
WO
WIPO (PCT)
Prior art keywords
graph
node
pia
pib
federated
Prior art date
Application number
PCT/CN2022/117418
Other languages
English (en)
French (fr)
Inventor
汤韬
陈滢
高鹏飞
庞悦
郑建宾
刘红宝
潘婧
周雍恺
Original Assignee
中国银联股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2023142490A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Definitions

  • the invention belongs to the field of clustering, and in particular relates to a federated graph clustering method, device and readable storage medium based on distributed graph embedding.
  • current federated learning technology has high application potential for jointly using data without it leaving its owner's database, and for mining the value of multi-party data.
  • however, the algorithms it mainly supports are traditional machine-learning classification models, regression models, and the like, which focus on evaluating individual value portraits.
  • mining of potential gang (collusive) behavior is relatively lacking.
  • because graph computing involves multiple rounds of topological, interactive computation over multi-party data, research on graph-mining algorithms based on privacy computing is relatively weak, and the industry has produced few results.
  • the present invention provides the following solutions.
  • a federated graph clustering method based on distributed graph embedding, including: constructing a first graph based on first-party data and a second graph based on second-party data; encrypting and intersecting the first-party data and the second-party data, determining the common nodes in the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph; learning the federated graph using a distributed graph embedding algorithm based on random walks, to determine the first graph embedding vector [PiA, PiB] starting from the first graph and the second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes and PiB and PiB' are the embedding vectors of the second-graph nodes; and, based on a federated clustering method, performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph to obtain a clustering result.
  • determining the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] includes: performing multiple random walks on the federated graph with a node in the first graph as the starting node, where the first party determines PiA according to the walk paths on the first graph and the second party determines PiB according to the matching walk paths on the second graph; and performing multiple random walks on the federated graph with a node in the second graph as the starting node, where the second party determines PiB' according to the walk paths on the second graph and the first party determines PiA' according to the matching walk paths on the first graph.
  • associating the first graph and the second graph according to the common nodes to obtain the federated graph further includes: removing island nodes that have no direct or indirect relationship with the common nodes in the first graph and the second graph to obtain the federated graph.
  • the first-party data and the second-party data are kept isolated from each other.
  • the nodes in the first graph are first-party users and/or first-party merchants, and the edges of the first graph are determined according to the association relationships between the nodes in the first graph; the nodes in the second graph are second-party users and/or second-party merchants, and the edges of the second graph are determined according to the association relationships between the nodes of the second graph.
  • encrypting and intersecting the first-party data and the second-party data and determining the common nodes in the first graph and the second graph includes: aligning the common nodes in the first graph and the second graph according to the attribute information of merchants and/or users.
  • performing multiple random walks on the federated graph with a node in the first graph as the starting node, where the first party determines PiA according to the walk paths on the first graph, includes: defining the number of random walk steps M; the first party takes any node in the first graph as the starting node and performs a random walk on the first graph, stopping when any common node is reached, and records the number of walking steps Mia, the identifier Vab_i of the common node reached, and the first-graph nodes this walk has passed through; after X random walks, counting the step count Mia of each walk and the frequency with which each first-graph node was visited, to obtain the first-graph node frequency matrix corresponding to each step count Mia; and performing matrix accumulation over the first-graph node frequency matrices corresponding to each step count Mia and dividing by the number of random walks X to obtain the first-graph part PiA of the first graph embedding vector.
  • the second party determining PiB according to the matching walk paths on the second graph includes: during or after the X random walks, the first party sends the identifier Vab_i of each common node reached and the corresponding first-graph step count Mia to the second party; the second party determines the graph embedding vector PiB_Vab_i corresponding to walking the remaining (M - Mia) steps in the second graph starting from each common node Vab_i; and the graph embedding vectors PiB_Vab_i corresponding to all common nodes Vab_i are accumulated and divided by the number of sub-walks X_1 to obtain the second-graph part PiB of the first graph embedding vector, where the number of sub-walks X_1 refers to the number of walks that crossed into the second graph during the X random walks.
  • PiA is calculated using the following formula: PiA = (1/X) · Σ_{Mia=m}^{M} PA_Mia, where PA_Mia is the first-graph node frequency matrix corresponding to step count Mia.
  • PiB is calculated using the following formula: PiB = (1/X_1) · Σ_i PiB_Vab_i.
  • performing multiple random walks on the federated graph with a node in the second graph as the starting node, where the second party determines PiB' according to the walk paths on the second graph, includes: defining the number of random walk steps M'; the second party takes any node in the second graph as the starting node and performs a random walk on the second graph, stopping when any common node is reached, and records the number of walking steps Mia' in the second graph, the identifier Vab_i of the common node reached, and the second-graph nodes this walk has passed through; after X' random walks, counting the step count Mia' of each walk and the frequency with which each second-graph node was visited, to obtain the second-graph node frequency matrix corresponding to each step count Mia'; and performing matrix accumulation over the second-graph node frequency matrices corresponding to each step count Mia' and dividing by the number of random walks X' to obtain the second-graph part PiB' of the second graph embedding vector.
  • the first party determining PiA' according to the matching walk paths on the first graph includes: during or after the X' random walks, the second party sends the identifier Vab_i of each common node reached and the corresponding second-graph step count Mia' to the first party; the first party determines the graph embedding vector PiA'_Vab_i corresponding to walking the remaining (M' - Mia') steps in the first graph starting from each common node Vab_i; and the graph embedding vectors PiA'_Vab_i corresponding to all common nodes Vab_i are accumulated and divided by the number of sub-walks X'_1 to obtain the first-graph part PiA' of the second graph embedding vector, where the number of sub-walks X'_1 refers to the number of walks that crossed into the first graph during the X' random walks.
  • PiB' is calculated using the following formula: PiB' = (1/X') · Σ_{Mia'=m'}^{M'} PB_Mia', where PB_Mia' is the second-graph node frequency matrix corresponding to step count Mia'.
  • PiA' is calculated using the following formula: PiA' = (1/X'_1) · Σ_i PiA'_Vab_i.
  • performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method includes: performing cluster analysis on the first-graph part PiA of the first graph embedding vector and the first-graph part PiA' of the second graph embedding vector based on the federated clustering method to obtain a first cluster over the first-graph part of the federated graph; performing cluster analysis on the second-graph part PiB of the first graph embedding vector and the second-graph part PiB' of the second graph embedding vector based on the federated clustering method to obtain a second cluster over the second-graph part of the federated graph; and screening cross-graph clusters based on the first cluster and the second cluster to obtain the target clusters with a higher degree of clustering.
  • a federated graph clustering device based on distributed graph embedding, including: a construction module for constructing a first graph based on first-party data and a second graph based on second-party data; an association module for encrypting and intersecting the first-party data and the second-party data, determining the common nodes in the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph; a learning module for learning the federated graph using the distributed graph embedding algorithm based on random walks and determining the first graph embedding vector [PiA, PiB] starting from the first graph and the second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes and PiB and PiB' are the embedding vectors of the second-graph nodes; and a clustering module for performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on the federated clustering method to obtain a clustering result.
  • a federated graph clustering device based on distributed graph embedding, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of the first aspect.
  • a computer-readable storage medium stores a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method of the first aspect.
  • one of the advantages of the above embodiment is that, by using the distributed graph embedding algorithm, the graph embedding vectors of the federated graph can be learned even when the first-party data and the second-party data are private, and the topological features of the federated graph's structure can be reduced to matrices; matrix analysis then reduces the computational complexity and improves the effect and efficiency of federated graph computation.
  • Fig. 1 is a schematic structural diagram of a federated graph clustering device based on distributed graph embedding according to an embodiment of the present invention
  • FIG. 2 is a schematic flow diagram of a federated graph clustering method based on distributed graph embedding according to an embodiment of the present invention
  • Fig. 3 is a schematic diagram of the first graph and the second graph according to an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of a federated graph according to an embodiment of the present invention.
  • Fig. 5 is a schematic diagram of another federated graph according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a federated graph clustering device based on distributed graph embedding according to an embodiment of the present invention
  • Fig. 7 is a schematic structural diagram of a federated graph clustering device based on distributed graph embedding according to another embodiment of the present invention.
  • "A/B" can mean A or B; "and/or" in this document merely describes an association relationship between associated objects, indicating that three relationships are possible: for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone.
  • "first", "second", etc. are used for descriptive purposes only, and should not be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first", "second", etc. may expressly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in the solution of the embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment of a federated graph clustering device based on distributed graph embedding.
  • the federated graph clustering device based on distributed graph embedding in the embodiment of the present invention may be a terminal device such as a PC or a portable computer.
  • the federated graph clustering device based on distributed graph embedding may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 can be a high-speed RAM, or a stable non-volatile memory such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the structure of the federated graph clustering device based on distributed graph embedding shown in Figure 1 does not constitute a limitation on the federated graph clustering device based on distributed graph embedding, and may include more or fewer components, or combining certain components, or a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a federated graph clustering program embedded in a distributed graph.
  • the operating system is a program that manages and controls the hardware and software resources of the federated graph clustering device based on distributed graph embedding, and supports the operation of the federated graph clustering program based on distributed graph embedding and other software or programs.
  • the user interface 1003 is mainly used to receive requests, data, etc. sent by the first terminal, the second terminal, and the supervision terminal;
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with it;
  • the processor 1001 can be used to call the federated graph clustering program based on distributed graph embedding stored in the memory 1005, and perform the following operations:
  • by using the distributed graph embedding algorithm, the graph embedding vectors of the federated graph can be learned even when the first-party data and the second-party data are private, and the graph-structure topology of the federated graph can be reduced to matrices, which reduces the computational complexity through matrix analysis and improves the effect and efficiency of federated graph calculations.
  • Fig. 2 is a schematic flowchart of a distributed graph embedding-based federated graph clustering method according to an embodiment of the present application.
  • from a hardware perspective, the execution subject may be one or more electronic devices, more specifically a processing module; from a program perspective, the execution subject may be a program carried on these electronic devices.
  • the execution subject of the method may be the processor in the embodiment shown in FIG. 1 .
  • the method provided in this embodiment may include the following steps:
  • the first-party data and the second-party data are the private data of the first party and the second party respectively; the first party then constructs the first graph based on the first-party data, and the second party constructs the second graph based on the second-party data.
  • the first graph A and the second graph B can be constructed respectively over the data owned by the two parties, according to the entity nodes and edge relationships defined by the actual task.
  • the nodes in the first graph are first-party users and/or first-party merchants, and the edges of the first graph are determined according to the association relationships between the nodes in the first graph; the nodes in the second graph are second-party users and/or second-party merchants, and the edges of the second graph are determined according to the association relationships between the nodes of the second graph.
  • the above association relationship may be a transaction relationship between a user and a merchant, a transfer relationship between two users, a transfer relationship between two merchants, or any other association relationship between nodes.
  • the association between the first graph and the second graph can be established by finding the common nodes in the first graph and the second graph, where the common nodes refer to the same entity nodes that exist in both the first graph and the second graph, such as the same user, the same merchant, and so on.
  • the first graph A is associated with the second graph B through the common nodes to form a federated graph.
  • the common nodes in the first graph and the second graph may be aligned according to the attribute information of merchants and/or users.
  • the common node corresponding to the same user can be determined through user attribute information such as a mobile phone number or an email address.
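The alignment step above can be sketched in code. The snippet below is an illustrative, simplified stand-in for the encrypted intersection: both parties hash each node's identifying attributes (e.g. mobile phone number, email address) with a shared salt and compare only the digests, so raw attributes never cross the party boundary. Function names, the salt, and the data layout are hypothetical, not from the patent; a production deployment would use a proper private set intersection (PSI) protocol rather than plain salted hashing.

```python
import hashlib

def blind(attrs, salt):
    """Digest a node's identifying attributes (phone, email, ...) with a shared salt."""
    digest = hashlib.sha256()
    digest.update(salt)
    digest.update("|".join(sorted(attrs)).encode("utf-8"))
    return digest.hexdigest()

def align_common_nodes(party_a_nodes, party_b_nodes, salt=b"shared-secret"):
    """Return (node_id_a, node_id_b) pairs whose blinded attributes match.

    party_*_nodes: dict mapping a local node id to a list of attribute strings.
    Only digests are compared, never the raw attributes.
    """
    blinded_b = {blind(attrs, salt): node for node, attrs in party_b_nodes.items()}
    matches = []
    for node_a, attrs in party_a_nodes.items():
        key = blind(attrs, salt)
        if key in blinded_b:
            matches.append((node_a, blinded_b[key]))
    return matches
```

For example, a user present on both platforms under different local ids but with the same phone number and email yields one matched pair, which becomes a common node of the federated graph.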
  • island nodes that have no direct or indirect relationship with common nodes in the first graph and the second graph may be eliminated to obtain a federated graph.
  • the island nodes in the first graph A and the second graph B can be eliminated and filtered out to form a federated graph for two-party graph computation.
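The merge-and-filter step above can be sketched as follows, assuming plain adjacency maps; all names here are illustrative, not from the patent. The two edge lists are merged on the shared node identifiers, then a breadth-first search from the common nodes drops island nodes that have no direct or indirect path to any common node.

```python
from collections import deque

def build_federated_graph(edges_a, edges_b, common_nodes):
    """Merge two parties' edge lists into one undirected adjacency map, then
    drop island nodes that cannot reach any common node."""
    adj = {}
    for u, v in list(edges_a) + list(edges_b):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Breadth-first search from all common nodes at once; anything left
    # unreached has no direct or indirect relationship with them.
    reachable = set(common_nodes)
    queue = deque(common_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    return {n: {m for m in nbrs if m in reachable}
            for n, nbrs in adj.items() if n in reachable}
```

In practice each party would hold only its own half of the adjacency map; a single merged map is used here purely to illustrate the filtering logic.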
  • PiA and PiA' are the embedding vectors of each first graph node of the first graph
  • PiB and PiB' are the embedding vectors of each second graph node of the second graph.
  • graph embedding is a method of mapping graph nodes (high-dimensional vectors) into low-dimensional vectors so as to obtain a unique representation of each node; these vectors are then used for tasks such as recommendation, classification, or prediction.
  • the graph embedding algorithm based on random walks first uses the random walk algorithm to sample the nodes on the graph multiple times to obtain node sequences, and then generates the vector representation of each node of the graph according to these node sequences to obtain the graph embedding vectors.
  • the first graph embedding vector [PiA, PiB] includes the first-graph part PiA and the second-graph part PiB: the first-graph part PiA must be obtained by the first party sampling on the first graph, and the second-graph part PiB must be obtained by the second party sampling on the second graph. It is therefore necessary to use a distributed graph embedding algorithm that distributes the above graph embedding procedure across the first graph and the second graph to obtain the first-graph part and the second-graph part respectively.
  • the above step 204 further includes: performing multiple random walks on the federated graph with a first-graph node as the starting node, where the first party determines PiA according to the walk paths on the first graph and the second party determines PiB according to the matching walk paths on the second graph; and performing multiple random walks on the federated graph with a second-graph node as the starting node, where the second party determines PiB' according to the walk paths on the second graph and the first party determines PiA' according to the matching walk paths on the first graph.
  • the above-mentioned matching walk path on the second graph may be obtained after multiple random walks in the second graph starting from common nodes between the first graph and the second graph.
  • the above-mentioned matching walk path on the first graph may be obtained after performing multiple random walks on the first graph starting from common nodes between the first graph and the second graph.
  • since the first party does not know the random walk paths on the second graph and, similarly, the second party does not know the random walk paths on the first graph, for random walks starting from nodes in the first graph the first party can obtain the PiA part of the first graph embedding vector while the second party completes the remaining walk tasks handed over by the first party to produce the PiB part. Similarly, for random walks starting from nodes in the second graph, the second party can obtain the PiB' part of the second graph embedding vector while the first party completes the remaining walk tasks handed over by the second party to produce the PiA' part.
  • determining the first graph portion PiA of the first graph embedding vector comprises:
  • the first party starts from any node in the first graph and performs a random walk on the first graph, stops walking when it reaches any common node, and records the number of walking steps Mia, the identifier Vab_i of the common node reached, and the first-graph nodes this walk has passed through.
  • obtaining the node frequency matrix of the first graph corresponding to the walking steps Mia of each first graph includes:
  • the frequency matrix PA_Mia of walking Mia steps from the first graph to the common node Vab_i can be obtained, where the value of Mia is an integer between the minimum number of steps m from the starting node to a common node and the total number of steps M;
  • the first graph includes Na nodes Pa_n; the value of Mia is an integer between the minimum number of steps m from the starting node to a common node and the total number of steps M; Pan_Mia is the number of times the first-graph node Pa_n is passed through at step Mia over the X random walks from the starting node.
  • the first-graph part PiA of the first graph embedding vector can be calculated using the following formula: PiA = (1/X) · Σ_{Mia=m}^{M} PA_Mia.
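A minimal sketch of the first party's side of this procedure, with hypothetical names and a toy adjacency-map graph: each of the X walks is truncated either at M steps or at the first common node reached, per-node visit counts are accumulated (a flattened stand-in for the per-Mia frequency matrices), divided by X to give PiA, and the (Vab_i, Mia) crossing records are collected for the second party.

```python
import random
from collections import defaultdict

def first_party_walks(adj, start, common, M, X, seed=0):
    """Run X random walks of at most M steps from `start` on the first graph.

    A walk stops early the moment it steps onto a common node; that event is
    recorded as a (Vab_i, Mia) crossing to be sent to the second party.
    Visit counts of ordinary first-graph nodes are accumulated over all walks
    and divided by X, a flattened stand-in for the per-Mia frequency matrices.
    """
    rng = random.Random(seed)
    visits = defaultdict(float)
    crossings = []
    for _ in range(X):
        node, steps = start, 0
        while steps < M:
            neighbors = adj.get(node)
            if not neighbors:
                break  # dead end: this walk simply stops
            node = rng.choice(sorted(neighbors))
            steps += 1
            if node in common:
                crossings.append((node, steps))  # (Vab_i, Mia)
                break
            visits[node] += 1
    pia = {n: c / X for n, c in visits.items()}
    return pia, crossings
```

A full implementation would keep one frequency vector per step count Mia rather than a single accumulated vector, matching the PA_Mia matrices described above.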
  • determining the second graph portion PiB of the first graph embedding vector comprises the steps of:
  • the first party sends to the second party the identifier Vab_i of each common node reached and the corresponding first-graph step count Mia of every walk.
  • PB_Mib_Vab_i = [Pb_1_Mib, Pb_2_Mib, ..., Pb_Nb_Mib]; further, matrix accumulation is performed on the frequency matrices of each common node Vab_i over the corresponding second-graph walk step counts Mib to obtain PiB_Vab_i.
  • the graph embedding vector PiB_Vab_i corresponding to the above common node Vab_i may also be pre-calculated from the second graph.
  • the number of sub-walks X_1 refers to the number of walks that crossed into the second graph during the X random walks.
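The second party's matching step can be sketched as below (illustrative names, not the patent's implementation): for each received (Vab_i, Mia) pair it walks the remaining (M - Mia) steps on the second graph starting from Vab_i, accumulates visit frequencies, and divides by the number of sub-walks X_1 that actually crossed into the second graph.

```python
import random
from collections import defaultdict

def second_party_pib(adj_b, crossings, M, seed=1):
    """Complete the first party's truncated walks on the second graph.

    For each received (Vab_i, Mia) pair, walk the remaining (M - Mia) steps
    from the common node Vab_i, accumulate visit counts, and divide by the
    number of sub-walks X_1 that actually continued into the second graph.
    """
    rng = random.Random(seed)
    visits = defaultdict(float)
    x1 = 0  # number of sub-walks, X_1
    for vab, mia in crossings:
        remaining = M - mia
        if remaining <= 0:
            continue
        x1 += 1
        node = vab
        for _ in range(remaining):
            neighbors = adj_b.get(node)
            if not neighbors:
                break
            node = rng.choice(sorted(neighbors))
            visits[node] += 1
    return {n: c / x1 for n, c in visits.items()} if x1 else {}
```

Note that the second party never learns the first-graph walk paths, only the crossing point and the step budget, mirroring the privacy split described above.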
  • multiple random walks can be performed on the federated graph with a node in the second graph as the starting node, so that the second party determines PiB' according to the walk paths on the second graph and the first party determines PiA' according to the matching walk paths on the first graph, yielding the above-mentioned second graph embedding vector [PiA', PiB'].
  • the number of random walk steps M' can be defined.
  • the second party performs a random walk on the second graph starting from any node in the second graph, stops walking when it reaches any common node, and records the number of walking steps Mia' in the second graph, the identifier Vab_i of the common node reached, and the second-graph nodes this walk has passed through; after X' random walks, the step count Mia' of each walk and the frequency with which each second-graph node was visited are counted to obtain the second-graph node frequency matrix corresponding to each step count Mia'; the second-graph node frequency matrices corresponding to each step count Mia' are accumulated and divided by the number of random walks X' to obtain the second-graph part PiB' of the second graph embedding vector.
  • the first party determines PiA' based on a matching walk path on the first graph, comprising:
  • the second party sends the identifier Vab_i of each common node reached and the corresponding second-graph step count Mia' of every walk to the first party;
  • the first party determines the graph embedding vector PiA'_Vab_i corresponding to walking the remaining (M' - Mia') steps in the first graph starting from each common node Vab_i;
  • the number of sub-walks X'_1 refers to the number of walks that crossed into the first graph during the X' random walks.
  • the second graph node frequency matrix corresponding to each walk step Mia' includes:
  • the second graph includes Nb nodes Pb_n; the value of Mia' is an integer between the minimum number of steps m' from the starting node to a common node and the total number of steps M'; Pbn_Mia' is the number of times the second-graph node Pb_n is passed through at step Mia' over the X' random walks from the starting node.
  • PiB' is calculated using the following formula: PiB' = (1/X') · Σ_{Mia'=m'}^{M'} PB_Mia'.
  • PiA' is calculated using the following formula: PiA' = (1/X'_1) · Σ_i PiA'_Vab_i.
  • the above step 205 may further include: performing cluster analysis on the first-graph part PiA of the first graph embedding vector and the first-graph part PiA' of the second graph embedding vector based on the federated clustering method to obtain the first cluster over the first-graph part of the federated graph; performing cluster analysis on the second-graph part PiB of the first graph embedding vector and the second-graph part PiB' of the second graph embedding vector based on the federated clustering method to obtain the second cluster over the second-graph part of the federated graph; and screening cross-graph clusters based on the first cluster and the second cluster to obtain the target clusters with a higher degree of clustering.
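The screening step can be illustrated with a deliberately simple stand-in: assume each party has already clustered its part of the embedding vectors, producing one label per node from the first-graph parts (PiA, PiA') and one from the second-graph parts (PiB, PiB'). Intersecting the two partitions keeps only groups whose members agree under both views, a rough proxy for the "higher degree of clustering" mentioned above; the function name and the min_size filter are hypothetical, not from the patent.

```python
from collections import defaultdict

def screen_cross_graph_clusters(first_labels, second_labels, min_size=2):
    """Intersect two clusterings of the same node set.

    first_labels / second_labels: dict node -> cluster id, from clustering the
    first-graph parts and the second-graph parts respectively. Nodes grouped
    together under BOTH views form a candidate target cluster; groups smaller
    than min_size are screened out.
    """
    groups = defaultdict(list)
    for node in first_labels:
        if node in second_labels:
            groups[(first_labels[node], second_labels[node])].append(node)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]
```

Nodes that cluster together in only one of the two views are dropped, so the surviving groups are exactly those whose association is supported by both graphs.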
  • this embodiment of the application can be used to identify risk groups in scenarios such as syndicated transaction fraud across payment institutions. Taking the mining of fraudulent transaction users under payment platform A and Cloud QuickPass as an example, there is currently a risk that fraudulent gangs conduct transaction arbitrage and money laundering under one payment channel while transferring their transactions to another payment channel; the association of transaction fraud is achieved through the following steps:
  • payment platform A users, merchants, and transfer transactions between users form the first graph
  • payment platform B users, merchants, and transfer transactions between users form the second graph
  • the entities are user and merchant nodes, and the relationships are the transfer relationships between users and the transaction payment relationships between users and merchants.
  • each user or merchant node of payment platform A walks M steps; when a walk reaches a common node Vab_i after Mia steps, the frequency vector Pa_Mia of having reached the common node in Mia steps over the many walks is recorded.
  • the vectors of the reached common nodes are aggregated with weights to obtain the embedding vector PiA, and according to the sequence of reached common nodes, the corresponding matches are made against the vector matrix of the B-side payment platform for users walking the remaining (M - Mia) steps from the common node; the corresponding common node Vab_i and the vector matrix under the (M - Mia)-step walk are encrypted and matched at the intermediate nodes, the corresponding embedding vector PiB is obtained by weighted aggregation of multiple matching results, and PiA and PiB are stored in the two parties' respective data spaces.
  • "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise specifically defined.
  • an embodiment of the present invention also provides a distributed graph embedding-based federated graph clustering device, which is used to implement the distributed graph embedding-based federated graph clustering method provided by any of the above embodiments.
  • Fig. 6 is a schematic structural diagram of a federated graph clustering device based on distributed graph embedding provided by an embodiment of the present invention.
  • the device 600 includes:
  • a construction module 601 configured to construct a first graph based on the first-party data, and construct a second graph based on the second-party data;
  • An associating module 602 configured to encrypt and intersect the first-party data and the second-party data, determine the common nodes in the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph;
  • the learning module 603 is used to learn the federated graph by using the distributed graph embedding algorithm based on random walk, and determine the first graph embedding vector [PiA, PiB] starting from the first graph and the second graph embedding vector starting from the second graph [PiA', PiB'], wherein, PiA and PiA' are the embedding vectors of each first graph node of the first graph, and PiB and PiB' are the embedding vectors of each second graph node of the second graph;
  • the clustering module 604 is used to perform cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on the federated clustering method to obtain a clustering result.
  • Fig. 7 is a federated graph clustering device based on distributed graph embedding according to an embodiment of the present application, which is used to execute the federated graph clustering method based on distributed graph embedding shown in Fig. 2. The device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of the above-mentioned embodiment.
  • a non-volatile computer storage medium based on a distributed graph embedding federated graph clustering method, on which computer-executable instructions are stored, and the computer-executable instructions are configured to be executed by a processor Execution at runtime: the method described in the above-mentioned embodiments.
  • the device, device, and computer-readable storage medium provided in the embodiments of the present application correspond to the method one-to-one. Therefore, the device, device, and computer-readable storage medium also have beneficial technical effects similar to their corresponding methods.
  • the beneficial technical effect of the method has been described in detail, therefore, the beneficial technical effect of the device, equipment and computer-readable storage medium will not be repeated here.
  • the embodiments of the present invention may be provided as methods, systems or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a federated graph clustering method and apparatus based on distributed graph embedding, and a readable storage medium. The method includes: constructing a first graph based on first-party data and a second graph based on second-party data; performing encrypted intersection on the first-party data and the second-party data, determining the common nodes of the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph; learning the federated graph with a random-walk-based distributed graph embedding algorithm to determine a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph; and performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result. With this method, federated graph clustering can be performed over both parties' private data, yielding better clustering results.

Description

Federated graph clustering method and apparatus based on distributed graph embedding, and readable storage medium
This application claims priority to Chinese patent application No. 202210106101.1, filed on January 28, 2022 and entitled "Federated graph clustering method and apparatus based on distributed graph embedding, and readable storage medium", the disclosure of which is incorporated herein by reference.
Technical Field
The present invention belongs to the field of clustering, and in particular relates to a federated graph clustering method and apparatus based on distributed graph embedding, and a readable storage medium.
Background
This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description here is not admitted to be prior art merely by its inclusion in this section.
Current federated learning technology has high application potential for jointly using data without it leaving its home database and for mining the value of multi-party data. However, the algorithms mainly supported are traditional machine learning classification models, regression models, and the like, which focus on evaluating individual value profiles; the mining of potential gang behavior is relatively lacking. Moreover, since graph computing involves multiple rounds of topological interactive computation over multi-party data, research on privacy-preserving graph mining algorithms is relatively weak, and industry results are few.
Therefore, federated learning over private graph structures is a problem to be solved urgently.
Summary of the Invention
In view of the above problems in the prior art, a federated graph clustering method and apparatus based on distributed graph embedding, and a readable storage medium are proposed, with which the above problems can be solved.
The present invention provides the following solutions.
In a first aspect, a federated graph clustering method based on distributed graph embedding is provided, including: constructing a first graph based on first-party data and a second graph based on second-party data; performing encrypted intersection on the first-party data and the second-party data, determining common nodes of the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph; learning the federated graph with a random-walk-based distributed graph embedding algorithm, and determining a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are the embedding vectors of the second-graph nodes of the second graph; and performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
In one embodiment, determining the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] includes: performing multiple random walks on the federated graph starting from first-graph nodes, the first party determining PiA from the walk paths on the first graph, and the second party determining PiB from the matched walk paths on the second graph; and performing multiple random walks on the federated graph starting from second-graph nodes, the second party determining PiB' from the walk paths on the second graph, and the first party determining PiA' from the matched walk paths on the first graph.
In one embodiment, associating the first graph and the second graph according to the common nodes to obtain the federated graph further includes: removing island nodes in the first graph and the second graph that have no direct or indirect association with the common nodes, to obtain the federated graph.
In one embodiment, the first-party data and the second-party data are isolated from each other.
In one embodiment, the nodes of the first graph are first-party users and/or first-party merchants, and the edges of the first graph are determined by the associations between the first-graph nodes; the nodes of the second graph are second-party users and/or second-party merchants, and the edges of the second graph are determined by the associations between the second-graph nodes.
In one embodiment, performing encrypted intersection on the first-party data and the second-party data to determine the common nodes of the first graph network and the second graph network includes: aligning the common nodes of the first graph network and the second graph network according to attribute information of merchants and/or users.
In one embodiment, performing multiple random walks on the federated graph starting from a first-graph node, and the first party determining PiA from the walk paths on the first graph, includes: defining a random walk step count M; the first party performing a random walk on the first graph starting from any first-graph node, stopping when any common node is reached, and recording the first-graph walk step count Mia, the identifier Vab_i of the common node reached, and the first-graph nodes passed in this walk; after X random walks, counting the first-graph walk step count Mia of each walk and the visit frequency of each first-graph node, to obtain a first-graph node frequency matrix for each first-graph walk step count Mia; and accumulating the first-graph node frequency matrices for the respective first-graph walk step counts Mia and dividing by the number of random walks X, to obtain the first-graph part PiA of the first graph embedding vector.
In one embodiment, the second party determining PiB from the matched walk paths on the second graph includes: during or after the X random walks, the first party sending the identifier Vab_i of each common node reached and all the corresponding first-graph walk step counts Mia to the second party; the second party determining the graph embedding vector PiB_Vab_i of the corresponding (M - Mia)-step walks on the second graph starting from each common node Vab_i; and accumulating the graph embedding vectors PiB_Vab_i corresponding to all the common nodes Vab_i and dividing by the sub-walk count X_1, to obtain the second-graph part PiB of the first graph embedding vector; where the sub-walk count X_1 is the number of the X random walks that crossed into the second graph.
In one embodiment, the first-graph node frequency matrix for each first-graph walk step count Mia includes: PA_Mia = [Pa_n1_Mia, n1 = 1, 2, …, Na]; where the first graph includes Na nodes Pa_n1; Mia takes integer values between m, the minimum number of steps from the start node to a common node, and the total step count M; and Pa_n_Mia is the number of times node Pa_n of the first graph is visited after Mia steps over the X random walks from the start node.
In one embodiment, PiA is calculated with the following formula:
PiA = (1/X) · Σ_{Mia=m}^{M} PA_Mia
In one embodiment, PiB is calculated with the following formula:
PiB = (1/X_1) · Σ_i PiB_Vab_i
In one embodiment, performing multiple random walks on the federated graph starting from a second-graph node, and the second party determining PiB' from the walk paths on the second graph, includes: defining a random walk step count M'; the second party performing a random walk on the second graph starting from any second-graph node, stopping when any common node is reached, and recording the second-graph walk step count Mia', the identifier Vab_i of the common node reached, and the second-graph nodes passed in this walk; after X' random walks, counting the second-graph walk step count Mia' of each walk and the visit frequency of each second-graph node, to obtain a second-graph node frequency matrix for each second-graph walk step count Mia'; and accumulating the second-graph node frequency matrices for the respective second-graph walk step counts Mia' and dividing by the number of random walks X', to obtain the second-graph part PiB' of the second graph embedding vector.
In one embodiment, the first party determining PiA' from the matched walk paths on the first graph includes: during or after the X' random walks, the second party sending the identifier Vab_i of each common node reached and all the corresponding second-graph walk step counts Mia' to the first party; the first party determining the graph embedding vector PiA'_Vab_i of the corresponding (M' - Mia')-step walks on the first graph starting from each common node Vab_i; and accumulating the graph embedding vectors PiA'_Vab_i corresponding to all the common nodes Vab_i and dividing by the sub-walk count X'_1, to obtain the first-graph part PiA' of the second graph embedding vector; where the sub-walk count X'_1 is the number of the X' random walks that crossed into the first graph.
In one embodiment, the second-graph node frequency matrix for each walk step count Mia' includes: PB_Mia' = [Pb_n2_Mia', n2 = 1, 2, …, Nb]; where the second graph includes Nb nodes Pb_n2; Mia' takes integer values between m', the minimum number of steps from the start node to a common node, and the total step count M'; and Pb_n2_Mia' is the number of times second-graph node Pb_n2 is visited after Mia' steps over the X' random walks from the start node.
In one embodiment, PiB' is calculated with the following formula:
PiB' = (1/X') · Σ_{Mia'=m'}^{M'} PB_Mia'
In one embodiment, PiA' is calculated with the following formula:
PiA' = (1/X'_1) · Σ_i PiA'_Vab_i
In one embodiment, performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method includes: performing cluster analysis on the first-graph part PiA of the first graph embedding vector and the first-graph part PiA' of the second graph embedding vector based on the federated clustering method, to obtain first clusters of the first-graph part of the federated graph; performing cluster analysis on the second-graph part PiB of the first graph embedding vector and the second-graph part PiB' of the second graph embedding vector based on the federated clustering method, to obtain second clusters of the second-graph part of the federated graph; and screening cross-graph clusters based on the first clusters and the second clusters, to obtain target clusters with a higher degree of clustering.
In a second aspect, a federated graph clustering apparatus based on distributed graph embedding is provided, including: a construction module configured to construct a first graph based on first-party data and a second graph based on second-party data; an association module configured to perform encrypted intersection on the first-party data and the second-party data, determine the common nodes of the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph; a learning module configured to learn the federated graph with a random-walk-based distributed graph embedding algorithm and determine a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are the embedding vectors of the second-graph nodes of the second graph; and a clustering module configured to perform cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
In a third aspect, a federated graph clustering apparatus based on distributed graph embedding is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing a program that, when executed by a multi-core processor, causes the multi-core processor to perform the method of the first aspect.
One of the advantages of the above embodiments is that the distributed graph embedding algorithm makes it possible to learn the graph embedding vectors of the federated graph while the first-party data and the second-party data remain private to each other; moreover, the topological structure of the federated graph can be reduced to matrices, and matrix analysis lowers the computational complexity and improves the effect and efficiency of federated graph computation.
Other advantages of the present invention will be explained in more detail in conjunction with the following description and accompanying drawings.
It should be understood that the above is only an overview of the technical solutions of the present invention, so that its technical means can be understood more clearly and implemented according to this specification. Specific embodiments of the present invention are given below to make the above and other objects, features and advantages of the present invention more comprehensible.
Brief Description of the Drawings
By reading the detailed description of the exemplary embodiments below, those of ordinary skill in the art will understand the advantages and benefits described herein, as well as other advantages and benefits. The drawings are only for the purpose of illustrating the exemplary embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 is a schematic structural diagram of a federated graph clustering device based on distributed graph embedding according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a federated graph clustering method based on distributed graph embedding according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a first graph and a second graph according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a federated graph according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another federated graph according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a federated graph clustering apparatus based on distributed graph embedding according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a federated graph clustering apparatus based on distributed graph embedding according to another embodiment of the present invention.
In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
In the description of the embodiments of the present application, it should be understood that terms such as "including" or "having" are intended to indicate the existence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility of one or more other features, numbers, steps, acts, components, parts, or combinations thereof.
Unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. Herein, "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone.
The terms "first", "second", and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features defined with "first", "second", and the like may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise stated, "multiple" means two or more.
All code in this application is exemplary, and those skilled in the art will conceive of various variations, without departing from the idea of the present application, according to factors such as the programming language used, specific requirements, and personal habits.
It should also be noted that, where there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other. The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
It should be noted that Fig. 1 is the schematic structural diagram of the hardware operating environment of the federated graph clustering device based on distributed graph embedding. The federated graph clustering device based on distributed graph embedding in the embodiments of the present invention may be a terminal device such as a PC or a portable computer.
As shown in Fig. 1, the federated graph clustering device based on distributed graph embedding may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not constitute a limitation on the federated graph clustering device based on distributed graph embedding, which may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a federated graph clustering program based on distributed graph embedding. The operating system is a program that manages and controls the hardware and software resources of the device and supports the running of the federated graph clustering program based on distributed graph embedding and other software or programs.
In the device shown in Fig. 1, the user interface 1003 is mainly used to receive requests, data, and the like sent by a first terminal, a second terminal, and a supervision terminal; the network interface 1004 is mainly used to connect to a backend server for data communication; and the processor 1001 may be used to call the federated graph clustering program based on distributed graph embedding stored in the memory 1005 and perform the following operations:
constructing a first graph based on first-party data and a second graph based on second-party data; performing encrypted intersection on the first-party data and the second-party data, determining the common nodes of the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph; learning the federated graph with a random-walk-based distributed graph embedding algorithm, and determining a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are the embedding vectors of the second-graph nodes of the second graph; and performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
Thus, the distributed graph embedding algorithm makes it possible to learn the graph embedding vectors of the federated graph while the first-party data and the second-party data remain private to each other; the topological structure of the federated graph can be reduced to matrices, and matrix analysis lowers the computational complexity and improves the effect and efficiency of federated graph computation.
Fig. 2 is a schematic flowchart of the federated graph clustering method based on distributed graph embedding according to an embodiment of the present application. In this flow, from the device perspective, the execution subject may be one or more electronic devices, more specifically their processing modules; from the program perspective, the execution subject may accordingly be a program carried on these electronic devices. In this embodiment, the execution subject of the method may be the processor in the embodiment shown in Fig. 1.
As shown in Fig. 2, the method provided in this embodiment may include the following steps:
202a. Construct a first graph based on first-party data; 202b. Construct a second graph based on second-party data.
It can be understood that the first-party data and the second-party data are the private data of the first party and the second party respectively; accordingly, the first party constructs the first graph based on the first-party data, and the second party constructs the second graph based on the second-party data.
Referring to Fig. 3, the entity nodes and edge relationships of the graphs can be defined according to the actual task, and the first graph A and the second graph B are constructed from the data owned by the two parties respectively.
In some embodiments, the nodes of the first graph are first-party users and/or first-party merchants, and the edges of the first graph are determined by the associations between the first-graph nodes; the nodes of the second graph are second-party users and/or second-party merchants, and the edges of the second graph are determined by the associations between the second-graph nodes. For example, the association may be any relationship between nodes, such as a transaction relationship between a user and a merchant, a transfer relationship between users, or a transfer relationship between merchants.
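As a minimal sketch of this step (node names and edge lists are purely illustrative, not from the patent), each party can build its local graph independently from its own relation records, for example as a plain adjacency dict:

```python
from collections import defaultdict

def build_party_graph(edges):
    """Build one party's undirected graph as an adjacency dict from
    (node, node) relation pairs, e.g. user-merchant transactions or
    user-user transfer relationships."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

# Hypothetical first-party and second-party relation data; "v1" is an
# entity that will later turn out to be a common node of both graphs.
graph_a = build_party_graph([("u1", "m1"), ("u2", "m1"), ("u2", "v1")])
graph_b = build_party_graph([("v1", "m9"), ("m9", "u7")])
```

Each side runs this inside its own data space; nothing is exchanged at this stage.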
204. Perform encrypted intersection on the first-party data and the second-party data, determine the common nodes of the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph.
It can be understood that the association between the first graph and the second graph can be established by finding their common nodes, where a common node is the same entity node that exists in both the first graph and the second graph, such as the same user or the same merchant.
Referring to Fig. 4, assuming that V_AB1 in the first graph A and V_AB1 in the second graph are common nodes, and V_AB2 in the first graph A and V_AB2 in the second graph are common nodes, the first graph A and the second graph B can be associated accordingly to form a federated graph.
In some embodiments, the common nodes of the first graph network and the second graph network can be aligned according to attribute information of merchants and/or users. For example, the common nodes corresponding to the same user can be determined through user attribute information such as mobile phone number and email address.
In some embodiments, after associating the first graph and the second graph, island nodes in the first graph and the second graph that have no direct or indirect association with the common nodes can also be removed to obtain the federated graph.
Referring to Fig. 4, since such island nodes cannot participate in federated training across the first graph and the second graph, the island nodes in the first graph A and the second graph B can be removed, filtering out a federated graph for two-party graph computation.
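The island-node filtering described above can be sketched as a reachability check from the common nodes (a local computation each party runs on its own graph; the adjacency-dict representation and node names are illustrative assumptions):

```python
from collections import deque

def prune_islands(adj, common_nodes):
    """Keep only nodes reachable (directly or indirectly) from at least
    one common node; unreachable 'island' nodes can never take part in a
    cross-graph walk, so they are dropped from the federated graph."""
    keep = set()
    queue = deque(n for n in common_nodes if n in adj)
    keep.update(queue)
    while queue:  # breadth-first search from all common nodes at once
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in keep:
                keep.add(nb)
                queue.append(nb)
    # Restrict the adjacency dict to the surviving nodes.
    return {n: {v for v in nbrs if v in keep}
            for n, nbrs in adj.items() if n in keep}
```

For example, a component {8, 9} with no path to any common node would be removed, while everything connected to a common node such as "V1" survives.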
206. Learn the federated graph with a random-walk-based distributed graph embedding algorithm, and determine the first graph embedding vector [PiA, PiB] starting from the first graph and the second graph embedding vector [PiA', PiB'] starting from the second graph.
PiA and PiA' are the embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are the embedding vectors of the second-graph nodes of the second graph.
Graph embedding is a method of mapping graph nodes (high-dimensional vectors) to low-dimensional vectors, thereby obtaining a unique representation of each node, whose vector is then used for tasks such as recommendation, classification, or prediction. A random-walk-based graph embedding algorithm first samples the nodes of the graph multiple times with a random walk algorithm to obtain node sequences, and then generates the vector representation of each node of the graph from these node sequences, yielding the graph embedding vectors. Specifically, in this embodiment, the first graph embedding vector [PiA, PiB] includes a first-graph part PiA and a second-graph part PiB, where the first-graph part PiA must be obtained by the first party by sampling on the first graph and the second-graph part PiB must be obtained by the second party by sampling on the second graph; therefore, a distributed graph embedding algorithm is needed to distribute the above graph embedding method over the first graph and the second graph, so as to obtain the first-graph part and the second-graph part respectively.
The random walk step count of each node in the graph embedding can first be defined, denoted as M steps. For each first-graph node, when it randomly walks M steps, it may walk from the first graph through a common node into the second graph, walking Mia steps in the first graph A and (M - Mia) steps in the second graph B. For example, referring to Fig. 5, suppose a walk starting from node 1 takes M = 4 steps through nodes 1-2-5-6-7, where the first graph A contributes 2 steps, 1-2-5, and the second graph B contributes 2 steps, 5-6-7.
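A walk of this kind on one party's subgraph can be sketched as follows (the adjacency lists below are illustrative; in the example they are single-neighbor chains so the walk is deterministic):

```python
import random

def walk_until_common(adj, start, common, max_steps, rng=random):
    """Random-walk over an adjacency dict, stopping early when a common
    node is reached; returns (steps_taken, common_node_or_None, path).
    steps_taken plays the role of Mia: the caller would hand the
    remaining max_steps - steps_taken steps to the other party."""
    path = [start]
    node = start
    for step in range(1, max_steps + 1):
        node = rng.choice(sorted(adj[node]))  # uniform next-hop choice
        path.append(node)
        if node in common:
            return step, node, path
    return max_steps, None, path
```

With M = 4 and common node 5, a walk 1-2-5 stops after Mia = 2 steps, leaving M - Mia = 2 steps for the second party, matching the Fig. 5 example.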
In some embodiments, the above step 206 further includes: performing multiple random walks on the federated graph starting from first-graph nodes, the first party determining PiA from the walk paths on the first graph, and the second party determining PiB from the matched walk paths on the second graph; and performing multiple random walks on the federated graph starting from second-graph nodes, the second party determining PiB' from the walk paths on the second graph, and the first party determining PiA' from the matched walk paths on the first graph.
The matched walk paths on the second graph can be obtained by performing multiple random walks on the second graph starting from the common nodes between the first graph and the second graph.
The matched walk paths on the first graph can be obtained by performing multiple random walks on the first graph starting from the common nodes between the first graph and the second graph.
It can be understood that, in the federated graph, the first party does not know the random walk paths within the second graph, and likewise the second party does not know the random walk paths within the first graph. Therefore, for random walks starting from first-graph nodes, the first party can obtain the PiA part of the first graph embedding vector, and the second party matches the PiB part of the first graph embedding vector for the remaining walk tasks passed over by the first party. Likewise, for random walks starting from second-graph nodes, the second party can obtain the PiB' part of the second graph embedding vector, and the first party matches the PiA' part of the second graph embedding vector for the remaining walk tasks passed over by the second party.
In some embodiments, determining the first-graph part PiA of the first graph embedding vector includes:
defining a random walk step count M; the first party performs a random walk on the first graph starting from any first-graph node, stops when any common node is reached, and records the first-graph walk step count Mia, the identifier Vab_i of the common node reached, and the first-graph nodes passed in this walk;
after X random walks, counting the first-graph walk step count Mia of each walk and the visit frequency of each first-graph node, to obtain a first-graph node frequency matrix for each first-graph walk step count Mia;
accumulating the first-graph node frequency matrices for the respective first-graph walk step counts Mia and dividing by the number of random walks X, to obtain the first-graph part PiA of the first graph embedding vector.
Specifically, obtaining the first-graph node frequency matrix for each first-graph walk step count Mia includes:
reaching a common node after 1 step:
Mia = 1, frequency matrix: PA_1 = [Pa_1_1, Pa_2_1, Pa_3_1, Pa_4_1, …, Pa_Na_1];
reaching a common node after 2 steps:
Mia = 2, frequency matrix: PA_2 = [Pa_1_2, Pa_2_2, Pa_3_2, Pa_4_2, …, Pa_Na_2];
reaching a common node after M steps:
Mia = M, frequency matrix: PA_M = [Pa_1_M, Pa_2_M, Pa_3_M, Pa_4_M, …, Pa_Na_M].
In summary, the frequency matrix PA_Mia of walking Mia steps in the first graph to a common node Vab_i can be obtained, where Mia takes integer values between m, the minimum number of steps from the start node to a common node, and the total step count M:
PA_Mia = [Pa_n1_Mia, n1 = 1, 2, …, Na] = [Pa_1_Mia, Pa_2_Mia, …, Pa_Na_Mia];
where the first graph includes Na nodes Pa_n1, and Pa_n_Mia is the number of times node Pa_n of the first graph is visited after Mia steps over the X random walks from the start node.
Further, the first-graph part PiA of the first graph embedding vector can be calculated with the following formula:
PiA = (1/X) · Σ_{Mia=m}^{M} PA_Mia
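The PA_Mia bookkeeping and the averaging into PiA can be sketched as follows (a minimal sketch: the walk records are assumed to be (Mia, visited-first-graph-nodes) pairs produced by the stopping-at-common-node walks described above, and the node names are illustrative):

```python
from collections import defaultdict

def first_graph_part(walk_records, first_graph_nodes):
    """walk_records: one (Mia, visited_first_graph_nodes) pair per walk,
    X walks in total.  Builds the per-step-count frequency vectors
    PA_Mia and averages them into PiA = (1/X) * sum over Mia of PA_Mia."""
    X = len(walk_records)
    index = {n: i for i, n in enumerate(first_graph_nodes)}
    pa = defaultdict(lambda: [0] * len(first_graph_nodes))  # Mia -> PA_Mia
    for mia, visited in walk_records:
        for n in visited:
            pa[mia][index[n]] += 1
    pia = [0.0] * len(first_graph_nodes)
    for vec in pa.values():  # accumulate the PA_Mia matrices
        for i, count in enumerate(vec):
            pia[i] += count
    return [v / X for v in pia]  # divide by the number of walks X
```

This stays entirely within the first party's data space; only the common-node identifiers and step counts are later shared.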
In some embodiments, determining the second-graph part PiB of the first graph embedding vector includes the following steps:
During or after the X random walks, the first party sends the identifier Vab_i of each common node reached and all the corresponding first-graph walk step counts Mia to the second party. For example, the second party receives multiple combinations (Vab_i, remaining second-graph walk step count Mib = M - Mia) from the first party.
Further, the second party determines the graph embedding vector PiB_Vab_i of the corresponding Mib = (M - Mia)-step walks on the second graph starting from each common node Vab_i.
For example, after walking Mia steps in the first graph A to the common node Vab_i, the walk continues in the second graph for Mib steps with the common node Vab_i as the start point, with the frequency matrix PB_Mib_Vab_i = [Pb_1_Mib, Pb_2_Mib, …, Pb_Nb_Mib]; further, the frequency matrices for the multiple second-graph walk step counts Mib corresponding to each common node Vab_i are accumulated to obtain PiB_Vab_i.
It can be understood that the graph embedding vector PiB_Vab_i corresponding to the common node Vab_i can also be precomputed on the second graph.
Further, the graph embedding vectors PiB_Vab_i corresponding to all the common nodes Vab_i are accumulated and divided by the sub-walk count X_1 to obtain the second-graph part PiB of the first graph embedding vector.
For example, PiB is calculated with the following formula:
PiB = (1/X_1) · Σ_i PiB_Vab_i
where the sub-walk count X_1 is the number of the X random walks that crossed into the second graph.
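Assuming the second party holds a PiB_Vab_i vector for each common node (precomputed or computed on demand, as described above), the final aggregation into PiB is a simple average over the X_1 walks that crossed the boundary; a sketch with illustrative vectors:

```python
def second_graph_part(crossed_common_nodes, pib_by_common_node):
    """crossed_common_nodes: the common-node id recorded for each of the
    X_1 walks that continued into the second graph; pib_by_common_node:
    the second party's embedding vector PiB_Vab_i for each common node.
    Returns PiB = (1/X_1) * sum of the matched PiB_Vab_i vectors."""
    x1 = len(crossed_common_nodes)
    if x1 == 0:
        return None  # no walk reached the second graph
    dim = len(next(iter(pib_by_common_node.values())))
    pib = [0.0] * dim
    for vab in crossed_common_nodes:
        for i, v in enumerate(pib_by_common_node[vab]):
            pib[i] += v
    return [v / x1 for v in pib]
```

Only common-node identifiers cross the boundary; the per-node vectors remain in the second party's data space.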
Likewise, multiple random walks can be performed on the federated graph starting from second-graph nodes, with the second party determining PiB' from the walk paths on the second graph and the first party determining PiA' from the matched walk paths on the first graph, to obtain the above second graph embedding vector [PiA', PiB'].
Specifically, a random walk step count M' can be defined; the second party performs a random walk on the second graph starting from any second-graph node, stops when any common node is reached, and records the second-graph walk step count Mia', the identifier Vab_i of the common node reached, and the second-graph nodes passed in this walk; after X' random walks, the second-graph walk step count Mia' of each walk and the visit frequency of each second-graph node are counted, to obtain a second-graph node frequency matrix for each second-graph walk step count Mia'; and the second-graph node frequency matrices for the respective second-graph walk step counts Mia' are accumulated and divided by the number of random walks X', to obtain the second-graph part PiB' of the second graph embedding vector.
In some embodiments, the first party determining PiA' from the matched walk paths on the first graph includes:
during or after the X' random walks, the second party sends the identifier Vab_i of each common node reached and all the corresponding second-graph walk step counts Mia' to the first party;
the first party determines the graph embedding vector PiA'_Vab_i of the corresponding (M' - Mia')-step walks on the first graph starting from each common node Vab_i;
the graph embedding vectors PiA'_Vab_i corresponding to all the common nodes Vab_i are accumulated and divided by the sub-walk count X'_1, to obtain the first-graph part PiA' of the second graph embedding vector;
where the sub-walk count X'_1 is the number of the X' random walks that crossed into the first graph.
In some embodiments, the second-graph node frequency matrix for each walk step count Mia' includes:
PB_Mia' = [Pb_n2_Mia', n2 = 1, 2, …, Nb];
where the second graph includes Nb nodes Pb_n2; Mia' takes integer values between m', the minimum number of steps from the start node to a common node, and the total step count M'; and Pb_n2_Mia' is the number of times second-graph node Pb_n2 is visited after Mia' steps over the X' random walks from the start node.
In some embodiments, PiB' is calculated with the following formula:
PiB' = (1/X') · Σ_{Mia'=m'}^{M'} PB_Mia'
In some embodiments, PiA' is calculated with the following formula:
PiA' = (1/X'_1) · Σ_i PiA'_Vab_i
208. Perform cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
In some embodiments, the above step 208 may further include: performing cluster analysis on the first-graph part PiA of the first graph embedding vector and the first-graph part PiA' of the second graph embedding vector based on the federated clustering method, to obtain first clusters of the first-graph part of the federated graph; performing cluster analysis on the second-graph part PiB of the first graph embedding vector and the second-graph part PiB' of the second graph embedding vector based on the federated clustering method, to obtain second clusters of the second-graph part of the federated graph; and screening cross-graph clusters based on the first clusters and the second clusters, to obtain target clusters with a higher degree of clustering.
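As an illustration of the local step of such a federated clustering round (a sketch only, not the patent's protocol: in a federated setting each party would run this assignment on its own embedding vectors and exchange only cluster statistics, not the raw vectors):

```python
def assign_to_centroids(vectors, centroids):
    """Nearest-centroid assignment: the per-party local step of a
    k-means-style federated clustering round.  Returns one centroid
    index per input vector; only these labels / aggregated cluster
    statistics would be shared across parties."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors]
```

A coordinator could then update the shared centroids from the parties' aggregated per-cluster sums and counts, and iterate.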
In one example, the embodiments of the present application can be used for risk-gang identification in scenarios such as cross-payment-institution gang transaction fraud. Taking the mining of fraudulent users among the transaction users of payment platform A and a quick-pass payment service as an example, there is currently a risk of fraud gangs that arbitrage and launder money through merchant transactions on one payment channel and then transfer funds on another payment channel. The association of transaction fraud is realized through the following steps:
1. Construct a first graph based on first-party data and a second graph based on second-party data.
For example, transfer transactions among payment platform A, merchants, and users form the first graph, and transfer transactions among the users, merchants, and users of payment platform B form the second graph; the entities are user and merchant nodes, and the relationships are transfer relationships between users and transaction payment relationships between users and merchants.
2. Perform encrypted intersection on the first-party data and the second-party data, determine the common nodes of the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph.
For example, using user identity information and enterprise business registration information as primary key values, the common users and merchants under the two payment channels are aligned: the common users are the registered transaction users shared by the two payment channels, and the common merchants are aggregated-payment merchants, transaction merchants whose payment codes are mutually recognized and scannable across channels, and the like.
3. Learn the federated graph with the random-walk-based distributed graph embedding algorithm.
Define that each user or merchant node of payment platform B walks M steps; when it reaches a common node Vab_i after Mia steps, record the frequency vector Pa_Mia of walking Mia steps to common nodes over multiple walks, and obtain its embedding vector PiA by weighted aggregation of the vectors for the different step counts at which common nodes were reached. According to the sequence of common nodes reached, the corresponding vector matrices of the payment platform A users' walks of (M - Mia) steps starting from the common nodes are matched for the B side; at the intermediate node, the corresponding common node Vab_i and its (M - Mia)-step vector matrix are matched in encrypted form, and the corresponding embedding vector PiB is obtained by weighted aggregation of the multiple matching results. PiA and PiB are stored in their respective data spaces.
With the same method, random walks are performed starting from the user or merchant nodes of payment platform A to obtain PiA' and PiB'.
4. With the federated-learning-based cluster analysis method, first perform cluster analysis on the first-graph parts PiA and PiA' respectively and mark the clusters mined from the first-graph vectors; on this basis, the intermediate node records the cluster center-core data, and after the second-graph parts PiB and PiB' are introduced, the clustering of the nodes within each cluster under the second graph is compared to mine the cluster center cores of the second graph's vector space; the clusters are screened again to obtain the most strongly associated node data.
In the description of this specification, descriptions with reference to the terms "some possible implementations", "some embodiments", "example", "specific example", "some examples", and the like mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification and the features thereof, provided they do not contradict each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, features defined with "first" and "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, such as two or three, unless otherwise explicitly and specifically defined.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
Regarding the method flowcharts of the embodiments of the present application, certain operations are described as different steps performed in a certain order. Such flowcharts are illustrative rather than restrictive. Certain steps described herein may be grouped together and performed in a single operation, certain steps may be divided into multiple sub-steps, and certain steps may be performed in an order different from that shown herein. The steps shown in the flowcharts may be implemented in any manner by any circuit structure and/or tangible mechanism (for example, by software running on a computer device, hardware such as logical functions implemented by a processor or chip, and/or any combination thereof).
Based on the same technical concept, an embodiment of the present invention further provides a federated graph clustering apparatus based on distributed graph embedding, configured to execute the federated graph clustering method based on distributed graph embedding provided by any of the above embodiments. Fig. 6 is a schematic structural diagram of a federated graph clustering apparatus based on distributed graph embedding provided by an embodiment of the present invention.
As shown in Fig. 6, the apparatus 600 includes:
a construction module 601 configured to construct a first graph based on first-party data and a second graph based on second-party data;
an association module 602 configured to perform encrypted intersection on the first-party data and the second-party data, determine the common nodes of the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph;
a learning module 603 configured to learn the federated graph with a random-walk-based distributed graph embedding algorithm and determine a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, where PiA and PiA' are the embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are the embedding vectors of the second-graph nodes of the second graph;
a clustering module 604 configured to perform cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
It should be noted that the apparatus in the embodiments of the present application can implement the processes of the foregoing method embodiments and achieve the same effects and functions, which are not repeated here.
Fig. 7 shows a federated graph clustering apparatus based on distributed graph embedding according to an embodiment of the present application, configured to execute the federated graph clustering method based on distributed graph embedding shown in Fig. 2, the apparatus including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of the above embodiments.
According to some embodiments of the present application, a non-volatile computer storage medium for the federated graph clustering method based on distributed graph embedding is provided, storing computer-executable instructions configured to, when run by a processor, execute the method described in the above embodiments.
The embodiments of the present application are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the apparatus, device, and computer-readable storage medium embodiments, since they are substantially similar to the method embodiments, their description is simplified, and reference may be made to the corresponding parts of the method embodiments.
The apparatus, device, and computer-readable storage medium provided in the embodiments of the present application correspond one-to-one with the method; therefore, they also have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the apparatus, device, and computer-readable storage medium are not repeated here.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, optical memory, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. In addition, although the operations of the method of the present invention are described in a specific order in the drawings, this does not require or imply that these operations must be performed in that specific order, or that all the operations shown must be performed to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the disclosed specific embodiments, and the division into aspects does not mean that the features in these aspects cannot be combined to benefit; such division is only for convenience of expression. The present invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (20)

  1. A federated graph clustering method based on distributed graph embedding, comprising:
    constructing a first graph based on first-party data, and constructing a second graph based on second-party data;
    performing encrypted intersection on the first-party data and the second-party data, determining common nodes of the first graph and the second graph, and associating the first graph and the second graph according to the common nodes to obtain a federated graph;
    learning the federated graph with a random-walk-based distributed graph embedding algorithm, and determining a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, wherein PiA and PiA' are embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are embedding vectors of the second-graph nodes of the second graph;
    performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
  2. The method according to claim 1, wherein determining the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] comprises:
    performing multiple random walks on the federated graph starting from the first-graph nodes, a first party determining PiA from the walk paths on the first graph, and a second party determining PiB from the matched walk paths on the second graph;
    performing multiple random walks on the federated graph starting from the second-graph nodes, the second party determining PiB' from the walk paths on the second graph, and the first party determining PiA' from the matched walk paths on the first graph.
  3. The method according to claim 1 or 2, wherein associating the first graph and the second graph according to the common nodes to obtain the federated graph further comprises:
    removing island nodes in the first graph and the second graph that have no direct or indirect association with the common nodes, to obtain the federated graph.
  4. The method according to any one of claims 1-3, wherein the first-party data and the second-party data are isolated from each other.
  5. The method according to any one of claims 1-4, wherein
    the nodes of the first graph are first-party users and/or first-party merchants, and the edges of the first graph are determined by the associations between the first-graph nodes;
    the nodes of the second graph are second-party users and/or second-party merchants, and the edges of the second graph are determined by the associations between the second-graph nodes.
  6. The method according to any one of claims 1-5, wherein performing encrypted intersection on the first-party data and the second-party data to determine common nodes of the first graph network and the second graph network comprises:
    aligning the common nodes of the first graph network and the second graph network according to attribute information of merchants and/or users.
  7. The method according to any one of claims 2-6, wherein performing multiple random walks on the federated graph starting from the first-graph nodes, and the first party determining PiA from the walk paths on the first graph, comprise:
    defining a random walk step count M, the first party performing a random walk on the first graph starting from any one of the first-graph nodes, stopping when any common node is reached, and recording a first-graph walk step count Mia, the identifier Vab_i of the common node reached, and the first-graph nodes passed in this walk;
    after X random walks, counting the first-graph walk step count Mia of each walk and the visit frequency of each first-graph node, to obtain a first-graph node frequency matrix for each first-graph walk step count Mia;
    accumulating the first-graph node frequency matrices for the respective first-graph walk step counts Mia and dividing by the number of random walks X, to obtain the first-graph part PiA of the first graph embedding vector.
  8. The method according to claim 7, wherein the second party determining PiB from the matched walk paths on the second graph comprises:
    during or after the X random walks, the first party sending the identifier Vab_i of each common node reached and all the corresponding first-graph walk step counts Mia to the second party;
    the second party determining the graph embedding vector PiB_Vab_i of the corresponding second-graph walk step count Mib on the second graph starting from each common node Vab_i, wherein M = Mib + Mia;
    accumulating the graph embedding vectors PiB_Vab_i corresponding to all the common nodes Vab_i and dividing by a sub-walk count X_1, to obtain the second-graph part PiB of the first graph embedding vector;
    wherein the sub-walk count X_1 is the number of the X random walks that crossed into the second graph.
  9. The method according to claim 7 or 8, wherein the first-graph node frequency matrix for each first-graph walk step count Mia comprises:
    PA_Mia = [Pa_n1_Mia, n1 = 1, 2, …, Na];
    wherein the first graph comprises Na nodes Pa_n1; Mia takes integer values between m, the minimum number of steps from the start node to a common node, and the total step count M; and Pa_n_Mia is the number of times node Pa_n of the first graph is visited after Mia steps over the X random walks from the start node.
  10. The method according to claim 9, wherein PiA is calculated with the following formula:
    PiA = (1/X) · Σ_{Mia=m}^{M} PA_Mia
  11. The method according to claim 8, wherein PiB is calculated with the following formula:
    PiB = (1/X_1) · Σ_i PiB_Vab_i
  12. The method according to any one of claims 2-11, wherein performing multiple random walks on the federated graph starting from the second-graph nodes, and the second party determining PiB' from the walk paths on the second graph, comprise:
    defining a random walk step count M', the second party performing a random walk on the second graph starting from any one of the second-graph nodes, stopping when any common node is reached, and recording a second-graph walk step count Mia', the identifier Vab_i of the common node reached, and the second-graph nodes passed in this walk;
    after X' random walks, counting the second-graph walk step count Mia' of each walk and the visit frequency of each second-graph node, to obtain a second-graph node frequency matrix for each second-graph walk step count Mia';
    accumulating the second-graph node frequency matrices for the respective second-graph walk step counts Mia' and dividing by the number of random walks X', to obtain the second-graph part PiB' of the second graph embedding vector.
  13. The method according to claim 12, wherein the first party determining PiA' from the matched walk paths on the first graph comprises:
    during or after the X' random walks, the second party sending the identifier Vab_i of each common node reached and all the corresponding second-graph walk step counts Mia' to the first party;
    the first party determining the graph embedding vector PiA'_Vab_i of the corresponding (M' - Mia')-step walks on the first graph starting from each common node Vab_i;
    accumulating the graph embedding vectors PiA'_Vab_i corresponding to all the common nodes Vab_i and dividing by a sub-walk count X'_1, to obtain the first-graph part PiA' of the second graph embedding vector;
    wherein the sub-walk count X'_1 is the number of the X' random walks that crossed into the first graph.
  14. The method according to claim 12 or 13, wherein the second-graph node frequency matrix for each walk step count Mia' comprises:
    PB_Mia' = [Pb_n2_Mia', n2 = 1, 2, …, Nb];
    wherein the second graph comprises Nb nodes Pb_n2; Mia' takes integer values between m', the minimum number of steps from the start node to a common node, and the total step count M'; and Pb_n2_Mia' is the number of times second-graph node Pb_n2 is visited after Mia' steps over the X' random walks from the start node.
  15. The method according to claim 14, wherein PiB' is calculated with the following formula:
    PiB' = (1/X') · Σ_{Mia'=m'}^{M'} PB_Mia'
  16. The method according to claim 13, wherein PiA' is calculated with the following formula:
    PiA' = (1/X'_1) · Σ_i PiA'_Vab_i
  17. The method according to any one of claims 1-16, wherein performing cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on the federated clustering method comprises:
    performing cluster analysis on the first-graph part PiA of the first graph embedding vector and the first-graph part PiA' of the second graph embedding vector based on the federated clustering method, to obtain first clusters of the first-graph part of the federated graph;
    performing cluster analysis on the second-graph part PiB of the first graph embedding vector and the second-graph part PiB' of the second graph embedding vector based on the federated clustering method, to obtain second clusters of the second-graph part of the federated graph;
    screening cross-graph clusters based on the first clusters and the second clusters, to obtain target clusters with a higher degree of clustering.
  18. A federated graph clustering apparatus based on distributed graph embedding, comprising:
    a construction module configured to construct a first graph based on first-party data and a second graph based on second-party data;
    an association module configured to perform encrypted intersection on the first-party data and the second-party data, determine common nodes of the first graph and the second graph, and associate the first graph and the second graph according to the common nodes to obtain a federated graph;
    a learning module configured to learn the federated graph with a random-walk-based distributed graph embedding algorithm and determine a first graph embedding vector [PiA, PiB] starting from the first graph and a second graph embedding vector [PiA', PiB'] starting from the second graph, wherein PiA and PiA' are embedding vectors of the first-graph nodes of the first graph, and PiB and PiB' are embedding vectors of the second-graph nodes of the second graph;
    a clustering module configured to perform cluster analysis on the first graph embedding vector [PiA, PiB] and the second graph embedding vector [PiA', PiB'] of the federated graph based on a federated clustering method to obtain a clustering result.
  19. A federated graph clustering apparatus based on distributed graph embedding, comprising:
    at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to any one of claims 1-17.
  20. A computer-readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method according to any one of claims 1-17.
PCT/CN2022/117418 2022-01-28 2022-09-07 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质 WO2023142490A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210106101.1A CN114492647A (zh) 2022-01-28 2022-01-28 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质
CN202210106101.1 2022-01-28

Publications (1)

Publication Number Publication Date
WO2023142490A1 true WO2023142490A1 (zh) 2023-08-03

Family

ID=81477204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117418 WO2023142490A1 (zh) 2022-01-28 2022-09-07 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质

Country Status (2)

Country Link
CN (1) CN114492647A (zh)
WO (1) WO2023142490A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492647A (zh) * 2022-01-28 2022-05-13 中国银联股份有限公司 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342006A1 (en) * 2019-04-29 2020-10-29 Adobe Inc. Higher-Order Graph Clustering
CN112182399A (zh) * 2020-10-16 2021-01-05 中国银联股份有限公司 一种联邦学习的多方安全计算方法及装置
CN112200263A (zh) * 2020-10-22 2021-01-08 国网山东省电力公司电力科学研究院 一种应用于配电物联网的自组织联邦聚类方法
CN113076422A (zh) * 2021-04-15 2021-07-06 国家计算机网络与信息安全管理中心 一种基于联邦图神经网络的多语种社交事件检测方法
CN113298267A (zh) * 2021-06-10 2021-08-24 浙江工业大学 一种基于节点嵌入差异检测的垂直联邦模型防御方法
CN114492647A (zh) * 2022-01-28 2022-05-13 中国银联股份有限公司 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006457A1 (en) * 2007-02-16 2009-01-01 Stivoric John M Lifeotypes
US20180052906A1 (en) * 2016-08-22 2018-02-22 Facebook, Inc. Systems and methods for recommending content items
CN111553470B (zh) * 2020-07-10 2020-10-27 成都数联铭品科技有限公司 适用于联邦学习的信息交互系统及方法
CN112288094B (zh) * 2020-10-09 2022-05-17 武汉大学 联邦网络表示学习方法及系统
CN112181971B (zh) * 2020-10-27 2022-11-01 华侨大学 一种基于边缘的联邦学习模型清洗和设备聚类方法、系统
CN113923225A (zh) * 2020-11-16 2022-01-11 京东科技控股股份有限公司 基于分布式架构的联邦学习平台、方法、设备和存储介质
CN112101579B (zh) * 2020-11-18 2021-02-09 杭州趣链科技有限公司 基于联邦学习的机器学习方法、电子装置和存储介质
CN113469373B (zh) * 2021-08-17 2023-06-30 北京神州新桥科技有限公司 基于联邦学习的模型训练方法、系统、设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342006A1 (en) * 2019-04-29 2020-10-29 Adobe Inc. Higher-Order Graph Clustering
CN112182399A (zh) * 2020-10-16 2021-01-05 中国银联股份有限公司 一种联邦学习的多方安全计算方法及装置
CN112200263A (zh) * 2020-10-22 2021-01-08 国网山东省电力公司电力科学研究院 一种应用于配电物联网的自组织联邦聚类方法
CN113076422A (zh) * 2021-04-15 2021-07-06 国家计算机网络与信息安全管理中心 一种基于联邦图神经网络的多语种社交事件检测方法
CN113298267A (zh) * 2021-06-10 2021-08-24 浙江工业大学 一种基于节点嵌入差异检测的垂直联邦模型防御方法
CN114492647A (zh) * 2022-01-28 2022-05-13 中国银联股份有限公司 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质

Also Published As

Publication number Publication date
CN114492647A (zh) 2022-05-13

Similar Documents

Publication Publication Date Title
Li et al. Practical federated gradient boosting decision trees
US11604787B2 (en) Method of generating globally verifiable unique identifiers using a scalable interlinked blockchain structure
US11875400B2 (en) Systems, methods, and apparatuses for dynamically assigning nodes to a group within blockchains based on transaction type and node intelligence using distributed ledger technology (DLT)
US11126659B2 (en) System and method for providing a graph protocol for forming a decentralized and distributed graph database
CN111461874A (zh) 一种基于联邦模式的信贷风险控制系统及方法
Chawathe Clustering blockchain data
CN111046237B (zh) 用户行为数据处理方法、装置、电子设备及可读介质
CN104820708B (zh) 一种基于云计算平台的大数据聚类方法和装置
Vijayan et al. Alignment of dynamic networks
CN104077723B (zh) 一种社交网络推荐系统及方法
WO2020151321A1 (zh) 基于图计算技术的理赔反欺诈方法、装置、设备及存储介质
WO2023142490A1 (zh) 基于分布式图嵌入的联邦图聚类方法、装置及可读存储介质
WO2022237175A1 (zh) 图数据的处理方法、装置、设备、存储介质及程序产品
Lanciano et al. A survey on the densest subgraph problem and its variants
Kumar et al. RETRACTED ARTICLE: Big data analytics to identify illegal activities on Bitcoin Blockchain for IoMT
Keller et al. Balancing quality and efficiency in private clustering with affinity propagation
Rahmadika et al. Reliable collaborative learning with commensurate incentive schemes
US20150358165A1 (en) Method and arrangement for distributed realisation of token set management and recommendation system with clustering
WO2021217933A1 (zh) 同质网络的社群划分方法、装置、计算机设备和存储介质
CN111414406B (zh) 一种用于识别不同渠道事务中的相同用户的方法和系统
KR20200105379A (ko) 전문가에 의해 생성되는 프로젝트 결과물을 블록체인에 저장된 빅데이터 기반으로 관리하는 방법 및 시스템
Zhao et al. Pigeonhole design: Balancing sequential experiments from an online matching perspective
CN111951057A (zh) 一种基于以太坊智能合约平台的广告推荐方法和系统
KR20200105378A (ko) 전문가에 의해 생성되는 프로젝트 결과물을 빅데이터 기반으로 관리하여 가상화폐를 제공하기 위한 방법 및 시스템
Xue et al. Bitcoin transaction pattern recognition based on semi-supervised learning