WO2022237175A1 - Graph data processing method and apparatus, device, storage medium, and program product - Google Patents

Graph data processing method and apparatus, device, storage medium, and program product

Info

Publication number
WO2022237175A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
aggregation result
subgraph
participant
aggregation
Prior art date
Application number
PCT/CN2021/140229
Other languages
French (fr)
Chinese (zh)
Inventor
吴子凡
张潮宇
陈天健
杨强
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022237175A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption

Definitions

  • the present application relates to the technical field of data processing, and in particular to a method, device, equipment, storage medium, and program product for processing graph data.
  • the graph neural network gradually replaces the traditional artificially designed graph features to extract the hidden value behind the graph data and make related recognition and prediction. For example, based on the graph data constructed by financial institutions, it is possible to identify whether users have overdue risks, etc.
  • the main purpose of the present application is to provide a processing method, device, equipment, storage medium and program product for graph data, aiming at improving the accuracy of prediction for graph data.
  • the present application provides a method for processing graph data, the graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant; the method is applied to the first participant, the method comprising:
  • traverse the nodes in the first subgraph and perform the following operations for each traversed node:
  • if the node has a connection relationship with a node in the second subgraph, determine the final aggregation result of the node according to the first aggregation result and the second aggregation result; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph;
  • according to the final aggregation result, the feature vector of the node for the current round of iteration is determined; after the number of iterations meets the requirement, the feature vector of the final round of iteration is used to calculate the prediction result corresponding to the node.
  • determining the final aggregation result of the node according to the first aggregation result and the second aggregation result includes:
  • if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, where the request information is used to request the second participant to calculate the second aggregation result corresponding to the node and encrypt the second aggregation result;
  • the encrypted second aggregation result is a second aggregation result encrypted with a public key; according to the encrypted second aggregation result, determining the final aggregation result of the node includes:
  • searching for neighbor nodes of the node in the first subgraph, and performing an aggregation operation according to the feature vectors of the last iteration process of the neighbor nodes including:
  • the first aggregation result is calculated according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
  • the method also includes:
  • constructing the connection relationship of the nodes in the first subgraph, where the connection relationship is used to determine the neighbor nodes;
  • the method further includes:
  • the method also includes:
  • the feature vector of the previous round used in the first round of iteration is the initial feature vector.
  • determining the business risk information of the user account corresponding to the node includes:
  • determining the final aggregation result of the node according to the first aggregation result and the second aggregation result includes:
  • a final aggregation result is determined through a nonlinear algorithm according to the first aggregation result and the second aggregation result.
  • the graph data is used to implement social behavior analysis
  • the nodes in the graph data are used to represent users
  • the preset association relationships include family relationships and employment relationships
  • the preset association relationships are used to determine neighbor nodes.
  • the present application also provides a graph data processing device, the graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to the first participant, and the second subgraph includes nodes belonging to a second participant; the device is applied to the first participant, the device comprising:
  • An execution module configured to traverse the nodes in the first subgraph during any iteration, and perform the following operations for each traversed node;
  • a search module configured to search for neighbor nodes of the node in the first subgraph, perform an aggregation operation according to the feature vectors of the last round of iteration process of the neighbor nodes, and obtain a first aggregation result;
  • An aggregation module configured to determine the final aggregation result of the node according to the first aggregation result and the second aggregation result when the node has a connection relationship with a node in the second subgraph; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph;
  • A determination module configured to determine the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector of the final round of iteration is used to calculate the prediction result corresponding to the node.
  • the aggregation module is specifically used for:
  • if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, where the request information is used to request the second participant to calculate the second aggregation result corresponding to the node and encrypt the second aggregation result;
  • the encrypted second aggregation result is a second aggregation result encrypted with a public key; the aggregation module determines the final aggregation result of the node according to the encrypted second aggregation result , specifically for:
  • search module is specifically used for:
  • the first aggregation result is calculated according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
  • the execution module is also used for:
  • constructing the connection relationship of the nodes in the first subgraph, where the connection relationship is used to determine the neighbor nodes;
  • the execution module is also used for:
  • the execution module is also used for:
  • the feature vector of the previous round used in the first round of iteration is the initial feature vector.
  • when the execution module determines the business risk information of the user account corresponding to the node, it is specifically used to:
  • when the aggregation module determines the final aggregation result of the node according to the first aggregation result and the second aggregation result, it is specifically used to:
  • a final aggregation result is determined through a nonlinear algorithm according to the first aggregation result and the second aggregation result.
  • the present application also provides a graph data processing device, the graph data processing device includes: a memory, a processor, and a graph data processing program stored in the memory and operable on the processor; when the graph data processing program is executed by the processor, the steps of the graph data processing method described in any one of the preceding items are realized.
  • the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a processing program for graph data, and when the processing program for graph data is executed by a processor, the steps of the graph data processing method described in any one of the preceding items are realized.
  • the present application also provides a computer program product, including a computer program; when the computer program is executed by a processor, the method described in any one of the preceding items is implemented.
  • graph data including a first subgraph including nodes belonging to a first participant and a second subgraph including nodes belonging to a second participant may be processed.
  • in any round of iteration, the first participant can traverse the nodes in the first subgraph and, for each traversed node, find the neighbor nodes of the node in the first subgraph and perform an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain a first aggregation result; if the node has a connection relationship with a node in the second subgraph, the final aggregation result of the node is determined according to the first aggregation result and the second aggregation result, wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph; according to the final aggregation result, determine the feature vector of the current round
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of a kind of graph data provided by the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for processing graph data provided in an embodiment of the present application
  • FIG. 4 is a system architecture diagram of graph data processing provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another graph data processing method provided by the embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a graph data processing device provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a graph data processing device provided by an embodiment of the present application.
  • Association analysis is an important class of analysis methods. Through such methods, operators can easily model and analyze the relationship between entities, and determine whether there is a specific connection form or important node in the network.
  • Traditional analysis methods mainly use artificially designed graph features as the analysis object, such as PageRank (web page ranking), centrality and so on.
  • graph neural networks have gradually replaced traditional artificially designed graph features to extract the value hidden behind graph data.
  • the graph neural network relies on graph data, but in actual operation, graph data often involves multiple institutions. Subject to data privacy requirements, these data cannot be pooled to form an effective network that would yield more accurate prediction results.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Institution 1 and Institution 2 are both banks, and both Institution 1 and Institution 2 have multiple user accounts.
  • Institution 1 includes user account A, user account B, user account C, etc.
  • Institution 2 includes user account D, user account E, user account F, etc.
  • the user account number may specifically be a bank card number or the like. Connections between different user accounts represent transfer records between them.
  • graph data can be constructed; because the graph data contains the users' basic attribute information and related financial characteristics, overdue risk prediction for user accounts, identification of abnormal accounts, etc., can be performed based on the graph data to meet monitoring requirements.
  • a user account in institution 1 will not only have a transfer relationship with other user accounts in the same institution, but may also have a transfer relationship with a user account in institution 2.
  • user account B in institution 1 has a transfer relationship with user account A and user account C of the same institution, and also has a transfer relationship with user account E in institution 2.
  • institution 1 cannot obtain the detailed information of user account E in institution 2, and can only analyze and process user account B based on the detailed information of user accounts A and C of its own institution, thus losing the valid information about user account B held by other institutions, which leads to poor accuracy of prediction and identification.
  • the embodiment of the present application provides a method for processing graph data, which can process graph data in cooperation with different participants.
  • the graph data may comprise a first subgraph comprising nodes belonging to a first participant and a second subgraph comprising nodes belonging to a second participant.
  • the first participant and the second participant can both be financial institutions such as banks
  • the nodes in the graph data can be used to represent user accounts, and the connection relationship between nodes can be used to indicate the transfer relationship between user accounts.
  • for any node, the first participant can perform an aggregation operation according to the neighbor nodes of the node in the first subgraph to obtain the first aggregation result; the second participant can perform an aggregation operation according to the neighbor nodes of the node in the second subgraph to obtain the second aggregation result; according to the first aggregation result and the second aggregation result, a final aggregation result corresponding to the node can be obtained; by performing subsequent processing according to the final aggregation result, the prediction result corresponding to the node can be obtained.
  • Fig. 2 is a schematic diagram of a kind of graph data provided by the embodiment of the present application.
  • the graph data G may include a first subgraph G1 and a second subgraph G2. The small circles in the figure represent nodes: the first subgraph G1 includes nodes v11, v12, v13, v14, and v15 belonging to the first participant, and the second subgraph G2 includes nodes v21, v22, v23, and v24 belonging to the second participant. Each node represents a user account, and the connection lines in the figure show the connection relationship between the nodes.
  • for the nodes in the graph data, there can be multiple rounds of iteration. In each round of iteration, all nodes can be traversed, and for any node, data from multiple parties can be combined for analysis and processing.
  • taking node v12 as an example, the first participant can perform an aggregation operation based on its neighbor nodes in the first subgraph G1, that is, nodes v11 and v14, to obtain the first aggregation result
  • the second participant can perform an aggregation operation based on its neighbor nodes in the second subgraph G2, that is, nodes v21, v22, and v23, to obtain the second aggregation result.
  • the final aggregation result corresponding to node v12 can be obtained.
  • the feature vector of node v12 in this round can be determined, and the feature vector of this round is used for the aggregation operation of the next round. After the number of iterations meets the requirements, the final feature vector can be used to calculate the prediction result, such as whether node v12 has an overdue risk, etc.
  • the data of each participant does not leave its local environment, so multiple parties can jointly make predictions on graph data while ensuring data security.
  • because the aggregation result of the first participant on the neighbor nodes in the first subgraph and the aggregation result of the second participant on the neighbor nodes in the second subgraph are integrated during the node analysis process, the transfer relationships of the nodes can be reflected more comprehensively and accurately, so that the fund flow characteristics of the nodes are extracted more accurately and the accuracy of predictions is effectively improved on the basis of ensuring data security.
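  • The Fig. 2 example for node v12 can be sketched as follows. This is a minimal illustration, not the application's implementation: the feature values are made up, and element-wise summation is assumed both as the aggregation operator and as the way the two results are combined.

```python
def aggregate(neighbors, features):
    """Element-wise sum of the neighbors' feature vectors (assumed aggregator)."""
    dim = len(next(iter(features.values())))
    total = [0.0] * dim
    for name in neighbors:
        for i, x in enumerate(features[name]):
            total[i] += x
    return total

# Hypothetical previous-round feature vectors; each dict stays with its owner.
g1_features = {"v11": [1.0, 0.0], "v14": [0.5, 2.0]}                      # first participant
g2_features = {"v21": [0.0, 1.0], "v22": [2.0, 1.0], "v23": [1.0, 1.0]}  # second participant

# Neighbors of v12 as described for Fig. 2.
first_result = aggregate(["v11", "v14"], g1_features)
second_result = aggregate(["v21", "v22", "v23"], g2_features)

# Only the two aggregation results are combined, never the raw node features.
final_result = [a + b for a, b in zip(first_result, second_result)]
```

  • Note that the second participant only hands over `second_result`; the individual vectors of v21, v22, and v23 are not shared in this sketch.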
  • FIG. 3 is a schematic flowchart of a method for processing graph data provided by an embodiment of the present application.
  • the graph data includes a first subgraph comprising nodes belonging to a first participant and a second subgraph comprising nodes belonging to a second participant.
  • the method provided in this embodiment may be applied to the first participant.
  • the first participant may process the graph data through multiple rounds of iterative process. As shown in Figure 3, in any round of iteration, the nodes in the first subgraph can be traversed, and for each node traversed, the following operations are performed:
  • Step 301: Find the neighbor nodes of the currently traversed node in the first subgraph, and perform an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain a first aggregation result.
  • in the k-th round of iteration, the first participant can find the neighbor nodes of the current node in the first subgraph, where the neighbor nodes can be nodes that have a direct connection relationship with the current node; according to the feature vectors of the neighbor nodes in round k-1, the first aggregation result of the current node in round k can be determined.
  • each node in the graph data may be initialized to determine the initial feature vector of each node.
  • the feature vector of a node may be any information that can characterize the feature of the node.
  • the node may be assigned a value according to the attribute information of each node in the graph data to obtain a corresponding initial feature vector.
  • the first participant can traverse each node in the first subgraph, and perform an aggregation operation according to the initial feature vectors corresponding to the neighbor nodes of the current node to obtain the first aggregation result of the current node.
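  • A minimal sketch of the initialization and the first-round traversal described above. The attribute names (`balance`, `overdue`), the adjacency-list layout, and the sum aggregator are assumptions made for illustration; the application does not fix any of them.

```python
def init_features(attributes, keys):
    """Build an initial feature vector per node from its attribute record."""
    return {node: [float(attrs[k]) for k in keys] for node, attrs in attributes.items()}

def first_aggregation(adj, features):
    """For every node of the first subgraph, sum the previous-round
    feature vectors of its neighbors inside the same subgraph."""
    results = {}
    for node, neighbors in adj.items():
        total = [0.0] * len(features[node])
        for nb in neighbors:
            for i, x in enumerate(features[nb]):
                total[i] += x
        results[node] = total
    return results

# Hypothetical attribute records and local adjacency for three accounts.
attributes = {
    "A": {"balance": 10, "overdue": 0},
    "B": {"balance": 5, "overdue": 1},
    "C": {"balance": 1, "overdue": 2},
}
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

features0 = init_features(attributes, ("balance", "overdue"))
first_results = first_aggregation(adj, features0)
```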
  • Step 302: If the node has a connection relationship with a node in the second subgraph, determine the final aggregation result of the node according to the first aggregation result and the second aggregation result.
  • the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second subgraph.
  • whether any two nodes have a connection relationship can be judged as follows: determine whether the two nodes have a preset association relationship, where the preset association relationship can be set according to actual needs. Two nodes with a preset association relationship can be regarded as having a connection relationship in the graph data. The connection relationship can be used to determine neighbor nodes.
  • the preset association relationship may be a transfer relationship.
  • the nodes in the graph data can be used to represent user accounts, and the connection relationship between nodes can be used to represent the transfer relationship between user accounts; if a certain node in the first sub-graph and a certain node in the second sub-graph If there is a transfer record between them, it can be considered that the two have a connection relationship.
  • nodes in graph data may be used to represent users, and the preset association relationship may be family relationship, employment relationship, and the like.
  • the second participant can perform an aggregation operation according to the initial feature vectors corresponding to the neighbor nodes of the current node in the second subgraph, and obtain the second aggregation result of the current node.
  • the first participant may determine the final aggregation result of the first round of iteration process of the current node according to the first aggregation result and the second aggregation result of the first round of iteration process of the current node.
  • the final aggregation result may be the sum of the first aggregation result and the second aggregation result.
  • the final aggregation result can be determined through a nonlinear algorithm, such as a log function, an exponential function, taking a maximum value, taking a minimum value, etc., so that the result has higher nonlinearity and can fit more complex situations.
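  • The combination options mentioned above (sum, or a nonlinear function) can be sketched as a single helper working on plaintext vectors. The function name `combine`, the `mode` strings, and the specific form `log(1 + a + b)` are illustrative assumptions, not definitions from the application.

```python
import math

def combine(first, second, mode="sum"):
    """Combine the first and second aggregation results element-wise."""
    pairs = list(zip(first, second))
    if mode == "sum":
        return [a + b for a, b in pairs]
    if mode == "max":
        return [max(a, b) for a, b in pairs]
    if mode == "min":
        return [min(a, b) for a, b in pairs]
    if mode == "log":
        # One possible log-based form; assumes a + b > -1 for every element.
        return [math.log1p(a + b) for a, b in pairs]
    raise ValueError(f"unknown mode: {mode}")
```

  • For example, `combine([1.0, 4.0], [3.0, 2.0], "max")` keeps the larger entry of each pair, while the default mode returns the plain sum.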
  • Step 303: According to the final aggregation result, determine the feature vector of the node for the current round of iteration.
  • the feature vector of the first round of iteration process of the current node may be determined according to the final aggregation result. After traversing all nodes of the first subgraph, the feature vectors of the first round of iterative process of all nodes are obtained.
  • in the second round of iteration, the above steps are repeated using the feature vectors of the first round to obtain the feature vectors of the second round, and so on, until the feature vectors of the final round of iteration are obtained.
  • the feature vectors of the final round of iteration can be used to calculate the prediction results corresponding to the nodes.
  • the feature vector of the last round of iterative process of the node can be input to the predictor, or input to the Sigmoid function to obtain the corresponding prediction result.
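  • Steps 301 to 303 and the final Sigmoid prediction can be put together in a short sketch from the first participant's point of view. Several pieces are assumptions: the update rule (averaging a node's previous vector with its final aggregation result), the linear scoring weights, and the `fetch_second(node, round_index)` callback that stands in for the exchange with the second participant.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_first_participant(adj, cross_nodes, init_features, fetch_second, rounds, weights):
    """Iterate over the first subgraph and return one prediction per node."""
    features = dict(init_features)
    for k in range(1, rounds + 1):
        new_features = {}
        for node, neighbors in adj.items():
            # Step 301: aggregate previous-round vectors of local neighbors.
            first = [0.0] * len(features[node])
            for nb in neighbors:
                first = [a + b for a, b in zip(first, features[nb])]
            # Step 302: add the second aggregation result for cross-subgraph nodes.
            final = first
            if node in cross_nodes:
                final = [a + b for a, b in zip(first, fetch_second(node, k))]
            # Step 303 (assumed update rule): mix the old vector with the aggregation.
            new_features[node] = [0.5 * (h + f) for h, f in zip(features[node], final)]
        features = new_features  # all nodes switch to the new round together
    # Prediction from the final-round vectors via an assumed linear score + Sigmoid.
    return {n: sigmoid(sum(w * h for w, h in zip(weights, vec))) for n, vec in features.items()}
```

  • A call such as `run_first_participant({"a": ["b"], "b": ["a"]}, {"b"}, {"a": [1.0], "b": [0.0]}, lambda node, k: [1.0], 1, [1.0])` returns one score in (0, 1) per node of the first subgraph.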
  • the first participant and the second participant can respectively construct the first subgraph and the second subgraph in the graph data, and the nodes in the first subgraph and the nodes in the second subgraph can have a connection relationship .
  • the first participant can process a node in the first subgraph, calculate the aggregation result of the neighbor nodes of the node in the first subgraph, obtain from the second participant the aggregation result of the neighbor nodes of the node in the second subgraph, and determine the final aggregation result according to the aggregation results corresponding to each subgraph.
  • the second participant can also use a similar method to process the second subgraph, traverse the nodes in the second subgraph, calculate the aggregation result of the node's neighbor nodes in the second subgraph, and at the same time from the first The participant obtains the aggregation result of the neighbor nodes of the node in the first subgraph, and determines the final aggregation result according to the aggregation results corresponding to each subgraph.
  • the results corresponding to all the nodes in the graph data can be obtained, and the processing of the graph data can be completed without the original data of the nodes leaving the local area.
  • there may be multiple second participants in the embodiment of the present application and, correspondingly, multiple second subgraphs. Each second participant may calculate the corresponding second aggregation result according to the neighbor nodes of the traversed node in its own second subgraph and send it to the first participant, and the first participant performs processing according to the first aggregation result and the multiple second aggregation results to obtain the final aggregation result.
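  • With several second participants, the combination step generalizes naturally. The sketch below assumes summation as the combining rule and a hypothetical helper name `final_aggregation`; each element of `second_results` is the vector returned by one second participant.

```python
def final_aggregation(first, second_results):
    """Sum the first aggregation result with any number of second aggregation results."""
    final = list(first)
    for second in second_results:
        final = [a + b for a, b in zip(final, second)]
    return final
```

  • If the traversed node has no cross-subgraph connection, `second_results` is empty and the first aggregation result is returned unchanged.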
  • the graph data processing method provided in this embodiment can process graph data including a first subgraph and a second subgraph, where the first subgraph includes nodes belonging to the first participant and the second subgraph includes nodes belonging to the second participant. In any round of iteration, the first participant can traverse the nodes in the first subgraph and, for each traversed node, search for the neighbor nodes of the node in the first subgraph and perform an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain the first aggregation result.
  • if the node has a connection relationship with a node in the second subgraph, the final aggregation result of the node is determined according to the first aggregation result and the second aggregation result, wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph.
  • according to the final aggregation result, the feature vector of the node for the current round of iteration is determined, and after the number of iterations meets the requirement, the feature vector of the final round of iteration is used to calculate the prediction result corresponding to the node. Because the aggregation results of the first participant on the neighbor nodes in the first subgraph and of the second participant on the neighbor nodes in the second subgraph are integrated when analyzing a node, the characteristics of nodes can be extracted more comprehensively and accurately, and in the case of barriers to data interoperability
  • a SecureAggregate (secure aggregation) function may be used to perform a secure aggregation operation on the first aggregation result and the second aggregation result to obtain a final aggregation result.
  • the implementation principle of the SecureAggregate function is described below.
  • determining the final aggregation result of the node according to the first aggregation result and the second aggregation result may include: if the node has a connection relationship with a node in the second subgraph, then send request information to the second participant, where the request information is used to request the second participant to calculate the second aggregation result corresponding to the node and Encrypting the second aggregation result; receiving the encrypted second aggregation result sent by the second participant; and determining the final aggregation result of the node according to the encrypted second aggregation result.
  • the request information is sent to the second participant to request processing, which can save unnecessary calculation and effectively improve the efficiency of graph data processing.
  • the encrypted aggregation result can be transmitted between the first participant and the second participant, which effectively improves the security of data transmission.
  • the first participant may perform processing according to the encrypted second aggregation result to obtain a final aggregation result.
  • a third party may also be introduced to process the aggregation result.
  • the encrypted second aggregation result is a second aggregation result encrypted with a public key.
  • determining the final aggregation result of the node may include: using the public key to encrypt the first aggregation result; based on a random mask, performing a calculation on the encrypted first aggregation result and the encrypted second aggregation result to obtain the encrypted final aggregation result; sending the encrypted final aggregation result to the third participant, so that the third participant uses the private key to decrypt the encrypted final aggregation result; and receiving the decrypted result sent by the third participant and removing the random mask from the decrypted result to obtain the final aggregation result.
  • FIG. 4 is a system architecture diagram of graph data processing provided by an embodiment of the present application.
  • the first participant processes the first subgraph
  • the second participant processes the second subgraph
  • the third participant holds the keys, which may include a public key and a private key.
  • the calculated results can be encrypted using homomorphic encryption.
  • the public key used for encryption is sent by the third participant to the first participant and the second participant, and the private key is held by the third participant alone.
  • the first participant uses the public key to encrypt the first aggregation result
  • the second participant uses the public key to encrypt the second aggregation result
  • the second participant sends the encrypted second aggregation result to the first participant, and the first participant calculates the sum of the two, adds a random mask, and sends the result to the third participant.
  • after the third participant receives the data from the first participant, it decrypts the data with its own private key and sends the decrypted result to the first participant, which removes the random mask to obtain the final aggregation result of the node.
  • the first participant uses the public key to encrypt the first aggregation result
  • the second participant uses the public key to encrypt the second aggregation result
  • the first participant The encrypted first aggregation result is sent to the second participant, and the second participant calculates the sum of the two and sends it to the third participant after adding a random mask.
  • the third participant receives the data from the second participant, it decrypts it with its own private key, and sends the decrypted result to the second participant, and the second participant removes the random mask from the result and obtains The final aggregation result of the node.
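The exchange among the three participants described above can be sketched as follows. This is only a minimal illustration under stated assumptions: Paillier is assumed as the additively homomorphic scheme (the text says only "homomorphic encryption"), the primes are fixed toy values, the aggregation results are assumed to be vectors of non-negative integers, and the function names (`keygen`, `encrypt`, `decrypt`, `add`, `secure_aggregate`) are hypothetical. All three roles run in one process for readability.

```python
import math
import secrets

# Two well-known Mersenne primes. Far too small for real security; demo only.
P, Q = 2**31 - 1, 2**61 - 1

def keygen():
    """Third participant: return (public key n, private key (lam, mu))."""
    n = P * Q
    lam = (P - 1) * (Q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Paillier-encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

def secure_aggregate(first_result, second_result):
    n, private_key = keygen()                              # third participant
    enc_second = [encrypt(n, x) for x in second_result]    # second -> first
    masks = [secrets.randbelow(n) for _ in first_result]   # known to first only
    masked = [add(n, add(n, encrypt(n, a), c), encrypt(n, r))
              for a, c, r in zip(first_result, enc_second, masks)]  # first -> third
    opened = [decrypt(n, private_key, c) for c in masked]  # third sees masked sums only
    return [(v - r) % n for v, r in zip(opened, masks)]    # first removes the masks

print(secure_aggregate([3, 10, 7], [4, 0, 25]))  # prints [7, 10, 32]
```

In this sketch the third participant only ever decrypts a masked sum, and the first participant only ever sees the second aggregation result in encrypted form. Real feature vectors would additionally need a fixed-point encoding, which is omitted here.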
  • FIG. 5 is a schematic flowchart of another graph data processing method provided by the embodiment of the present application. This embodiment provides a specific implementation scheme for joint processing of graph data by the first participant and the second participant. As shown in Figure 5, the method may include:
  • Step 501 Construct graph data according to the user accounts and transfer records of the first participant and the second participant.
  • the graph data includes a first subgraph and a second subgraph, which are constructed, and whose nodes are initialized, by the first participant and the second participant respectively.
  • the first participant may construct and initialize the first subgraph as follows: construct the nodes in the first subgraph according to the user accounts belonging to the first participant; and construct the connection relationships of the nodes in the first subgraph according to the transfer records of the first participant's user accounts, the connection relationships being used to determine neighbor nodes.
  • the first participant can be a bank
  • the user account can be a bank card number
  • each node in the first sub-graph represents a bank card number
  • the transfer relationships between bank card numbers can be used to form connection relationships between nodes; for example, if there is a transfer record between bank card number A and bank card number B, there may be an edge between the node corresponding to bank card number A and the node corresponding to bank card number B.
  • two directly connected nodes can be neighbor nodes.
  • the second participant can also use a similar method to construct the second subgraph.
  • because a bank card number of the first participant may have a transfer relationship with a bank card number of the second participant, at least some nodes in the first subgraph may be connected to nodes in the second subgraph; the first participant and the second participant both store the corresponding transfer records, so each knows the connection relationships between its own nodes and the nodes of the other participant, and these connection relationships can be used for the aggregation operations in subsequent steps.
  • graph data can be constructed based on user accounts and transfer relationships, so that user accounts can be analyzed based on the graph data, and the efficiency of monitoring user accounts can be improved.
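The construction of one participant's subgraph can be illustrated with a short sketch. It assumes that a transfer record is a `(from_account, to_account)` pair and that a transfer is treated as an undirected edge; the function name `build_subgraph` and the account identifiers are hypothetical.

```python
from collections import defaultdict

def build_subgraph(own_accounts, transfers):
    """Return (local_adj, cross_adj) for one participant.

    own_accounts: set of account ids held by this participant.
    transfers: iterable of (from_account, to_account) pairs known to it.
    """
    local_adj = defaultdict(set)  # edges between two of this participant's nodes
    cross_adj = defaultdict(set)  # edges from an own node to another participant's node
    for src, dst in transfers:
        for a, b in ((src, dst), (dst, src)):  # assumption: undirected edges
            if a in own_accounts:
                (local_adj if b in own_accounts else cross_adj)[a].add(b)
    return dict(local_adj), dict(cross_adj)

bank1_accounts = {"A", "B", "C"}
bank1_transfers = [("A", "B"), ("B", "C"), ("C", "X")]  # X belongs to another bank
local_adj, cross_adj = build_subgraph(bank1_accounts, bank1_transfers)
```

Here `local_adj` gives the neighbor nodes inside the participant's own subgraph, while `cross_adj` records which of its nodes are connected to nodes of the other participant and therefore need a second aggregation result.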
  • Step 502 Initialize all nodes in the graph data to obtain initial feature vectors of the nodes.
  • for each node, an initialization operation may be performed according to its corresponding attribute information.
  • the first participant may determine the initial feature vector corresponding to each node in the first subgraph according to the attribute information of the first participant's user accounts; similarly, the second participant may determine the initial feature vector corresponding to each node in the second subgraph according to the attribute information of the second participant's user accounts.
  • in the first round of iteration, the feature vector of the previous round is the initial feature vector.
  • the attribute information may include any information used to characterize the attributes of the user account, such as but not limited to: the user's region, age, gender, education, occupation, income, card opening time, card balance, etc.
  • the initialization operation may refer to performing an assignment operation according to attribute information.
  • a simple example of encoding attribute information: 000 for ages 21 to 30 and 001 for ages 31 to 40.
  • constructing the initial feature vector from the attribute information of the user account quickly and effectively captures the user's characteristics for use in the subsequent iterations, so that the attribute information and the transfer relationships of the user account are analyzed together, improving the prediction of business risk information.
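The assignment of attribute information to an initial feature vector might look like the following sketch. Only the two age codes (000 for 21 to 30, 001 for 31 to 40) come from the text; continuing the pattern for older ages, the balance scaling, and the names `encode_age` and `initial_feature_vector` are assumptions made for illustration.

```python
def encode_age(age):
    """Map an age to a 3-bit bucket code, returned as a list of 0/1 ints."""
    if age < 21:
        raise ValueError("ages below 21 are not covered by this sketch")
    # 21-30 -> 0 (000), 31-40 -> 1 (001); later decades continue the pattern
    # (an assumption), capped at 7 so the code stays 3 bits wide.
    bucket = min((age - 21) // 10, 7)
    return [int(bit) for bit in format(bucket, "03b")]

def initial_feature_vector(account):
    """Concatenate per-attribute encodings; only age and balance are shown."""
    balance_feature = [min(account["balance"] / 100_000, 1.0)]  # hypothetical scaling
    return encode_age(account["age"]) + balance_feature

vector = initial_feature_vector({"age": 35, "balance": 50_000})
```

Other attributes mentioned above (region, occupation, card opening time, and so on) would be encoded and appended in the same way.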
  • Step 504 in the k-th round of iteration, traverse the nodes in the graph data; for each node, the first participant calculates the first aggregation result of the node in the first subgraph, the second participant calculates the second aggregation result of the node in the second subgraph, and the first participant or the second participant determines the k-th round feature vector of the node according to the first aggregation result and the second aggregation result.
  • each node in the graph data may be traversed, and for each traversed node, the following steps a to d may be performed.
  • Step a Aggregate the neighbor nodes of the currently traversed node in the first subgraph through an Aggregate (aggregation) function to obtain a first aggregation result.
  • step a may be performed by the first participant.
  • the first participant may search for the neighbor nodes of the node in the first subgraph, perform an aggregation operation according to the feature vectors of the last iteration of the neighbor nodes, and obtain a first aggregation result.
  • in the first round, the feature vector of the previous round is the initial feature vector; that is, the first-round first aggregation result of the current node is calculated from the initial feature vectors of its neighbor nodes.
  • searching for the neighbor nodes of the node in the first subgraph and performing an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration may include: finding all neighbor nodes of the node in the first subgraph; selecting a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement; and calculating the first aggregation result according to the previous-round feature vectors of the selected neighbor nodes.
  • sampling with replacement means that after a neighbor node is selected from the set of all neighbor nodes, it is put back into the set, and sampling continues until the required number of draws is reached; that is, any neighbor node may be drawn one or more times.
  • for example, if the neighbor nodes of node v are node a, node b and node c and the preset number is 3, then after sampling with replacement the selected neighbor nodes may be node a, node b and node a, that is, node a is selected twice; the first aggregation result corresponding to node v can then be calculated from the three selected neighbor nodes (two of which are the same).
  • the first aggregation result may be calculated through an Aggregate function.
  • the Aggregate function can be designed according to actual needs, for example, it can be a mean function, that is, the feature vectors of the last iteration process of the selected neighbor nodes are averaged, and used as the first aggregation result of the node.
  • the second participant can send request information to the first participant; according to the request information, the first participant can determine the neighbor nodes in the first subgraph of a node in the second subgraph, and send the aggregation result of that node, determined by the above method, to the second participant. Because of the sampling with replacement, the second participant cannot learn the feature vector of each individual neighbor node in the first subgraph, which reduces the risk of leaking the first participant's data and effectively improves the security of graph data processing.
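Step a can be sketched as follows, assuming the mean function is chosen as the Aggregate function as suggested above. The helper names are hypothetical; `random.choices` from the Python standard library performs the sampling with replacement.

```python
import random

def sample_neighbors(neighbors, num_samples, rng=random):
    """Draw num_samples neighbors with replacement; duplicates are allowed."""
    return rng.choices(sorted(neighbors), k=num_samples)

def mean_aggregate(vectors):
    """Element-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def first_aggregation(node, adjacency, prev_features, num_samples, rng=random):
    """Aggregate the previous-round feature vectors of sampled neighbors."""
    neighbors = adjacency.get(node, ())
    if not neighbors:
        return None  # no neighbors here: the caller may skip or treat this as zero
    chosen = sample_neighbors(neighbors, num_samples, rng)
    return mean_aggregate([prev_features[u] for u in chosen])

adjacency = {"v": {"a", "b", "c"}}
prev_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
result = first_aggregation("v", adjacency, prev_features, 3, random.Random(0))
```

Because the same neighbor may be drawn more than once, the returned mean does not reveal how many distinct neighbors contributed, which is the property the text relies on to limit what the requesting participant can infer.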
  • Step b Aggregate the neighbor nodes of the node in the second subgraph through an Aggregate function to obtain a second aggregation result.
  • step b may be performed by the second participant.
  • step a For the specific implementation principle and process, please refer to step a.
  • if a node has no neighbor nodes in the first subgraph or in the second subgraph, the corresponding participant may skip calculating the corresponding aggregation result, or treat the node's aggregation result in that subgraph as 0.
  • Step c Perform secure aggregation on the first aggregation result and the second aggregation result of the nodes to obtain the final aggregation result of the current round.
  • the SecureAggregate function may be used to perform a secure aggregation operation, and the specific implementation scheme may refer to the foregoing embodiments, which will not be repeated here.
  • Step d Determine the feature vector of the current round of the node according to the final aggregation result of the current round of the node.
  • the feature vector of the current round of the node may be determined according to the final aggregation result of the current round of the node and the feature vector of the previous round of the node.
  • the current round may refer to the kth round
  • the previous round may refer to the k-1th round
  • the previous round of the first round may refer to the initialization phase.
  • the final aggregation result of the k-th round of the node and the feature vector of the (k-1)-th round of the node can be concatenated (CONCAT), the concatenated result is multiplied by the model parameters, a nonlinearity is applied to the product through the σ function to obtain the final result, and the feature vector of the current round is determined according to the final result.
  • model parameters of each round may be different, and during the iteration process of the kth round, the feature vector may be calculated using the model parameters corresponding to the kth round.
  • the model parameters may be model parameters obtained after training. Specifically, they may be obtained after a single participant uses a training sample set to train the model, or may be obtained after joint training of the model by multiple participants.
  • the model can refer to GraphSAGE or other graph neural network models.
  • the final results of the nodes can be normalized or regularized to obtain the feature vector of the current round of each node, thereby completing the k-th round of iteration.
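Step d can be illustrated with the sketch below. It assumes that the nonlinear function is the logistic sigmoid, that the model parameters of a round form a weight matrix given as a list of rows, and that the normalization is an L2 normalization; none of these choices is fixed by the text, and the name `update_feature` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_feature(final_aggregate, prev_feature, weights):
    """One node update: normalize(sigmoid(W @ concat(aggregate, previous)))."""
    concat = list(final_aggregate) + list(prev_feature)
    activated = [sigmoid(sum(w * x for w, x in zip(row, concat)))
                 for row in weights]
    norm = math.sqrt(sum(a * a for a in activated))  # > 0 because sigmoid > 0
    return [a / norm for a in activated]

weights_k = [[0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]  # hypothetical round-k parameters (2 x 4)
new_feature = update_feature([0.5, 0.5], [1.0, 0.0], weights_k)
```

Each row of the weight matrix must have as many entries as the concatenated vector; the number of rows sets the dimension of the new feature vector.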
  • Step 505 judging whether k is equal to K. If not, execute step 506; if yes, execute step 507.
  • K is the number of iterations required, which can be set according to actual needs.
  • Step 506 after increasing the value of k by 1, re-execute step 504.
  • Step 507 determine the corresponding prediction result according to the feature vector of each node in the graph data from the last round of iteration.
  • the feature vector used in step 507 may be the final feature vector obtained after the last iterative process is completed, that is, the feature vector of the Kth round of the node.
  • the feature vector of the K-th round of the node can be input to the predictor function, or input to the sigmoid function to obtain the corresponding prediction result.
  • the graph data can be constructed first. After the construction is completed, each node v is initialized according to its corresponding attribute information, which gives the value of node v after initialization, that is, its initial feature vector.
  • in the k-th round of iteration, for each node v in the graph data, the neighbor nodes u of node v are found, and the final aggregation result of node v in the k-th round is calculated from the (k-1)-th round feature vectors of the neighbor nodes u; the k-th round feature vector of node v is then obtained from the final aggregation result, the (k-1)-th round feature vector of node v, the model parameters, and so on.
  • the feature vector of round k is determined from the feature vector of round k-1 and is in turn used to calculate the feature vector of round k+1. After the number of iterations reaches the preset number K, the prediction result is determined according to the final feature vector.
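The overall K-round loop can be sketched end to end as follows. To keep the example short and deterministic, it runs on a single merged graph with plain mean aggregation over all neighbors, so the split between participants, the sampling, and the secure aggregation are left out; every node is assumed to have at least one neighbor. The names `run_iterations` and `predict`, the weights, and the readout vector are hypothetical.

```python
import math

def run_iterations(adjacency, init_features, weights_per_round):
    """Run K = len(weights_per_round) rounds and return the final feature vectors."""
    features = dict(init_features)          # round 0: initial feature vectors
    for weights in weights_per_round:       # round k uses the round-k parameters
        new_features = {}
        for v, neighbors in adjacency.items():
            # mean of the neighbors' round k-1 vectors
            agg = [sum(col) / len(neighbors)
                   for col in zip(*(features[u] for u in neighbors))]
            concat = agg + features[v]
            out = [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, concat))))
                   for row in weights]
            norm = math.sqrt(sum(o * o for o in out))
            new_features[v] = [o / norm for o in out]
        features = new_features             # all nodes advance to round k together
    return features

def predict(feature, readout):
    """Sigmoid of a linear readout applied to the round-K feature vector."""
    return 1 / (1 + math.exp(-sum(w * x for w, x in zip(readout, feature))))

adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
init_features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weights_per_round = [[[0.1, -0.2, 0.3, 0.4], [0.2, 0.1, -0.3, 0.5]]] * 2  # K = 2
final = run_iterations(adjacency, init_features, weights_per_round)
scores = {v: predict(h, [1.0, -1.0]) for v, h in final.items()}
```

Note that the round-k vectors are written to a separate dictionary and swapped in only after all nodes have been processed, so every node in round k reads the round k-1 vectors of its neighbors, as the description requires.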
  • the second participant may determine the business risk information of the corresponding user account according to the nodes in the second subgraph.
  • determining the business risk information of the user account may specifically be determining whether the user account is an abnormal account; if the user account is determined to be an abnormal account according to the feature vector, the user account may be involved in illegal behavior and needs to be reported for processing.
  • determining the business risk information of the user account may specifically be determining whether the user account has an overdue risk; if it is determined according to the feature vector that the user account has an overdue risk, the user account needs to be closely monitored, or its credit rating needs to be adjusted.
  • the type of business risk information that is finally predicted can be determined through a training process. For example, if the training process uses whether the account is abnormal as a label, the final model can be used to predict whether the account is abnormal; if the training process uses whether the account is overdue as the label, the final model can be used to predict whether the account will be overdue.
  • the graph data processing method provided in this embodiment constructs the nodes of the graph data according to user accounts, constructs the connection relationships of the nodes according to the transfer records of the user accounts (the connection relationships being used to determine neighbor nodes), and constructs the initial feature vectors of the nodes according to the attribute information of the user accounts; after multiple iterations, the resulting feature vectors can be used to predict the business risk information of the user accounts. This combines the user account information of different participants to complete the prediction of business risk information, improves the accuracy of business risk prediction, screens out abnormal user accounts in time, and effectively monitors user accounts.
  • FIG. 6 is a schematic structural diagram of an apparatus for processing graph data provided by an embodiment of the present application.
  • the graph data includes a first subgraph comprising nodes belonging to a first participant and a second subgraph comprising nodes belonging to a second participant.
  • the apparatus is applicable to the first participant.
  • the apparatus for processing graph data may include:
  • An execution module 601, configured to traverse the nodes in the first subgraph during any iteration, and perform the following operations for each traversed node;
  • a search module 602 configured to search for neighbor nodes of the node in the first subgraph, perform an aggregation operation according to the feature vectors of the last round of iteration process of the neighbor nodes, and obtain a first aggregation result;
  • An aggregation module 603, configured to determine the final aggregation result of the node according to the first aggregation result and the second aggregation result when the node has a connection relationship with a node in the second subgraph; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second subgraph;
  • A determination module 604, configured to determine the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector of the last round of iteration is used to calculate the prediction result corresponding to the node.
  • the execution module 601 can traverse the nodes in the first subgraph in any round of iteration, and for each traversed node, the search module 602, the aggregation module 603, and the determination module 604 can be used to calculate its corresponding feature vector.
  • the aggregation module 603 is specifically configured to:
  • if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, where the request information is used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
  • the encrypted second aggregation result is a second aggregation result encrypted with a public key; when determining the final aggregation result of the node according to the encrypted second aggregation result, the aggregation module 603 is specifically configured to:
  • search module 602 is specifically configured to:
  • calculate the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
  • the executing module 601 is further configured to:
  • construct the connection relationships of the nodes in the first subgraph according to the transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
  • the executing module 601 is further configured to:
  • the executing module 601 is further configured to:
  • in the first round of iteration, the feature vector of the previous round is the initial feature vector.
  • when the execution module 601 determines the business risk information of the user account corresponding to the node, it is specifically configured to:
  • when the aggregation module 603 determines the final aggregation result of the node according to the first aggregation result and the second aggregation result, it is specifically configured to:
  • a final aggregation result is determined through a nonlinear algorithm according to the first aggregation result and the second aggregation result.
  • the graph data is used to implement social behavior analysis
  • the nodes in the graph data are used to represent users
  • the preset association relationships include family relationships and employment relationships
  • the preset association relationships are used to determine neighbor nodes.
  • FIG. 7 is a schematic structural diagram of a graph data processing device provided by an embodiment of the present application.
  • the device may include: a memory 701, a processor 702, and a graph data processing program stored on the memory 701 and executable on the processor 702; when the graph data processing program is executed by the processor 702, the steps of the graph data processing method described in any of the foregoing embodiments are implemented.
  • the memory 701 can be independent or integrated with the processor 702 .
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a graph data processing program which, when executed by a processor, implements the steps of the graph data processing method described above.
  • An embodiment of the present application further provides a computer program product, including a computer program, and when the computer program is executed by a processor, the method described in any of the preceding embodiments is implemented.
  • the disclosed devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division.
  • there may be other division methods in actual implementation; for example, multiple modules can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the above-mentioned integrated modules implemented in the form of software function modules may be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium, and include several instructions to make a computer device (which may be a personal computer, server, or network device, etc.) or a processor execute some steps of the methods described in various embodiments of the present application.
  • the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in conjunction with the invention can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • the memory may include high-speed RAM, and may also include non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
  • the above-mentioned storage medium can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • a storage media may be any available media that can be accessed by a general purpose or special purpose computer.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and the storage medium may be located in an application-specific integrated circuit (ASIC).
  • the processor and the storage medium can also exist in the electronic device or the main control device as discrete components.

Abstract

The present application discloses a graph data processing method and apparatus, a device, a storage medium, and a program product. The method comprises: traversing nodes in a first sub-graph during any round of iteration; for each traversed node, searching for the neighbor nodes of the node in the first sub-graph, and performing an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain a first aggregation result; if the node has a connection relationship with a node in a second sub-graph, determining a final aggregation result of the node according to the first aggregation result and a second aggregation result, wherein the second aggregation result is determined by a second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second sub-graph; and determining the feature vector of the node for the current round of iteration according to the final aggregation result. Once the number of iterations meets the requirement, the feature vector of the last round of iteration is used to calculate a prediction result corresponding to the node. The prediction accuracy for graph data can thereby be effectively increased.

Description

Graph data processing method, apparatus, device, storage medium, and program product
This application claims priority to Chinese patent application No. 202110507515.0, entitled "Graph data processing method, apparatus, device, storage medium, and program product", filed with the China Patent Office on May 10, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of data processing, and in particular to a graph data processing method, apparatus, device, storage medium, and program product.
Background
With the continuous development of computer technology and big data processing technology, deep learning is being applied more and more widely. Graph neural networks are gradually replacing traditional hand-designed graph features for extracting the value hidden in graph data and performing related recognition and prediction. For example, graph data constructed by a financial institution can be used to identify whether a user has an overdue risk.
In practice, massive graph data is often distributed across different institutions. Because of data privacy requirements, a single institution cannot use the data of other institutions for analysis and processing, which leads to poor accuracy of predictions made on graph data.
Technical problem
The main purpose of the present application is to provide a graph data processing method, apparatus, device, storage medium, and program product, aiming to improve the accuracy of predictions made on graph data.
To achieve the above purpose, the present application provides a graph data processing method, wherein the graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant; the method is applied to the first participant and includes:
in any round of iteration, traversing the nodes in the first subgraph, and performing the following operations for each traversed node:
searching for the neighbor nodes of the node in the first subgraph, and performing an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain a first aggregation result;
if the node has a connection relationship with a node in the second subgraph, determining the final aggregation result of the node according to the first aggregation result and a second aggregation result; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second subgraph;
determining the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector of the last round of iteration is used to calculate the prediction result corresponding to the node.
Optionally, if the node has a connection relationship with a node in the second subgraph, determining the final aggregation result of the node according to the first aggregation result and the second aggregation result includes:
if the node has a connection relationship with a node in the second subgraph, sending request information to the second participant, where the request information is used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
receiving the encrypted second aggregation result sent by the second participant;
determining the final aggregation result of the node according to the encrypted second aggregation result.
Optionally, the encrypted second aggregation result is a second aggregation result encrypted with a public key; determining the final aggregation result of the node according to the encrypted second aggregation result includes:
encrypting the first aggregation result with the public key;
combining, based on a random mask, the encrypted first aggregation result with the encrypted second aggregation result to obtain an encrypted final aggregation result;
sending the encrypted final aggregation result to a third participant, so that the third participant decrypts the encrypted final aggregation result with a private key;
receiving the decryption result sent by the third participant, and removing the random mask from the decryption result to obtain the final aggregation result.
Optionally, searching for the neighbor nodes of the node in the first subgraph and performing an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration includes:
finding all neighbor nodes of the node in the first subgraph;
selecting a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement;
calculating the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
Optionally, the method further includes:
constructing the nodes in the first subgraph according to user accounts belonging to the first participant;
constructing the connection relationships of the nodes in the first subgraph according to transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
correspondingly, after the number of iterations meets a preset requirement, the method further includes:
determining, according to the feature vector of any node in the first subgraph from the last round of iteration, business risk information of the user account corresponding to the node.
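As a rough illustration of the construction steps above, the following sketch builds a participant's subgraph from its own accounts and transfer records. Treating transfers as undirected edges and tracking cross-participant counterparties separately are assumptions; the account names are illustrative.

```python
from collections import defaultdict

def build_subgraph(accounts, transfers):
    """Return (local adjacency, cross-participant edges) for one participant.

    accounts:  user accounts owned by this participant (the subgraph's nodes)
    transfers: (source, destination) pairs taken from the transfer records
    """
    own = set(accounts)
    adjacency = {a: set() for a in accounts}
    cross_edges = defaultdict(set)
    for src, dst in transfers:
        # Assumption: a transfer in either direction creates an undirected edge.
        for a, b in ((src, dst), (dst, src)):
            if a in own:
                if b in own:
                    adjacency[a].add(b)
                else:
                    cross_edges[a].add(b)  # counterparty belongs to another participant
    return adjacency, dict(cross_edges)

accounts = ["A", "B", "C"]
transfers = [("A", "B"), ("B", "C"), ("B", "E")]
adjacency, cross_edges = build_subgraph(accounts, transfers)
print(adjacency, cross_edges)
```

A node that appears in `cross_edges` is one for which a second aggregation result would be requested from the other participant.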
Optionally, the method further includes:
determining the initial feature vectors corresponding to the nodes in the first subgraph according to attribute information of the user accounts of the first participant;
wherein the previous-round feature vector used in the first round of iteration is the initial feature vector.
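A minimal sketch of such an initialization is given below. The application does not list which attributes are used or how they are encoded, so the attribute names and scaling constants are purely hypothetical.

```python
def initial_feature_vector(attrs):
    """Map account attributes to an initial feature vector (hypothetical encoding)."""
    return [
        attrs["balance"] / 10_000.0,             # assumed scaling of the balance
        attrs["account_age_days"] / 365.0,       # account age in years
        1.0 if attrs["is_corporate"] else 0.0,   # binary attribute as 0/1
    ]

accounts = {
    "A": {"balance": 25_000.0, "account_age_days": 730, "is_corporate": False},
    "B": {"balance": 5_000.0, "account_age_days": 365, "is_corporate": True},
}
init_features = {acct: initial_feature_vector(a) for acct, a in accounts.items()}
print(init_features)
```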
Optionally, determining the business risk information of the user account corresponding to the node includes:
determining whether the user account is an abnormal account, and, if the user account is determined to be an abnormal account according to the feature vector, reporting it; or,
determining whether the user account has an overdue risk, and, if the user account is determined to have an overdue risk according to the feature vector, monitoring the user account or adjusting the credit rating of the user account.
Optionally, determining the final aggregation result of the node according to the first aggregation result and the second aggregation result includes:
determining the final aggregation result from the first aggregation result and the second aggregation result through a nonlinear algorithm.
Optionally, the graph data is used for social behavior analysis, the nodes in the graph data represent users, the preset association relationships include family relationships and employment relationships, and the preset association relationships are used to determine neighbor nodes.
The present application also provides a graph data processing apparatus. The graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant; the apparatus is applied to the first participant and includes:
an execution module, configured to traverse the nodes in the first subgraph in any round of iteration and to perform the following operations for each traversed node;
a search module, configured to find the neighbor nodes of the node in the first subgraph and to perform an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration, to obtain a first aggregation result;
an aggregation module, configured to determine, when the node has a connection relationship with a node in the second subgraph, the final aggregation result of the node according to the first aggregation result and a second aggregation result; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph;
a determination module, configured to determine the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector from the last round of iteration is used to calculate the prediction result corresponding to the node.
Optionally, the aggregation module is specifically configured to:
if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, the request information being used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
receive the encrypted second aggregation result sent by the second participant;
determine the final aggregation result of the node according to the encrypted second aggregation result.
Optionally, the encrypted second aggregation result is a second aggregation result encrypted with a public key; when determining the final aggregation result of the node according to the encrypted second aggregation result, the aggregation module is specifically configured to:
encrypt the first aggregation result using the public key;
perform, based on a random mask, a calculation on the encrypted first aggregation result and the encrypted second aggregation result to obtain an encrypted final aggregation result;
send the encrypted final aggregation result to a third participant, so that the third participant decrypts the encrypted final aggregation result using a private key;
receive the decryption result sent by the third participant, and remove the random mask from the decryption result to obtain the final aggregation result.
Optionally, the search module is specifically configured to:
find all neighbor nodes of the node in the first subgraph;
select a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement;
calculate the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
Optionally, the execution module is further configured to:
construct the nodes in the first subgraph according to user accounts belonging to the first participant;
construct the connection relationships of the nodes in the first subgraph according to transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
correspondingly, after the number of iterations meets a preset requirement, the execution module is further configured to:
determine, according to the feature vector of any node in the first subgraph from the last round of iteration, business risk information of the user account corresponding to the node.
Optionally, the execution module is further configured to:
determine the initial feature vectors corresponding to the nodes in the first subgraph according to attribute information of the user accounts of the first participant;
wherein the previous-round feature vector used in the first round of iteration is the initial feature vector.
Optionally, when determining the business risk information of the user account corresponding to the node, the execution module is specifically configured to:
determine whether the user account is an abnormal account, and, if the user account is determined to be an abnormal account according to the feature vector, report it; or,
determine whether the user account has an overdue risk, and, if the user account is determined to have an overdue risk according to the feature vector, monitor the user account or adjust the credit rating of the user account.
Optionally, when determining the final aggregation result of the node according to the first aggregation result and the second aggregation result, the aggregation module is specifically configured to:
determine the final aggregation result from the first aggregation result and the second aggregation result through a nonlinear algorithm.
The present application also provides a graph data processing device, including: a memory, a processor, and a graph data processing program stored in the memory and executable on the processor; when executed by the processor, the graph data processing program implements the steps of the graph data processing method according to any one of the foregoing.
The present application also provides a computer-readable storage medium storing a graph data processing program; when executed by a processor, the graph data processing program implements the steps of the graph data processing method according to any one of the foregoing.
The present application also provides a computer program product, including a computer program; when executed by a processor, the computer program implements the method according to any one of the foregoing.
In the present application, graph data including a first subgraph and a second subgraph can be processed, where the first subgraph includes nodes belonging to a first participant and the second subgraph includes nodes belonging to a second participant. In any round of iteration, the first participant can traverse the nodes in the first subgraph and, for each traversed node, find the neighbor nodes of the node in the first subgraph and perform an aggregation operation according to the feature vectors of those neighbor nodes from the previous round of iteration to obtain a first aggregation result. If the node has a connection relationship with a node in the second subgraph, the final aggregation result of the node is determined according to the first aggregation result and a second aggregation result, where the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph. The feature vector of the node for the current round of iteration is determined according to the final aggregation result, and after the number of iterations meets the requirement, the feature vector from the last round of iteration is used to calculate the prediction result corresponding to the node. Because the analysis of a node fuses the first participant's aggregation result over the neighbor nodes in the first subgraph with the second participant's aggregation result over the neighbor nodes in the second subgraph, the features of the node can be extracted more comprehensively and accurately. Even when barriers prevent data from being shared, the data of all parties can be used together to process the graph data, which effectively improves the accuracy of predictions based on the graph data.
Description of drawings
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of graph data provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a graph data processing method provided by an embodiment of the present application;
FIG. 4 is a system architecture diagram for graph data processing provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of another graph data processing method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a graph data processing apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a graph data processing device provided by an embodiment of the present application.
The realization of the objectives, the functional features, and the advantages of the present application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Embodiments of the present invention
Exemplary embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the present application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present application can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Association analysis, or graph analysis, is an important class of analysis methods. With such methods, an analyst can conveniently model and analyze the relationships between entities and determine whether specific connectivity patterns or important nodes exist in a network. Traditional methods mainly analyze hand-designed graph features, such as PageRank and centrality. In recent years, with the rise of deep learning, graph neural networks have gradually replaced traditional hand-designed graph features for extracting the value hidden in graph data.
Applying a graph neural network depends on graph data, but in practice graph data often spans multiple institutions. Because of data privacy requirements, these data cannot be pooled into a single effective network to obtain more accurate prediction results.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. As shown in FIG. 1, Institution 1 and Institution 2 are both banks, and each has multiple user accounts. For example, Institution 1 includes user account A, user account B, user account C, and so on, and Institution 2 includes user account D, user account E, user account F, and so on. A user account may specifically be a bank card number or the like. A line between two user accounts indicates that transfer records exist between them.
Graph data can be constructed from the transfer relationships between user accounts and the basic attribute information of the user accounts. Because the graph data contains the users' basic attribute information and related financial features, it can be used to predict the overdue risk of user accounts, identify abnormal accounts, and so on, to meet monitoring requirements.
In practice, a user account in Institution 1 may have transfer relationships not only with other user accounts in the same institution but also with user accounts in Institution 2. As shown in FIG. 1, user account B in Institution 1 has transfer relationships with user accounts A and C of the same institution and also with user account E in Institution 2. Because data is not shared between institutions, Institution 1 cannot obtain the details of user account E in Institution 2 and can analyze user account B only on the basis of the details of its own user accounts A and C. The useful information about user account B held by other institutions is lost, which reduces the accuracy of prediction and identification.
In view of this, an embodiment of the present application provides a graph data processing method in which different participants jointly process graph data. The graph data may include a first subgraph and a second subgraph, where the first subgraph includes nodes belonging to a first participant and the second subgraph includes nodes belonging to a second participant. Taking the application of the graph data to financial institutions as an example, the first participant and the second participant may both be financial institutions such as banks, the nodes in the graph data may represent user accounts, and the connection relationships between nodes may represent transfer relationships between user accounts.
When a node in the graph data is processed, the first participant can perform an aggregation operation over the neighbor nodes of the node in the first subgraph to obtain a first aggregation result, and the second participant can perform an aggregation operation over the neighbor nodes of the node in the second subgraph to obtain a second aggregation result. The final aggregation result corresponding to the node is obtained from the first aggregation result and the second aggregation result, and subsequent processing based on the final aggregation result yields the prediction result corresponding to the node.
FIG. 2 is a schematic diagram of graph data provided by an embodiment of the present application. As shown in FIG. 2, graph data G may include a first subgraph G1 and a second subgraph G2, and the small circles in the figure represent nodes. The first subgraph G1 includes nodes v11, v12, v13, v14, and v15 belonging to the first participant, and the second subgraph G2 includes nodes v21, v22, v23, and v24 belonging to the second participant. Each node represents a user account, and the lines in the figure show the connection relationships between the nodes.
Processing the nodes in the graph data may involve multiple rounds of iteration. In each round, all nodes can be traversed, and any node can be analyzed using the data of multiple parties. For example, for node v12, the first participant can perform an aggregation operation over its neighbor nodes in the first subgraph G1, namely nodes v11 and v14, to obtain a first aggregation result, and the second participant can perform an aggregation operation over its neighbor nodes in the second subgraph G2, namely nodes v21, v22, and v23, to obtain a second aggregation result. The final aggregation result for node v12 is obtained from the first and second aggregation results, and the feature vector of node v12 for the current round is determined from the final aggregation result; this feature vector is used in the aggregation operation of the next round. After the number of iterations meets the requirement, the resulting feature vector can be used to calculate a prediction result, for example whether node v12 has an overdue risk.
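The v12 example can be worked through numerically. The sketch below assumes an element-wise mean as each participant's aggregation and a sum as the combination; the feature values are made up for illustration, and encryption is omitted.

```python
def mean_aggregate(features, neighbor_ids):
    """Element-wise mean over the given neighbors (assumed aggregation function)."""
    dim = len(next(iter(features.values())))
    if not neighbor_ids:
        return [0.0] * dim
    return [
        sum(features[v][i] for v in neighbor_ids) / len(neighbor_ids)
        for i in range(dim)
    ]

# Previous-round feature vectors, each held locally by its owner (hypothetical values).
g1_features = {"v11": [1.0, 2.0], "v14": [3.0, 4.0]}                      # first participant
g2_features = {"v21": [0.0, 3.0], "v22": [3.0, 0.0], "v23": [6.0, 3.0]}   # second participant

# Each participant aggregates only over v12's neighbors in its own subgraph.
first_agg = mean_aggregate(g1_features, ["v11", "v14"])
second_agg = mean_aggregate(g2_features, ["v21", "v22", "v23"])

# The first participant combines the two results (sum is one option).
final_agg = [a + b for a, b in zip(first_agg, second_agg)]
print(first_agg, second_agg, final_agg)  # [2.0, 3.0] [3.0, 2.0] [5.0, 5.0]
```

Only the two aggregated vectors cross the boundary between participants; the per-node feature vectors of v21, v22, and v23 stay with the second participant.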
In the method provided by the embodiments of the present application, each participant's data stays local, so multiple parties can jointly perform prediction on graph data while data security is preserved. Moreover, because the analysis of a node fuses the first participant's aggregation result over the neighbor nodes in the first subgraph with the second participant's aggregation result over the neighbor nodes in the second subgraph, the transfer relationships of the node are reflected more comprehensively and accurately, the fund-flow features of the node are extracted more precisely, and prediction accuracy is effectively improved while data security is preserved.
Some implementations of the present application are described in detail below with reference to the accompanying drawings. Where the embodiments do not conflict, the following embodiments and the features in them can be combined with one another.
FIG. 3 is a schematic flowchart of a graph data processing method provided by an embodiment of the present application. The graph data includes a first subgraph and a second subgraph, where the first subgraph includes nodes belonging to a first participant and the second subgraph includes nodes belonging to a second participant. The method provided in this embodiment may be applied to the first participant, which may process the graph data through multiple rounds of iteration. As shown in FIG. 3, in any round of iteration, the nodes in the first subgraph can be traversed, and the following operations are performed for each traversed node:
Step 301: Find the neighbor nodes of the currently traversed node in the first subgraph, and perform an aggregation operation according to the feature vectors of the neighbor nodes from the previous round of iteration to obtain a first aggregation result.
Taking the current round as the k-th round of iteration as an example, for the currently traversed node, its neighbor nodes in the first subgraph can be found, where a neighbor node may be a node that has a direct connection relationship with the current node. The first aggregation result of the current node for round k can be determined from the feature vectors of the neighbor nodes from round k-1.
When k = 1, the feature vector from the previous round of iteration may be the initial feature vector. Optionally, before the iteration starts, each node in the graph data may be initialized to determine its initial feature vector.
In the embodiments of the present application, the feature vector of a node may be any information that can characterize the node. Optionally, after the graph data is obtained, each node may be assigned values according to its attribute information to obtain the corresponding initial feature vector.
In the first round of iteration, the first participant can traverse each node in the first subgraph and perform an aggregation operation according to the initial feature vectors of the current node's neighbor nodes to obtain the first aggregation result of the current node.
Step 302: If the node has a connection relationship with a node in the second subgraph, determine the final aggregation result of the node according to the first aggregation result and a second aggregation result.
The second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the neighbor nodes of the node in the second subgraph.
Optionally, whether any two nodes have a connection relationship can be determined by checking whether a preset association relationship exists between them. The preset association relationship can be set according to actual needs, and two nodes with a preset association relationship can be regarded as connected in the graph data. The connection relationship can be used to determine neighbor nodes.
For example, in a risk control scenario, the preset association relationship may be a transfer relationship. Specifically, the nodes in the graph data may represent user accounts, and the connection relationships between nodes may represent transfer relationships between user accounts; if a transfer record exists between a node in the first subgraph and a node in the second subgraph, the two nodes can be regarded as connected. As another example, in a social behavior analysis scenario, the nodes in the graph data may represent users, and the preset association relationship may be a family relationship, an employment relationship, or the like.
In the first round of iteration, if the current node traversed by the first participant has neighbor nodes in the second subgraph, the second participant can perform an aggregation operation according to the initial feature vectors of those neighbor nodes to obtain the second aggregation result of the current node.
The first participant can determine the final aggregation result of the current node for the first round of iteration according to the first aggregation result and the second aggregation result of the current node for that round.
Optionally, the final aggregation result may be the sum of the first aggregation result and the second aggregation result.
Alternatively, when the final aggregation result is calculated from the first aggregation result and the second aggregation result, it may be determined through a nonlinear algorithm, for example a log function, an exponential function, taking the maximum, or taking the minimum, so that the result is more strongly nonlinear and can fit more complex situations.
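The combination options listed above can be sketched as a single function. The exact formulas are not given in the text, so the element-wise definitions below (in particular applying the log to the sum of the two results) are assumptions chosen for illustration.

```python
import math

def combine(first_agg, second_agg, mode="sum"):
    """Combine two aggregation results element-wise (hypothetical helper)."""
    pairs = list(zip(first_agg, second_agg))
    if mode == "sum":
        return [a + b for a, b in pairs]
    if mode == "max":
        return [max(a, b) for a, b in pairs]
    if mode == "min":
        return [min(a, b) for a, b in pairs]
    if mode == "log":
        # Assumption: log(1 + a + b), which requires a + b > -1.
        return [math.log1p(a + b) for a, b in pairs]
    raise ValueError(f"unknown mode: {mode}")

first_agg = [1.0, 4.0]
second_agg = [2.0, 3.0]
for mode in ("sum", "max", "min", "log"):
    print(mode, combine(first_agg, second_agg, mode))
```

Note that an additively homomorphic encryption scheme supports only the sum variant directly on ciphertexts; the other variants would need a different way of protecting the inputs.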
Step 303: Determine the feature vector of the node for the current round of iteration according to the final aggregation result.
For example, after the final aggregation result of the current node for the first round of iteration is determined, the feature vector of the current node for the first round can be determined from it. After all nodes of the first subgraph have been traversed, the first-round feature vectors of all nodes are obtained.
In the second round of iteration, the above steps are repeated using the first-round feature vectors to obtain the second-round feature vectors, and so on, until the feature vectors of the last round of iteration are obtained. After the number of iterations meets the requirement, the feature vectors from the last round can be used to calculate the prediction results corresponding to the nodes.
Optionally, for any node, the feature vector of the node from the last round of iteration can be input into a predictor, or into a Sigmoid function, to obtain the corresponding prediction result.
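The multi-round procedure and the final prediction can be sketched as below. Several points are assumptions: the current-round feature vector is taken to be the final aggregation result itself, aggregation is a mean, combination is a sum, and `remote_agg` is a hypothetical callback standing in for the exchange with the second participant (stubbed here to return zeros). The weights passed to the Sigmoid are also illustrative.

```python
import math

def run_rounds(adjacency, init_features, remote_agg, rounds):
    """Run the given number of rounds on one participant's subgraph."""
    features = dict(init_features)
    for k in range(1, rounds + 1):
        new_features = {}
        for node, nbrs in adjacency.items():
            dim = len(features[node])
            # First aggregation result: mean over local neighbors' round k-1 vectors.
            local = [
                sum(features[v][i] for v in nbrs) / len(nbrs) if nbrs else 0.0
                for i in range(dim)
            ]
            remote = remote_agg(node, k)  # second aggregation result (stub)
            new_features[node] = [a + b for a, b in zip(local, remote)]
        features = new_features  # round k vectors feed round k + 1
    return features

def predict(feature, weights, bias=0.0):
    """Sigmoid over a weighted sum of the last-round feature vector."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))

adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
init_features = {"A": [1.0], "B": [2.0], "C": [3.0]}
no_remote = lambda node, k: [0.0]  # no cross-participant neighbors in this toy case

final_features = run_rounds(adjacency, init_features, no_remote, rounds=2)
score = predict(final_features["B"], weights=[1.0])
print(final_features, score)
```

The new feature vectors are written to a separate dictionary within each round, so every node in round k reads only round k-1 values, as the step description requires.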
In practice, the first participant and the second participant can construct the first subgraph and the second subgraph of the graph data respectively, and nodes in the first subgraph may have connection relationships with nodes in the second subgraph. The first participant can process a node in the first subgraph by calculating the aggregation result of the node's neighbor nodes in the first subgraph, obtaining from the second participant the aggregation result of the node's neighbor nodes in the second subgraph, and determining the final aggregation result from the aggregation results of the respective subgraphs.
Similarly, the second participant can process the second subgraph in the same way: it traverses the nodes in the second subgraph, calculates the aggregation result of each node's neighbor nodes in the second subgraph, obtains from the first participant the aggregation result of the node's neighbor nodes in the first subgraph, and determines the final aggregation result from the aggregation results of the respective subgraphs. Through these operations by the first participant and the second participant, results for all nodes in the graph data can be obtained, and the graph data is processed without the nodes' raw data leaving its local environment.
Optionally, in the embodiments of the present application there may be multiple second participants and, correspondingly, multiple second subgraphs. Each second participant can calculate a second aggregation result according to the neighbor nodes, in its own second subgraph, of the node currently traversed by the first participant, and send that result to the first participant. The first participant then obtains the final aggregation result from the first aggregation result and the multiple second aggregation results.
The graph data processing method provided in this embodiment can process graph data that includes a first subgraph and a second subgraph, where the first subgraph includes nodes belonging to the first participant and the second subgraph includes nodes belonging to the second participant. In any round of iteration, the first participant traverses the nodes in the first subgraph. For each traversed node, it searches for the node's neighbor nodes in the first subgraph and performs an aggregation operation on the previous-round feature vectors of those neighbor nodes to obtain a first aggregation result. If the node is connected to a node in the second subgraph, the final aggregation result of the node is determined from the first aggregation result and a second aggregation result, where the second aggregation result is determined by the second participant from the previous-round feature vectors of the node's neighbor nodes in the second subgraph. The feature vector of the node for the current round is then determined from the final aggregation result, and once the number of iterations meets the requirement, the feature vector from the last round is used to calculate the node's prediction result. Because the analysis of a node combines the first participant's aggregation over the node's neighbors in the first subgraph with the second participant's aggregation over its neighbors in the second subgraph, node features can be extracted more comprehensively and accurately. Even where barriers prevent data from being shared directly, the data of all parties can be used jointly to process the graph data, which improves the accuracy of predictions made on it.
On the basis of the technical solutions provided in the foregoing embodiments, a SecureAggregate (secure aggregation) function may optionally be used to perform a secure aggregation operation on the first aggregation result and the second aggregation result to obtain the final aggregation result. The implementation principle of the SecureAggregate function is described below.
Specifically, determining the final aggregation result of the node from the first aggregation result and the second aggregation result when the node is connected to a node in the second subgraph may include: if the node is connected to a node in the second subgraph, sending request information to the second participant, the request information being used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result; receiving the encrypted second aggregation result sent by the second participant; and determining the final aggregation result of the node according to the encrypted second aggregation result.
Request information is sent to the second participant, asking it to process the node, only when the traversal of the first subgraph reaches a node that is connected to a node in the second subgraph. This avoids unnecessary computation and improves the efficiency of graph data processing. In addition, the first participant and the second participant can exchange encrypted aggregation results, which improves the security of data transmission.
In one optional implementation, after obtaining the encrypted second aggregation result sent by the second participant, the first participant processes it to obtain the final aggregation result.
In another optional implementation, a third participant may be introduced to process the aggregation results.
Optionally, the encrypted second aggregation result is the second aggregation result encrypted with a public key. Determining the final aggregation result of the node according to the encrypted second aggregation result may include: encrypting the first aggregation result with the public key; performing a calculation, based on a random mask, on the encrypted first aggregation result and the encrypted second aggregation result to obtain an encrypted final aggregation result; sending the encrypted final aggregation result to a third participant so that the third participant decrypts it with a private key; and receiving the decryption result sent by the third participant and removing the random mask from it to obtain the final aggregation result.
FIG. 4 is a system architecture diagram of graph data processing provided by an embodiment of the present application. As shown in FIG. 4, the first participant processes the first subgraph, the second participant processes the second subgraph, and a third participant is introduced as a collaborator. The third participant holds the keys, which may include a public key and a private key. Optionally, the calculated results can be encrypted using homomorphic encryption: the third participant sends the public key used for encryption to the first participant and the second participant, and keeps the private key to itself.
When a node in the first subgraph is processed, the first participant encrypts the first aggregation result with the public key, and the second participant encrypts the second aggregation result with the public key and sends it to the first participant. The first participant calculates the sum of the two, adds a random mask, and sends the result to the third participant. The third participant decrypts the received data with its private key and returns the decryption result to the first participant, which removes the random mask to obtain the final aggregation result of the node.
Similarly, when a node in the second subgraph is processed, the first participant encrypts the first aggregation result with the public key and sends it to the second participant, and the second participant encrypts the second aggregation result with the public key. The second participant calculates the sum of the two, adds a random mask, and sends the result to the third participant. The third participant decrypts the received data with its private key and returns the decryption result to the second participant, which removes the random mask to obtain the final aggregation result of the node.
Introducing an independent third participant to assist with encrypting and decrypting the aggregation results improves the security of graph data processing and reduces the risk of data leakage.
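The masked homomorphic sum described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an additively homomorphic scheme in the style of Paillier with toy-sized primes, and aggregation results that have already been encoded as small non-negative integers. The source only states that homomorphic encryption may be used; the function names `keygen`, `encrypt`, `decrypt`, and `secure_aggregate` are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Third participant: toy Paillier-style key pair (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam % n, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)       # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Decrypt ciphertext c with the private key held by the third participant."""
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def secure_aggregate(first_agg, enc_second_agg, n, decrypt_by_third):
    """First participant: add both results and a random mask under encryption,
    let the third participant decrypt, then remove the mask locally."""
    result = []
    for a1, c2 in zip(first_agg, enc_second_agg):
        mask = random.randrange(n)
        # Multiplying ciphertexts adds the underlying plaintexts.
        masked_cipher = encrypt(n, a1) * c2 * encrypt(n, mask) % (n * n)
        masked_sum = decrypt_by_third(masked_cipher)  # third participant sees only a masked value
        result.append((masked_sum - mask) % n)
    return result


n, private_key = keygen()
first = [12, 7]                                   # first aggregation result (integer-encoded)
enc_second = [encrypt(n, v) for v in [30, 5]]     # encrypted by the second participant
final = secure_aggregate(first, enc_second, n, lambda c: decrypt(n, private_key, c))
```

In this sketch the third participant only ever decrypts a value offset by a mask it does not know, and the first participant never sees the second aggregation result in the clear. A real deployment would need large keys and a fixed-point encoding for real-valued feature vectors.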
FIG. 5 is a schematic flowchart of another graph data processing method provided by an embodiment of the present application. This embodiment gives a specific implementation in which the first participant and the second participant jointly process graph data. As shown in FIG. 5, the method may include:
Step 501. Construct graph data according to the user accounts and transfer records of the first participant and the second participant.
The graph data includes a first subgraph and a second subgraph, which are constructed by the first participant and the second participant respectively; each participant also initializes the nodes of its own subgraph.
Optionally, the first participant may construct and initialize the first subgraph as follows: construct the nodes of the first subgraph according to the user accounts belonging to the first participant; and construct the connection relationships between the nodes of the first subgraph according to the transfer records of those user accounts, the connection relationships being used to determine neighbor nodes.
For example, the first participant may be a bank and the user accounts may be bank card numbers. Each node in the first subgraph represents one bank card number, and transfers between bank card numbers form the connections between nodes. For instance, if there is a transfer record between bank card number A and bank card number B, the node for bank card number A and the node for bank card number B may be joined by an edge. In subsequent processing, two directly connected nodes are neighbor nodes of each other.
The second participant can construct the second subgraph in a similar way.
It should be noted that, because bank card numbers of the first participant may have transfer relationships with bank card numbers of the second participant, at least some nodes in the first subgraph may be connected to nodes in the second subgraph. The first participant and the second participant each store the corresponding transfer records, so each knows how its own nodes are connected to the nodes of other participants. These connection relationships can be used for the aggregation operations in subsequent steps.
With the above method, graph data can be constructed from user accounts and transfer relationships, so that user accounts can be analyzed on the basis of the graph data, which improves the efficiency of monitoring user accounts.
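The construction of one participant's subgraph from transfer records can be sketched as below. This is an illustrative sketch under stated assumptions: transfers are given as `(source, destination)` pairs, edges are treated as undirected, and the function name `build_subgraph` and its return shape are hypothetical rather than taken from the source.

```python
from collections import defaultdict


def build_subgraph(own_accounts, transfers):
    """Return (local_neighbors, cross_neighbors) for one participant.

    local_neighbors maps an own account to its neighbors inside this subgraph;
    cross_neighbors maps an own account to accounts held by another participant.
    """
    own = set(own_accounts)
    local_neighbors = defaultdict(set)
    cross_neighbors = defaultdict(set)
    for src, dst in transfers:
        # A transfer record links the two accounts in both directions.
        for a, b in ((src, dst), (dst, src)):
            if a in own:
                target = local_neighbors if b in own else cross_neighbors
                target[a].add(b)
    return local_neighbors, cross_neighbors


# Card numbers A and B belong to this participant; X belongs to another one.
local, cross = build_subgraph(["A", "B"], [("A", "B"), ("B", "X")])
```

Keeping the cross-participant links separate makes it easy to decide, during traversal, whether a node needs a second aggregation result from another participant.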
Step 502. Initialize all nodes in the graph data to obtain the initial feature vectors of the nodes.
Any node can be initialized according to its corresponding attribute information. Specifically, the first participant may determine the initial feature vectors of the nodes in the first subgraph according to the attribute information of the first participant's user accounts; similarly, the second participant may determine the initial feature vectors of the nodes in the second subgraph according to the attribute information of the second participant's user accounts. The previous-round feature vector used in the first round of iteration is the initial feature vector.
The attribute information may include any information that characterizes the user account, including but not limited to the user's region, age, gender, education, occupation, and income, the card opening time, and the card balance.
Optionally, the initialization operation may be an assignment operation based on the attribute information. As a simple example, an age between 21 and 30 is assigned the value 000, and an age between 31 and 40 is assigned the value 001. Once every item of attribute information has been assigned a value, the corresponding initial feature vector is obtained.
Constructing the initial feature vector from the attribute information of the user account quickly and effectively captures the user's characteristics for use in the subsequent iterations, so that the account is analyzed comprehensively on the basis of both its attribute information and its transfer relationships, which improves the prediction of business risk information.
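The assignment-based initialization can be sketched as follows. Only the two age codes (000 for ages 21 to 30, 001 for ages 31 to 40) come from the text; the catch-all code, the `balance` feature, and the function names are assumptions added for illustration.

```python
# Age buckets and their codes; the first two rows follow the example in the text.
AGE_BUCKETS = [
    (21, 30, [0, 0, 0]),
    (31, 40, [0, 0, 1]),
]
UNKNOWN_AGE_CODE = [1, 1, 1]  # hypothetical code for ages outside the listed buckets


def encode_age(age):
    """Map an age to its bucket code."""
    for low, high, code in AGE_BUCKETS:
        if low <= age <= high:
            return list(code)
    return list(UNKNOWN_AGE_CODE)


def initial_feature_vector(attributes):
    """Concatenate the per-attribute codes into one initial feature vector."""
    has_balance = [1 if attributes.get("balance", 0) > 0 else 0]  # hypothetical feature
    return encode_age(attributes["age"]) + has_balance


h0 = initial_feature_vector({"age": 35, "balance": 120.0})
```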
Step 503. Set the iteration counter k = 1.
Step 504. In the k-th iteration, traverse the nodes in the graph data. For each node, the first participant calculates the node's first aggregation result in the first subgraph, the second participant calculates the node's second aggregation result in the second subgraph, and the first participant or the second participant determines the node's k-th round feature vector according to the first aggregation result and the second aggregation result.
Specifically, every node in the graph data may be traversed, and the following steps a to d may be performed for each traversed node.
Step a. Use the Aggregate (aggregation) function to aggregate the neighbor nodes, within the first subgraph, of the currently traversed node to obtain the first aggregation result.
Specifically, step a may be performed by the first participant. The first participant searches for the node's neighbor nodes in the first subgraph and performs an aggregation operation on the previous-round feature vectors of those neighbor nodes to obtain the first aggregation result.
When k = 1, the previous-round feature vector is the initial feature vector; that is, the first aggregation result of the current node in round 1 is calculated from the initial feature vectors of the current node's neighbor nodes.
Optionally, searching for the node's neighbor nodes in the first subgraph and performing an aggregation operation on the previous-round feature vectors of the neighbor nodes may include: finding all neighbor nodes of the node in the first subgraph; selecting a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement; and calculating the first aggregation result from the previous-round feature vectors of the selected neighbor nodes.
Sampling with replacement means that, after a neighbor node is drawn from the set of all neighbor nodes, it is put back into the set and sampling continues until the required number of draws is reached. In other words, any neighbor node may be drawn once or several times.
For example, suppose the neighbor nodes of node v are node a, node b, and node c, and the preset number is 3. After sampling with replacement, the selected neighbor nodes might be node a, node b, and node a; that is, node a is selected twice. The first aggregation result for node v is then calculated from the three selected neighbor nodes, two of which are the same.
The first aggregation result can be calculated by the Aggregate function, which can be designed according to actual needs. For example, it can be a mean function that averages the previous-round feature vectors of the selected neighbor nodes and uses the average as the node's first aggregation result.
Sampling with replacement from all neighbor nodes quickly yields a preset number of neighbor nodes for aggregation, which improves the efficiency of the aggregation operation and gives it a uniform form.
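Sampling with replacement followed by a mean Aggregate function can be sketched as below. The mean is the example the text gives; the function name `aggregate`, its parameters, and the zero vector returned for a node without neighbors (one of the options the source allows) are illustrative choices.

```python
import random


def aggregate(neighbors, prev_features, sample_size, rng=random):
    """Mean of the previous-round feature vectors of `sample_size` neighbors
    drawn with replacement; a zero vector if the node has no neighbors here."""
    if not neighbors:
        dim = len(next(iter(prev_features.values())))
        return [0.0] * dim
    # random.choices draws with replacement, so a neighbor can appear several times.
    sampled = rng.choices(list(neighbors), k=sample_size)
    dim = len(prev_features[sampled[0]])
    return [sum(prev_features[u][i] for u in sampled) / sample_size for i in range(dim)]


prev = {"a": [2.0, 4.0], "b": [2.0, 4.0], "c": [2.0, 4.0]}
first_agg = aggregate(["a", "b", "c"], prev, sample_size=3)
```

Because every node contributes the same number of sampled vectors, the aggregation has a fixed cost regardless of how many neighbors a node actually has.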
Further, when an aggregation operation is needed for a node in the second subgraph, the second participant can send request information to the first participant. According to the request information, the first participant determines the neighbor nodes, in the first subgraph, of that node in the second subgraph, calculates the node's aggregation result by the above method, and sends it to the second participant. Because sampling with replacement is used, the second participant cannot learn the feature vector of each individual neighbor node in the first subgraph, which reduces the first participant's risk of data leakage and improves the security of graph data processing.
Step b. Use the Aggregate function to aggregate the neighbor nodes of the node within the second subgraph to obtain the second aggregation result.
Specifically, step b may be performed by the second participant. For the implementation principle and process, refer to step a.
Optionally, if a node has no neighbor nodes in the first subgraph or in the second subgraph, the corresponding participant may skip calculating its aggregation result, or may treat its aggregation result in that subgraph as 0.
Step c. Perform secure aggregation on the node's first aggregation result and second aggregation result to obtain the final aggregation result of the current round.
Specifically, the SecureAggregate function may be used to perform the secure aggregation operation. For the implementation, refer to the foregoing embodiments; details are not repeated here.
In this embodiment, distributing the Aggregate function across the participants and combining their results through the SecureAggregate function makes it possible to process graph data using the data of all parties.
Step d. Determine the node's feature vector for the current round according to the node's final aggregation result for the current round.
Optionally, for each node, the feature vector for the current round may be determined according to the node's final aggregation result for the current round and the node's feature vector from the previous round.
It can be understood that, in the k-th round of iteration, the current round refers to the k-th round and the previous round refers to the (k-1)-th round; the round before round 1 refers to the initialization phase.
Specifically, in the k-th round of iteration, for each node, a CONCAT operation is applied to the node's final aggregation result for the k-th round and the node's feature vector from the (k-1)-th round. The concatenated result is multiplied by the model parameters, and the product is passed through the σ function to add a nonlinear component. The feature vector for the current round is determined from this final result.
Optionally, the model parameters may differ from round to round; in the k-th round of iteration, the feature vector is calculated using the model parameters corresponding to the k-th round.
The model parameters may be obtained through training. Specifically, they may be obtained by a single participant training the model on a training sample set, or by multiple participants jointly training the model. For the model being trained, refer to GraphSAGE or other graph neural network models.
Optionally, after the final results of all nodes have been obtained, the final result of each node may be normalized or regularized to obtain each node's feature vector for the current round, which completes the k-th round of iteration.
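The per-node update of step d can be sketched as follows. This sketch assumes that σ is the logistic sigmoid and that the optional normalization is an L2 normalization; the source names neither choice explicitly. The function name `update_feature` and the row-major layout of `weights` are also assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_feature(final_agg, prev_feature, weights):
    """Compute normalize(sigma(W_k * CONCAT(final_agg, prev_feature))).

    `weights` is the round-k parameter matrix as a list of rows, each row
    having len(final_agg) + len(prev_feature) entries.
    """
    concat = list(final_agg) + list(prev_feature)            # CONCAT operation
    activated = [sigmoid(sum(w * x for w, x in zip(row, concat))) for row in weights]
    norm = math.sqrt(sum(a * a for a in activated))          # sigmoid outputs are > 0, so norm > 0
    return [a / norm for a in activated]


# Two output dimensions, four input dimensions (2 aggregated + 2 previous).
h_k = update_feature([0.2, 0.4], [1.0, 0.0], [[0.0] * 4, [0.0] * 4])
```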
Step 505. Determine whether k is equal to K. If not, perform step 506; if so, perform step 507.
Here K is the required number of iterations, which can be set according to actual needs.
Step 506. Increase the value of k by 1, then perform step 504 again.
Step 507. Determine the prediction result for each node in the graph data according to the node's feature vector from the last round of iteration.
The feature vector used in step 507 is the final feature vector obtained after the last iteration is completed, that is, the node's K-th round feature vector. For any node, the node's K-th round feature vector can be input to a predictor function, or to a sigmoid function, to obtain the corresponding prediction result.
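The control flow of steps 503 to 507 can be sketched as below. The round itself is passed in as a callback, and the predictor is assumed to be a linear score followed by a sigmoid; the source leaves the predictor unspecified, so `predict`, `predictor_weights`, and `run_iterations` are hypothetical names.

```python
import math


def run_iterations(initial_features, num_rounds, run_round):
    """Run rounds k = 1..K and return the features after the last round."""
    features = initial_features
    k = 1                                   # step 503
    while True:
        features = run_round(k, features)   # step 504
        if k == num_rounds:                 # step 505
            break
        k += 1                              # step 506
    return features


def predict(feature, predictor_weights, bias=0.0):
    """Step 507: map a final feature vector to a score in (0, 1)."""
    score = sum(w * x for w, x in zip(predictor_weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-score))


rounds_seen = []
final_features = run_iterations(
    {"v": [1.0, 2.0]}, 3, lambda k, feats: (rounds_seen.append(k) or feats)
)
risk = predict(final_features["v"], [0.0, 0.0])
```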
In practical applications, the graph data can be constructed first. After construction, any node v is initialized according to its corresponding attribute information $x_v$, and $h_v^{0}$ denotes the value of node v after initialization. In the k-th round of iteration, for each node v in the graph data, the neighbor nodes u of node v are found, and the final aggregation result of node v in the k-th round is calculated from the (k-1)-th round feature vectors $h_u^{k-1}$ of the neighbor nodes u. The k-th round feature vector $h_v^{k}$ of node v is then obtained from the final aggregation result, the (k-1)-th round feature vector $h_v^{k-1}$, the model parameters, and so on. In this way, the feature vector of round k is determined from the feature vector of round k-1, and the feature vector of round k is in turn used to calculate the feature vector of round k+1. After the number of iterations reaches the preset number K, the prediction result is determined from the final feature vector $h_v^{K}$.
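The iteration described in this paragraph can be written compactly as follows. The original formula images are not available here, so the notation is an assumption in the style of GraphSAGE: $x_v$ is the attribute information of node v, $h_v^{k}$ its k-th round feature vector, $N_1(v)$ and $N_2(v)$ its (sampled) neighbor nodes in the first and second subgraph, $a_v^{k}$ the final aggregation result, and $W^{k}$ the model parameters of round k. $\mathrm{Init}$ stands for the attribute-based initialization.

```latex
\begin{aligned}
h_v^{0} &= \mathrm{Init}(x_v) \\
a_{v,1}^{k} &= \mathrm{Aggregate}\bigl(\{\, h_u^{k-1} : u \in N_1(v) \,\}\bigr) \\
a_{v,2}^{k} &= \mathrm{Aggregate}\bigl(\{\, h_u^{k-1} : u \in N_2(v) \,\}\bigr) \\
a_v^{k} &= \mathrm{SecureAggregate}\bigl(a_{v,1}^{k},\, a_{v,2}^{k}\bigr) \\
h_v^{k} &= \sigma\bigl(W^{k} \cdot \mathrm{CONCAT}(a_v^{k},\, h_v^{k-1})\bigr), \qquad k = 1, \dots, K
\end{aligned}
```

The optional normalization of $h_v^{k}$ at the end of each round is omitted from these equations for brevity.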
Optionally, after the number of iterations meets the preset requirement, the first participant may determine the business risk information of the user account corresponding to any node in the first subgraph according to that node's feature vector from the last round of iteration. Similarly, the second participant may determine the business risk information of the user accounts corresponding to the nodes in the second subgraph.
In one optional implementation, determining the business risk information of a user account may specifically mean determining whether the user account is an abnormal account. If the feature vector indicates that the user account is abnormal, the account may be involved in illegal behavior and needs to be reported for handling.
In another optional implementation, determining the business risk information of a user account may specifically mean determining whether the user account carries an overdue risk. If the feature vector indicates an overdue risk, the account needs to be closely monitored or its credit rating adjusted.
The type of business risk information that is ultimately predicted is determined by the training process. For example, if training uses whether an account is abnormal as the label, the resulting model can be used to predict whether an account is abnormal; if training uses whether an account is overdue as the label, the resulting model can be used to predict whether an account will become overdue.
The graph data processing method provided in this embodiment constructs the nodes of the graph data from user accounts and constructs the connection relationships between nodes from the transfer records of the user accounts, the connection relationships being used to determine neighbor nodes. It constructs the initial feature vectors of the nodes from the attribute information of the user accounts, and the feature vectors obtained after multiple rounds of iteration can be used to predict the business risk information of the user accounts. The method combines the user account information of different participants to predict business risk information jointly, which improves the accuracy of business risk prediction, allows abnormal user accounts to be screened out in time, and achieves effective monitoring of user accounts.
FIG. 6 is a schematic structural diagram of a graph data processing apparatus provided by an embodiment of the present application. The graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant. The apparatus can be applied to the first participant. As shown in FIG. 6, the graph data processing apparatus may include:
an execution module 601, configured to traverse the nodes in the first subgraph in any round of iteration and to perform the following operations for each traversed node;
a search module 602, configured to search for the node's neighbor nodes in the first subgraph and to perform an aggregation operation on the previous-round feature vectors of the neighbor nodes to obtain a first aggregation result;
an aggregation module 603, configured to determine, when the node is connected to a node in the second subgraph, the final aggregation result of the node according to the first aggregation result and a second aggregation result, where the second aggregation result is determined by the second participant according to the previous-round feature vectors of the node's neighbor nodes in the second subgraph;
a determination module 604, configured to determine the node's feature vector for the current round according to the final aggregation result, where, after the number of iterations meets the requirement, the feature vector from the last round of iteration is used to calculate the node's prediction result.
The execution module 601 can traverse the nodes in the first subgraph in any round of iteration, and the feature vector of each traversed node can be calculated by the search module 602, the aggregation module 603, and the determination module 604.
Optionally, the aggregation module 603 is specifically configured to:
if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, the request information being used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
receive the encrypted second aggregation result sent by the second participant;
determine the final aggregation result of the node according to the encrypted second aggregation result.
Optionally, the encrypted second aggregation result is a second aggregation result encrypted with a public key; when determining the final aggregation result of the node according to the encrypted second aggregation result, the aggregation module 603 is specifically configured to:
encrypt the first aggregation result with the public key;
perform a calculation, based on a random mask, on the encrypted first aggregation result and the encrypted second aggregation result to obtain an encrypted final aggregation result;
send the encrypted final aggregation result to a third participant, so that the third participant decrypts the encrypted final aggregation result with a private key;
receive the decryption result sent by the third participant, and remove the random mask from the decryption result to obtain the final aggregation result.
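The masked exchange above can be illustrated with an additively homomorphic scheme. The sketch below is written under several assumptions not stated in the text: it uses a toy Paillier cryptosystem (the text only mentions a public key and a private key), it combines the two results by addition, it treats the aggregation results as fixed-point integers, and it uses key sizes that are far too small for real use. All function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy Paillier parameters: two Mersenne primes. ASSUMPTION: Paillier is chosen
# only because it allows addition on ciphertexts; the keys are NOT secure.
P, Q = 2**31 - 1, 2**61 - 1


def keygen():
    """Third participant: return (public modulus n, private key (lam, mu))."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
    return n, (lam, pow(lam, -1, n))


def encrypt(n, m):
    """Encrypt a non-negative integer m < n under the public modulus n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:  # r must be invertible modulo n
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Decrypt ciphertext c with the private key held by the third participant."""
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def masked_final_aggregation(first, second):
    """Run the exchange for integer-encoded aggregation results."""
    n, private_key = keygen()            # third participant publishes n
    enc_second = encrypt(n, second)      # second participant encrypts its result
    # First participant: encrypt its own result, add the ciphertexts and a mask.
    mask = secrets.randbelow(n)
    enc_final = encrypt(n, first) * enc_second * encrypt(n, mask) % (n * n)
    # Third participant: decrypts, but only sees the masked sum.
    masked_plain = decrypt(n, private_key, enc_final)
    # First participant: remove the mask to recover the final aggregation result.
    return (masked_plain - mask) % n
```

In this arrangement the first participant never sees the second aggregation result in plaintext, and the third participant sees only a value blinded by the random mask.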
Optionally, the search module 602 is specifically configured to:
find all neighbor nodes of the node in the first subgraph;
select a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement;
calculate the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
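The sampling step above can be sketched as follows. Sampling with replacement means a node with fewer neighbors than the preset number still yields a sample of the full size, because the same neighbor may be drawn more than once. The element-wise mean and the function names are assumptions for illustration; the text does not fix a particular aggregation operation.

```python
import random
from typing import Dict, List

Vector = List[float]


def sample_neighbors(neighbors: List[str], sample_size: int, rng: random.Random) -> List[str]:
    """Draw `sample_size` neighbors with replacement (duplicates are allowed)."""
    if not neighbors:
        raise ValueError("node has no neighbors in the first subgraph")
    return rng.choices(neighbors, k=sample_size)


def first_aggregation(
    neighbors: List[str],
    prev_features: Dict[str, Vector],
    sample_size: int,
    rng: random.Random,
) -> Vector:
    """Mean of the previous-round feature vectors of the sampled neighbors."""
    chosen = sample_neighbors(neighbors, sample_size, rng)
    dim = len(prev_features[chosen[0]])
    return [sum(prev_features[n][i] for n in chosen) / sample_size for i in range(dim)]
```

A fixed sample size keeps the cost per node constant regardless of how many neighbors the node actually has.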
Optionally, the execution module 601 is further configured to:
construct the nodes in the first subgraph according to the user accounts belonging to the first participant;
construct the connection relationships of the nodes in the first subgraph according to the transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
correspondingly, after the number of iterations meets the preset requirement, the execution module 601 is further configured to:
determine, according to the feature vector of any node in the first subgraph from the last round of iteration, the business risk information of the user account corresponding to the node.
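The construction of the first subgraph from accounts and transfer records can be sketched as below. The record format (payer, payee), the treatment of edges as undirected, and the rule that a transfer to an account outside the first participant marks a cross-participant connection are assumptions made for this example; `build_first_subgraph` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_first_subgraph(
    own_accounts: Iterable[str],
    transfers: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (local adjacency, cross-participant counterparties) for own accounts."""
    adjacency: Dict[str, Set[str]] = {account: set() for account in own_accounts}
    cross: Dict[str, Set[str]] = {}
    for payer, payee in transfers:
        payer_own, payee_own = payer in adjacency, payee in adjacency
        if payer_own and payee_own:
            # Both accounts belong to the first participant: local neighbors.
            adjacency[payer].add(payee)
            adjacency[payee].add(payer)
        elif payer_own:
            cross.setdefault(payer, set()).add(payee)  # counterparty held elsewhere
        elif payee_own:
            cross.setdefault(payee, set()).add(payer)
    return adjacency, cross
```

The keys of the second returned mapping are the nodes for which a second aggregation result would need to be requested from another participant.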
Optionally, the execution module 601 is further configured to:
determine the initial feature vector corresponding to each node in the first subgraph according to the attribute information of the user accounts of the first participant;
wherein the previous-round feature vector used in the first round of iteration is the initial feature vector.
Optionally, when determining the business risk information of the user account corresponding to the node, the execution module 601 is specifically configured to:
determine whether the user account is an abnormal account, and if the user account is determined to be an abnormal account according to the feature vector, report it; or,
determine whether the user account has an overdue risk, and if the user account is determined to have an overdue risk according to the feature vector, monitor the user account or adjust the credit rating of the user account.
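As an illustration of how a final feature vector might be turned into one of the actions above, the sketch below scores the vector with a logistic function and compares the score against a threshold. The scoring model, the weights, the threshold, and the action labels are all hypothetical; the text does not specify how the risk is derived from the feature vector.

```python
import math
from typing import List


def risk_score(feature: List[float], weights: List[float], bias: float) -> float:
    """Logistic score in (0, 1) computed from the last-round feature vector."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def handle_account(
    feature: List[float],
    weights: List[float],
    bias: float,
    threshold: float = 0.5,
) -> str:
    """Return an action label: report the account if the score reaches the threshold."""
    if risk_score(feature, weights, bias) >= threshold:
        return "report"
    return "no_action"
```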
Optionally, when determining the final aggregation result of the node according to the first aggregation result and the second aggregation result, the aggregation module 603 is specifically configured to:
determine the final aggregation result from the first aggregation result and the second aggregation result by means of a nonlinear algorithm.
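One possible reading of the nonlinear combination is a linear map over the concatenated aggregation results followed by a nonlinear activation. The sketch below uses ReLU; the choice of activation, the weight matrix, the bias, and the name `combine_nonlinear` are assumptions, since the text does not name a specific nonlinear algorithm.

```python
from typing import List

Vector = List[float]


def combine_nonlinear(
    first: Vector,
    second: Vector,
    weights: List[Vector],
    bias: Vector,
) -> Vector:
    """ReLU(W · [first ; second] + b), computed with plain Python lists."""
    x = first + second  # list concatenation of the two aggregation results
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
```

Each row of `weights` must have as many entries as the concatenated vector, and the number of rows determines the dimension of the final aggregation result.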
Optionally, the graph data is used for social behavior analysis, the nodes in the graph data represent users, the preset association relationships include family relationships and employment relationships, and the preset association relationships are used to determine neighbor nodes.
The apparatus for processing graph data provided by any of the foregoing embodiments is used to carry out the technical solution of any of the foregoing method embodiments. Its implementation principles and technical effects are similar and are not repeated here.
FIG. 7 is a schematic structural diagram of a device for processing graph data provided by an embodiment of the present application. As shown in FIG. 7, the device may include: a memory 701, a processor 702, and a graph data processing program stored in the memory 701 and executable on the processor 702. When executed by the processor 702, the graph data processing program implements the steps of the method for processing graph data described in any of the foregoing embodiments.
Optionally, the memory 701 may be independent of, or integrated with, the processor 702.
For the implementation principles and technical effects of the device provided in this embodiment, reference may be made to the foregoing embodiments; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a graph data processing program. When executed by a processor, the graph data processing program implements the steps of the method for processing graph data described in any of the foregoing embodiments.
An embodiment of the present application further provides a computer program product including a computer program. When executed by a processor, the computer program implements the method described in any of the foregoing embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple modules may be combined or integrated into another system, or some features may be omitted or not executed.
An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present application.
It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the storage medium may also exist as discrete components in an electronic device or a master control device.
It should be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for processing graph data, characterized in that the graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant; the method is applied to the first participant, and the method comprises:
    during any round of iteration, traversing the nodes in the first subgraph, and for each traversed node, performing the following operations:
    finding the neighbor nodes of the node in the first subgraph, and performing an aggregation operation on the feature vectors of the neighbor nodes from the previous round of iteration, to obtain a first aggregation result;
    if the node has a connection relationship with a node in the second subgraph, determining a final aggregation result of the node according to the first aggregation result and a second aggregation result; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second subgraph;
    determining the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector of the last round of iteration is used to calculate the prediction result corresponding to the node.
  2. The method according to claim 1, characterized in that, if the node has a connection relationship with a node in the second subgraph, determining the final aggregation result of the node according to the first aggregation result and the second aggregation result comprises:
    if the node has a connection relationship with a node in the second subgraph, sending request information to the second participant, the request information being used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
    receiving the encrypted second aggregation result sent by the second participant;
    determining the final aggregation result of the node according to the encrypted second aggregation result.
  3. The method according to claim 2, characterized in that the encrypted second aggregation result is a second aggregation result encrypted with a public key; determining the final aggregation result of the node according to the encrypted second aggregation result comprises:
    encrypting the first aggregation result with the public key;
    performing a calculation, based on a random mask, on the encrypted first aggregation result and the encrypted second aggregation result to obtain an encrypted final aggregation result;
    sending the encrypted final aggregation result to a third participant, so that the third participant decrypts the encrypted final aggregation result with a private key;
    receiving the decryption result sent by the third participant, and removing the random mask from the decryption result to obtain the final aggregation result.
  4. The method according to any one of claims 1-3, characterized in that finding the neighbor nodes of the node in the first subgraph and performing an aggregation operation on the feature vectors of the neighbor nodes from the previous round of iteration comprises:
    finding all neighbor nodes of the node in the first subgraph;
    selecting a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement;
    calculating the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
  5. The method according to any one of claims 1-4, characterized by further comprising:
    constructing the nodes in the first subgraph according to the user accounts belonging to the first participant;
    constructing the connection relationships of the nodes in the first subgraph according to the transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
    correspondingly, after the number of iterations meets the preset requirement, the method further comprises:
    determining, according to the feature vector of any node in the first subgraph from the last round of iteration, the business risk information of the user account corresponding to the node.
  6. The method according to claim 5, characterized by further comprising:
    determining the initial feature vector corresponding to each node in the first subgraph according to the attribute information of the user accounts of the first participant;
    wherein the previous-round feature vector used in the first round of iteration is the initial feature vector.
  7. The method according to claim 5 or 6, characterized in that determining the business risk information of the user account corresponding to the node comprises:
    determining whether the user account is an abnormal account, and if the user account is determined to be an abnormal account according to the feature vector, reporting it; or,
    determining whether the user account has an overdue risk, and if the user account is determined to have an overdue risk according to the feature vector, monitoring the user account or adjusting the credit rating of the user account.
  8. The method according to any one of claims 1-7, characterized in that determining the final aggregation result of the node according to the first aggregation result and the second aggregation result comprises:
    determining the final aggregation result from the first aggregation result and the second aggregation result by means of a nonlinear algorithm.
  9. The method according to any one of claims 1-8, characterized in that the graph data is used for social behavior analysis, the nodes in the graph data represent users, the preset association relationships include family relationships and employment relationships, and the preset association relationships are used to determine neighbor nodes.
  10. An apparatus for processing graph data, characterized in that the graph data includes a first subgraph and a second subgraph, the first subgraph includes nodes belonging to a first participant, and the second subgraph includes nodes belonging to a second participant; the apparatus is applied to the first participant, and the apparatus comprises:
    an execution module, configured to traverse the nodes in the first subgraph during any round of iteration and, for each traversed node, perform the following operations;
    a search module, configured to find the neighbor nodes of the node in the first subgraph and perform an aggregation operation on the feature vectors of the neighbor nodes from the previous round of iteration, to obtain a first aggregation result;
    an aggregation module, configured to, when the node has a connection relationship with a node in the second subgraph, determine a final aggregation result of the node according to the first aggregation result and a second aggregation result; wherein the second aggregation result is determined by the second participant according to the feature vectors, from the previous round of iteration, of the node's neighbor nodes in the second subgraph;
    a determination module, configured to determine the feature vector of the node for the current round of iteration according to the final aggregation result; after the number of iterations meets the requirement, the feature vector of the last round of iteration is used to calculate the prediction result corresponding to the node.
  11. The apparatus according to claim 10, characterized in that the aggregation module is specifically configured to:
    if the node has a connection relationship with a node in the second subgraph, send request information to the second participant, the request information being used to request the second participant to calculate the second aggregation result corresponding to the node and to encrypt the second aggregation result;
    receive the encrypted second aggregation result sent by the second participant;
    determine the final aggregation result of the node according to the encrypted second aggregation result.
  12. The apparatus according to claim 11, characterized in that the encrypted second aggregation result is a second aggregation result encrypted with a public key; when determining the final aggregation result of the node according to the encrypted second aggregation result, the aggregation module is specifically configured to:
    encrypt the first aggregation result with the public key;
    perform a calculation, based on a random mask, on the encrypted first aggregation result and the encrypted second aggregation result to obtain an encrypted final aggregation result;
    send the encrypted final aggregation result to a third participant, so that the third participant decrypts the encrypted final aggregation result with a private key;
    receive the decryption result sent by the third participant, and remove the random mask from the decryption result to obtain the final aggregation result.
  13. The apparatus according to any one of claims 10-12, characterized in that the search module is specifically configured to:
    find all neighbor nodes of the node in the first subgraph;
    select a preset number of neighbor nodes from the found neighbor nodes by sampling with replacement;
    calculate the first aggregation result according to the feature vectors of the selected neighbor nodes from the previous round of iteration.
  14. The apparatus according to any one of claims 10-13, characterized in that the execution module is further configured to:
    construct the nodes in the first subgraph according to the user accounts belonging to the first participant;
    construct the connection relationships of the nodes in the first subgraph according to the transfer records of the user accounts of the first participant, the connection relationships being used to determine neighbor nodes;
    correspondingly, after the number of iterations meets the preset requirement, the execution module is further configured to:
    determine, according to the feature vector of any node in the first subgraph from the last round of iteration, the business risk information of the user account corresponding to the node.
  15. The apparatus according to claim 14, characterized in that the execution module is further configured to:
    determine the initial feature vector corresponding to each node in the first subgraph according to the attribute information of the user accounts of the first participant;
    wherein the previous-round feature vector used in the first round of iteration is the initial feature vector.
  16. The apparatus according to claim 14 or 15, characterized in that, when determining the business risk information of the user account corresponding to the node, the execution module is specifically configured to:
    determine whether the user account is an abnormal account, and if the user account is determined to be an abnormal account according to the feature vector, report it; or,
    determine whether the user account has an overdue risk, and if the user account is determined to have an overdue risk according to the feature vector, monitor the user account or adjust the credit rating of the user account.
  17. The apparatus according to any one of claims 10-16, characterized in that, when determining the final aggregation result of the node according to the first aggregation result and the second aggregation result, the aggregation module is specifically configured to:
    determine the final aggregation result from the first aggregation result and the second aggregation result by means of a nonlinear algorithm.
  18. A device for processing graph data, characterized in that the device comprises: a memory, a processor, and a graph data processing program stored in the memory and executable on the processor, wherein the graph data processing program, when executed by the processor, implements the steps of the method for processing graph data according to any one of claims 1-9.
  19. A computer-readable storage medium, characterized in that a graph data processing program is stored on the computer-readable storage medium, and the graph data processing program, when executed by a processor, implements the steps of the method for processing graph data according to any one of claims 1-9.
  20. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-9.
PCT/CN2021/140229 2021-05-10 2021-12-21 Graph data processing method and apparatus, device, storage medium, and program product WO2022237175A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110507515.0 2021-05-10
CN202110507515.0A CN113240505B (en) 2021-05-10 2021-05-10 Method, apparatus, device, storage medium and program product for processing graph data

Publications (1)

Publication Number Publication Date
WO2022237175A1 true WO2022237175A1 (en) 2022-11-17

Family

ID=77133059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140229 WO2022237175A1 (en) 2021-05-10 2021-12-21 Graph data processing method and apparatus, device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN113240505B (en)
WO (1) WO2022237175A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240505B (en) * 2021-05-10 2024-05-24 深圳前海微众银行股份有限公司 Method, apparatus, device, storage medium and program product for processing graph data
CN113672777B (en) * 2021-08-30 2023-09-08 上海飞旗网络技术股份有限公司 User intention exploration method and system based on flow correlation analysis
CN116150810B (en) * 2023-04-17 2023-06-20 北京数牍科技有限公司 Vector element pre-aggregation method, electronic device and computer readable storage medium
CN117273086B (en) * 2023-11-17 2024-03-08 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of graph neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787403A (en) * 1995-03-08 1998-07-28 Huntington Bancshares, Inc. Bank-centric service platform, network and system
CN109918454A (en) * 2019-02-22 2019-06-21 阿里巴巴集团控股有限公司 The method and device of node insertion is carried out to relational network figure
CN109934706A (en) * 2017-12-15 2019-06-25 阿里巴巴集团控股有限公司 A kind of transaction risk control method, apparatus and equipment based on graph structure model
CN110188422A (en) * 2019-05-16 2019-08-30 深圳前海微众银行股份有限公司 A kind of method and device of feature vector that extracting node based on network data
CN110782044A (en) * 2019-10-29 2020-02-11 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of neural network of graph
CN113240505A (en) * 2021-05-10 2021-08-10 深圳前海微众银行股份有限公司 Graph data processing method, device, equipment, storage medium and program product

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280755A (en) * 2018-02-28 2018-07-13 阿里巴巴集团控股有限公司 The recognition methods of suspicious money laundering clique and identification device
CN109598385A (en) * 2018-12-07 2019-04-09 深圳前海微众银行股份有限公司 Anti-money-laundering joint learning method, apparatus, device, system and storage medium
CA3080373A1 (en) * 2019-05-10 2020-11-10 Royal Bank Of Canada System and method for machine learning architecture with privacy-preserving node embeddings
CN110287962B (en) * 2019-05-20 2023-10-27 平安科技(深圳)有限公司 Remote sensing image target extraction method, device and medium based on super object information
CN110210227B (en) * 2019-06-11 2021-05-14 百度在线网络技术(北京)有限公司 Risk detection method, device, equipment and storage medium
CN112069398A (en) * 2020-08-24 2020-12-11 腾讯科技(深圳)有限公司 Information pushing method and device based on graph network
CN111985729A (en) * 2020-09-07 2020-11-24 支付宝(杭州)信息技术有限公司 Method, system and device for prediction based on graph neural network
CN112084520B (en) * 2020-09-18 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy through joint training of two parties
CN112070216B (en) * 2020-09-29 2023-06-02 支付宝(杭州)信息技术有限公司 Method and system for training graph neural network model based on graph computing system
CN112765412A (en) * 2021-01-19 2021-05-07 合肥鸿麒科技有限公司 Abstract display method of dynamic graph data

Also Published As

Publication number Publication date
CN113240505B (en) 2024-05-24
CN113240505A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
Liu et al. Alleviating the inconsistency problem of applying graph neural network to fraud detection
US11546373B2 (en) Cryptocurrency based malware and ransomware detection systems and methods
WO2021114911A1 (en) User risk assessment method and apparatus, electronic device, and storage medium
WO2022237175A1 (en) Graph data processing method and apparatus, device, storage medium, and program product
JP7095140B2 (en) Multi-model training methods and equipment based on feature extraction, electronic devices and media
US20200160344A1 (en) Blockchain Transaction Analysis and Anti-Money Laundering Compliance Systems and Methods
US11238364B2 (en) Learning from distributed data
JP2017091516A (en) Computer-implemented method, data processing system and computer program for identifying fraudulent transactions
CN111428887B (en) Model training control method, device and system based on multiple computing nodes
CN109615021B (en) Privacy information protection method based on k-means clustering
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
Bonawitz et al. Federated learning and privacy
EP1934923A2 (en) System and method for detecting fraudulent transactions
WO2022007321A1 (en) Longitudinal federal modeling optimization method, apparatus and device, and readable storage medium
JP2016511891A (en) Privacy against sabotage attacks on large data
Nerurkar et al. Detecting illicit entities in bitcoin using supervised learning of ensemble decision trees
US20230006819A1 (en) Systems and methods for homomorphic encryption-based triggering
Han et al. Data valuation for vertical federated learning: An information-theoretic approach
WO2022199473A1 (en) Service analysis method and apparatus based on differential privacy
US20220198579A1 (en) System and method for dimensionality reduction of vendor co-occurrence observations for improved transaction categorization
Upreti et al. Enhanced algorithmic modelling and architecture in deep reinforcement learning based on wireless communication Fintech technology
WO2022116491A1 (en) Dbscan clustering method based on horizontal federation, and related device therefor
KR102211549B1 (en) Method and device enabling expansion of primary payment methods
Yu et al. Robust clustering of ethereum transactions using time leakage from fixed nodes
Ampel et al. Disrupting ransomware actors on the bitcoin blockchain: A graph embedding approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21941737

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.02.2024)