CN114595474A - Federated learning modeling optimization method, electronic device, medium, and program product - Google Patents

Federated learning modeling optimization method, electronic device, medium, and program product

Info

Publication number
CN114595474A
CN114595474A CN202210240863.0A
Authority
CN
China
Prior art keywords: user, graph, extraction model, feature extraction, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210240863.0A
Other languages
Chinese (zh)
Inventor
魏文斌
范涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202210240863.0A priority Critical patent/CN114595474A/en
Publication of CN114595474A publication Critical patent/CN114595474A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a federated learning modeling optimization method, an electronic device, a medium, and a program product, applied to a first party. The federated learning modeling optimization method comprises the following steps: using each piece of user data as the input of the first-layer neural network of a graph embedding feature extraction model to generate each user intermediate feature; requesting the second party, in an encrypted state, for the neighbor node aggregation features according to the user intermediate features; taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features; taking the user intermediate features output by the last-layer neural network of the graph embedding feature extraction model as target graph embedding features; and obtaining, according to the target graph embedding features, a federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model. The method and the device solve the technical problem that data privacy cannot be protected when a graph embedding feature extraction model is built jointly by multiple parties.

Description

Federated learning modeling optimization method, electronic device, medium, and program product
Technical Field
The present application relates to the field of artificial intelligence in financial technology (Fintech), and in particular, to a federated learning modeling optimization method, an electronic device, a medium, and a program product.
Background
With the continuous development of financial technology, especially internet technology, more and more technologies (such as distributed computing and artificial intelligence) are applied in the financial field, but the financial industry also places higher requirements on these technologies.
With the continuous development of artificial intelligence, the types of neural networks have become increasingly rich. The graph neural network, as a deep learning technique, is mainly applied to scenarios with graph relationship data: besides user feature data (user node data), it also introduces relationship chain data (connecting-edge data between users). In many application scenarios, however, the user feature data and the relationship chain data are usually not held by a single party; for example, a social networking company holds the relationship chain data, while an ordinary business company can often obtain only the feature data of its users. At present, to build a graph embedding feature extraction model that combines relationship chain data and user feature data, the party holding the relationship chain data must share its data with the party holding the user feature data. For the party holding the relationship chain data, however, that data is usually private, so data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model.
Disclosure of Invention
The main purpose of the present application is to provide a federated learning modeling optimization method, an electronic device, a medium, and a program product, aiming to solve the technical problem in the prior art that data privacy cannot be protected when a graph embedding feature extraction model is built jointly by multiple parties.
In order to achieve the above object, the present application provides a federated learning modeling optimization method, which is applied to a first party, and the federated learning modeling optimization method includes:
acquiring user data of a graph embedding feature extraction model and user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party;
using the user data as the input of the first-layer neural network of the graph embedding feature extraction model to generate user intermediate features corresponding to the user nodes;
requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to the user intermediate characteristics;
respectively taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features corresponding to each user node;
and returning to the execution step: requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to each user intermediate characteristic until each user intermediate characteristic output by the last layer of neural network of the graph embedding characteristic extraction model is acquired, and taking the acquired user intermediate characteristic as a target graph embedding characteristic;
and obtaining a federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to each target graph embedding feature.
In order to achieve the above object, the present application provides a federated learning modeling optimization method, which is applied to a second party, and the federated learning modeling optimization method includes:
acquiring user relation chain data which correspond to all target users together, wherein the target users are intersection users between the first party and the second party;
receiving the intermediate features of the secret state user sent by the first party, wherein the intermediate features of the secret state user are obtained by homomorphic encryption of the intermediate features of the user output by a neural network of a graph embedding feature extraction model;
according to the user relation chain data and the intermediate features of the secret-state users, respectively encrypting and aggregating the intermediate features of the secret-state users corresponding to all neighbor nodes of each user node to obtain the aggregation features of the secret-state neighbor nodes corresponding to each user node;
and sending each dense-state neighbor node aggregation feature to the first party for the first party to decrypt to obtain each neighbor node aggregation feature, respectively taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the neural network of the next layer of the graph embedding feature extraction model, regenerating the user intermediate feature corresponding to each user node until each user intermediate feature output by the neural network of the last layer of the graph embedding feature extraction model is obtained as a target graph embedding feature, and obtaining the federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to each target graph embedding feature.
The present application further provides a federated learning modeling optimization device, applied to a first party, the federated learning modeling optimization device comprising:
the acquisition module is used for acquiring a graph embedding feature extraction model and user data of user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party;
the first intermediate feature generation module is used for taking the user data as the input of a first layer neural network of the graph embedding feature extraction model to generate user intermediate features corresponding to the user nodes;
a request acquisition module, configured to request the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to the user intermediate characteristics;
a second intermediate feature generation module, configured to take each user intermediate feature and the corresponding neighbor node aggregation feature as input of a next-layer neural network of the graph-embedded feature extraction model, and regenerate the user intermediate feature corresponding to each user node;
an iterative loop module for returning to execute the steps of: requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to each user intermediate characteristic until each user intermediate characteristic output by the last layer of neural network of the graph embedding characteristic extraction model is acquired, and taking the acquired user intermediate characteristic as a target graph embedding characteristic;
and a federated model determination module, configured to obtain a federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to the target graph embedding features.
The present application further provides a federated learning modeling optimization device, applied to a second party, the federated learning modeling optimization device comprising:
the acquisition module is used for acquiring user relationship chain data which correspond to all target users together, wherein the target users are intersection users between the first party and the second party;
the receiving module is used for receiving the secret user intermediate features sent by the first party, wherein the secret user intermediate features are obtained by homomorphic encryption of user intermediate features output by a neural network of a graph embedding feature extraction model;
the encryption and aggregation module is used for respectively encrypting and aggregating the secret state user intermediate characteristics corresponding to all neighbor nodes of each user node according to the user relation chain data and the secret state user intermediate characteristics to obtain the secret state neighbor node aggregation characteristics corresponding to each user node;
and a sending module, configured to send each dense-state neighbor node aggregation feature to the first party, so that the first party decrypts them to obtain each neighbor node aggregation feature, takes each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerates the user intermediate feature corresponding to each user node, until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained as a target graph embedding feature, and a federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model is obtained according to the target graph embedding features.
The present application further provides an electronic device, the electronic device including: a memory, a processor, and a program of the federated learning modeling optimization method stored on the memory and executable on the processor, the program of the federated learning modeling optimization method when executed by the processor may implement the steps of the federated learning modeling optimization method as described above.
The present application also provides a computer-readable storage medium on which a program implementing the federated learning modeling optimization method is stored, the program implementing the steps of the federated learning modeling optimization method described above when executed by a processor.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the federated learning modeling optimization method described above.
Compared with the prior-art technical means in which the party holding the relationship chain data shares its data with the party holding the user feature data so that a graph embedding feature extraction model can be built jointly by multiple parties, the present application first obtains a graph embedding feature extraction model and the user data of the user nodes corresponding to target users, wherein the target users are the intersection users between a first party and a second party; then takes the user data as the input of the first-layer neural network of the graph embedding feature extraction model to generate the user intermediate features corresponding to the user nodes; requests the second party, in an encrypted state, for the neighbor node aggregation features of all neighbor nodes of each user node according to the user intermediate features; takes each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerates the user intermediate features corresponding to the user nodes; returns to the step of requesting the second party, in an encrypted state, for the neighbor node aggregation features of all neighbor nodes of each user node according to each user intermediate feature, until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained and taken as a target graph embedding feature; and obtains a federated graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to each target graph embedding feature.
The user intermediate features sent by the first party to the second party are homomorphically encrypted data, so the second party cannot learn the first party's private data. The second party only needs to locally encrypt and aggregate, in the ciphertext state, the dense-state user intermediate features sent by the first party according to its user relationship chain data; the user relationship chain data itself never needs to be sent to the first party and serves only as the basis for deciding which dense-state user intermediate features to aggregate. Consequently, after receiving the dense-state neighbor node aggregation features sent by the second party, the first party cannot reverse-infer the second party's user relationship chain data from them, which achieves the purpose of protecting the second party's data privacy. The federated graph embedding feature extraction model is thus built jointly by the first party and the second party on the premise that the data privacy of both parties is protected. This overcomes the technical defect that data privacy is revealed when the party holding the relationship chain data shares its data with the party holding the user feature data in order to jointly build a graph embedding feature extraction model, and solves the technical problem that data privacy cannot be protected when a graph embedding feature extraction model is built jointly by multiple parties.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram of a first embodiment of a federated learning modeling optimization method of the present application;
FIG. 2 is a schematic flow chart diagram of a second embodiment of the federated learning modeling optimization method of the present application;
FIG. 3 is a schematic flow chart diagram of a third embodiment of the federated learning modeling optimization method of the present application;
fig. 4 is a schematic device structure diagram of a hardware operating environment related to the federal learning modeling optimization method in the embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Embodiment One
The embodiment of the application provides a federated learning modeling optimization method, which is applied to a first party, and in the first embodiment of the federated learning modeling optimization method, referring to fig. 1, the federated learning modeling optimization method includes:
step S10, obtaining a graph embedding feature extraction model and user data of user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party;
in this embodiment, it should be noted that the first party is a party having user data in federal learning, the user data may be user feature data, or the user feature data and a user tag corresponding to the user feature data, the second party is a party having user relationship chain data in federal learning, and the user relationship chain data is data representing a relationship between users. The graph embedding feature is extracted as an untrained neural network model, and is used for extracting embedding corresponding to the user node from the user data, that is, graph embedding, for example, the embedding may be a low-dimensional space vector representing node features of the user node in the graph representing data. Before the first party and the second party start to build the federal graph embedded feature extraction model, sample alignment needs to be carried out between the first party and the second party to determine intersection users between the first party and the second party, namely target users, so that after the samples are aligned, the first party reserves user data corresponding to user nodes of the target users for building the federal graph embedded feature extraction model, and the second party reserves relation chain data corresponding to the target users in common for building the federal graph embedded feature extraction model.
As an example, the graph representation data includes nodes and connecting edges. The user data may be the data representing the nodes in the graph representation data, for example a user profile, and the user relationship chain data may be the data representing the connecting edges between user nodes; for example, the user relationship chain data may consist of arrays, where a connecting edge exists between the 2 user nodes belonging to the same array.
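The data division described above can be sketched as follows; the node features and edge arrays are invented purely for illustration.

```python
# First party: user feature data keyed by user node (hypothetical values).
user_data = {
    "u1": [0.2, 1.5],
    "u2": [0.7, 0.1],
    "u3": [1.1, 0.9],
}

# Second party: user relationship chain data as arrays of 2 user nodes;
# each array denotes a connecting edge between the two nodes.
relation_chain = [("u1", "u2"), ("u2", "u3")]

def neighbors(node, edges):
    """Neighbor node set of `node`, derived from the relation chain."""
    out = set()
    for a, b in edges:
        if a == node:
            out.add(b)
        elif b == node:
            out.add(a)
    return out
```

Only the second party can evaluate `neighbors`, since only it holds `relation_chain`; this is exactly why the aggregation step later in the method must happen on the second party's side.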
Step S20, using the user data as the input of the first layer neural network of the graph embedding feature extraction model, and generating user intermediate features corresponding to the user nodes;
in this embodiment, it should be noted that the graph-embedded feature extraction model includes a plurality of layers of neural networks, as an example, a first layer of neural network of the graph-embedded feature extraction model may be an input layer, which is used to convert user data into a user feature matrix, where the user feature matrix is composed of a plurality of user feature values, and the neural network behind the input layer is each hidden layer, which is used to map the user feature matrix into embedding. The user intermediate features are feature matrixes output by an input layer or a hidden layer.
As an example, step S20 includes: and respectively embedding the user data into a first-layer neural network of a feature extraction model through the graph, and outputting user intermediate features corresponding to the user data.
Step S30, according to the intermediate characteristics of each user, requesting the second party to acquire the neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state;
as an example, step S30 includes: respectively carrying out homomorphic encryption on the intermediate features of the users to obtain the intermediate features of the secret users corresponding to the intermediate features of the users, wherein the homomorphic encryption mode can be semi-homomorphic encryption or fully homomorphic encryption; sending the intermediate features of the dense-state users to the second party, so that the second party retrieves neighbor relations among the user nodes through the user relation chains according to user relation chain data corresponding to the target users in common and determines neighbor node sets corresponding to the user nodes, wherein the neighbor node sets are sets of all neighbor nodes corresponding to the user nodes; determining the intermediate characteristics of the dense-state users corresponding to the neighbor nodes in each neighbor node set in each intermediate characteristic of the dense-state users to obtain intermediate characteristic sets of the dense-state users corresponding to each neighbor node set; the second party respectively carries out encryption aggregation on all the secret state user intermediate features in each secret state user intermediate feature set to obtain secret state neighbor node aggregation features corresponding to each user node; and the second party sends the aggregation characteristics of the secret neighbor nodes to the first party.
Step S40, respectively using each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features corresponding to each user node;
as an example, step S40 includes: respectively aggregating each user intermediate feature and the corresponding neighbor node aggregation feature to obtain a combined user intermediate feature; and passing the joint user intermediate feature through a next layer of neural network in the feature extraction model, and taking an output result of the next layer of neural network as a regenerated user intermediate feature. It should be noted that a plurality of neighbor nodes corresponding to the neighbor node aggregation features together are determined based on the user relationship chain data of the second party, so that the purpose of performing federated learning modeling by combining the user data corresponding to the user nodes and the user relationship chain data corresponding to the connection edges is indirectly achieved by taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph-embedded feature extraction model.
The step of regenerating the user intermediate features corresponding to the user nodes by respectively using the user intermediate features and the corresponding neighbor node aggregation features as the input of the next-layer neural network of the graph-embedded feature extraction model comprises the following steps:
step S41, respectively performing feature splicing on each user intermediate feature and the corresponding neighbor node aggregation feature to obtain each splicing feature;
step S42, inputting each splicing feature into the next layer of neural network of the graph embedding feature extraction model respectively to obtain the neural network output corresponding to each user node;
step S43, by normalizing each neural network output, a user intermediate feature corresponding to each user node is regenerated.
In this embodiment, it should be noted that the user intermediate features may be vectors, and the neighbor node aggregation features may also be vectors;
in this embodiment, it should be noted that the neural network includes a network parameter and an activation function, where the network parameter is a network parameter that needs to be trained and updated, for example, the network parameter may be a convolution kernel, and the activation function may be a sigmod function or a tanh function, which is not limited herein.
As an example, the steps S41 to S43 include: respectively carrying out vector splicing on each user intermediate feature and the corresponding neighbor node aggregation feature to obtain each splicing feature; inputting each splicing feature into a next-layer neural network of the graph embedding feature extraction model respectively, and outputting neural network output corresponding to each splicing feature according to network parameters corresponding to the next-layer neural network, corresponding activation functions and each splicing feature serving as input; and respectively normalizing the output of each neural network so as to respectively map the output of each neural network to a preset spatial dimension and regenerate the user intermediate characteristics corresponding to each user node.
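The vector splicing, activation, and normalization of steps S41 to S43 can be sketched in NumPy as follows; the dimensions, random weights, and tanh activation are illustrative assumptions, not choices fixed by the application.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                               # feature dimension (assumed)
W_k = rng.normal(size=(d, 2 * d))   # network parameter of the k-th layer

def layer_forward(h_v, h_neigh):
    splice = np.concatenate([h_v, h_neigh])   # splicing feature (step S41)
    out = np.tanh(W_k @ splice)               # neural network output (step S42)
    return out / np.linalg.norm(out)          # normalization (step S43)

h_v = rng.normal(size=d)             # user intermediate feature
h_neigh = rng.normal(size=d)         # neighbor node aggregation feature
h_new = layer_forward(h_v, h_neigh)  # regenerated user intermediate feature
```

The normalization keeps each regenerated user intermediate feature on the unit sphere, which stabilizes the repeated layer-by-layer application of the model.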
As an example, the neural network output corresponding to each splicing feature is calculated by the following formula:

h_v^k = σ(W^k · CONCAT(h_v^{k-1}, h_{N(v)}^{k-1}))

wherein v represents a user node, N(v) represents the neighbor node set corresponding to user node v, h_v^{k-1} is the output of the (k-1)-th layer neural network of the graph embedding feature extraction model, i.e. the user intermediate feature serving as input, h_{N(v)}^{k-1} is the neighbor node aggregation feature serving as input to the k-th layer neural network, W^k is the network parameter of the k-th layer neural network, σ is the activation function, CONCAT represents the vector splicing (concatenation) operation, and h_v^k is the neural network output corresponding to the k-th layer neural network.
As an example, the neural network output is normalized by the following formula:

h_v^k = h_v^k / ‖h_v^k‖₂

wherein the h_v^k on the left side of the equal sign is the regenerated user intermediate feature, and the h_v^k on the right side of the equal sign is the neural network output corresponding to the k-th layer neural network.
Step S50: return to the execution step of requesting the second party, in the encrypted state, for the neighbor node aggregation features of all neighbor nodes of each user node according to each user intermediate feature, until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained, and take the obtained user intermediate features as target graph embedding features;
as an example, step S50 includes: returning to the execution step of requesting the second party for the dense-state neighbor node aggregation features of all neighbor nodes of each user node according to the dense-state user intermediate features, so as to obtain the user intermediate features regenerated by the next-layer neural network, until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained; these user intermediate features are taken as the target graph embedding features output by inputting the user data into the graph embedding feature extraction model, where one piece of user data corresponds to one target graph embedding feature.
Step S60: obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedding feature.
As an example, step S60 includes: and iteratively optimizing the graph embedding feature extraction model according to each target graph embedding feature to obtain the federal graph embedding feature extraction model.
The step of obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature comprises the following steps:
step S61 of determining whether or not the model loss calculated from each of the target map embedding features converges;
step S62, if the model loss is converged, the graph embedding feature extraction model obtained by updating at this time is used as the federal graph embedding feature extraction model;
step S63, if the model loss does not converge, updating the graph-embedded feature extraction model according to the model loss, and returning to execute the steps of: and acquiring the graph embedding feature extraction model and user data of user nodes corresponding to the target users.
As one example, steps S61 to S63 include: calculating corresponding model loss according to each target graph embedding feature, judging whether the model loss is converged, and if the model loss is converged, taking the graph embedding feature extraction model as a federal graph embedding feature extraction model; if the model loss is not converged, performing back propagation updating on the network parameters of each layer of neural network in the graph-embedded feature extraction model according to the gradient calculated by the model loss, and returning to the execution step: and acquiring the graph embedding feature extraction model and the user data of the user nodes corresponding to the target users until the calculated model loss is converged. The method for updating the back propagation may be a gradient descent method, a gradient ascent method, or the like, and the model loss may be a loss of unsupervised training or a loss of supervised training. Therefore, the purpose of carrying out federated learning modeling by indirectly combining the user data of the first party and the user relation chain data of the second party on the premise of protecting data privacy is achieved, the purpose of carrying out federated learning modeling by combining different types of graph representation data without revealing data privacy is achieved, and the problem of data island when a federated graph embedded feature extraction model is built by using different types of graph representation data is solved.
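The iterative optimization of steps S61 to S63 amounts to a standard train-until-convergence loop; a schematic sketch follows (the convergence tolerance, learning rate, and the quadratic toy loss are assumptions, and the encrypted neighbor-feature exchange of each forward pass is elided):

```python
def train_until_converged(params, loss_fn, grad_fn, lr=0.1, tol=1e-6, max_iter=1000):
    """Gradient-descent analogue of steps S61-S63: compute the model loss,
    test convergence, otherwise back-propagate (update) and repeat."""
    prev = float("inf")
    loss = loss_fn(params)
    for _ in range(max_iter):
        loss = loss_fn(params)
        if abs(prev - loss) < tol:               # steps S61/S62: loss converged
            return params, loss
        prev = loss
        params = params - lr * grad_fn(params)   # step S63: update, re-run
    return params, loss

# toy quadratic loss standing in for the model loss over target embeddings
f = lambda w: (w - 3.0) ** 2
g = lambda w: 2.0 * (w - 3.0)
w_star, final_loss = train_until_converged(0.0, f, g)
```

In the patented scheme, `loss_fn` would be the classification and/or unsupervised loss computed from the target graph embedding features, and `params` the per-layer network parameters of the graph embedding feature extraction model.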
Wherein the model loss at least includes one of classification loss and unsupervised loss, and before the step of determining whether the model loss calculated according to each target graph embedding feature converges, the federal learning modeling optimization method further includes:
Step A10: inputting the target graph embedding features into a preset classification model to obtain output classification labels; acquiring the preset classification labels corresponding to the user data, and calculating the classification loss according to the preset classification labels and the output classification labels;
in this embodiment, it should be noted that the loss of the supervised training may be a classification loss. The preset classification label is a user label corresponding to a preset user node.
As an example, step A10 includes: inputting the target graph embedding features into a preset classification model and classifying them to obtain the output classification label corresponding to each target graph embedding feature; and calculating the similarity between the output classification labels and the preset classification labels to obtain the classification loss.
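Step A10 can be sketched as a softmax classifier over a target graph embedding scored with cross-entropy (a toy sketch; the weight matrix, the labels, and the choice of cross-entropy as the similarity-based classification loss are assumptions, since the patent only requires some similarity measure between output and preset labels):

```python
import numpy as np

def classification_loss(embedding, W_cls, true_label):
    """Classify a target graph embedding with a linear-softmax model
    and score it against the preset classification label."""
    logits = W_cls @ embedding
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # output classification label (soft)
    return -np.log(probs[true_label])     # cross-entropy classification loss

emb = np.array([1.0, -0.5, 0.2])          # a target graph embedding feature
W_cls = np.array([[2.0, 0.0, 0.0],        # class 0 aligned with emb
                  [-2.0, 0.0, 0.0]])      # class 1 anti-aligned
loss_right = classification_loss(emb, W_cls, true_label=0)
loss_wrong = classification_loss(emb, W_cls, true_label=1)
```

As expected, the loss is small when the preset label matches the model's output label and large when it does not, which is what drives the supervised updates of step S63.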
Step B10: calculating the unsupervised loss corresponding to each target graph embedding feature according to a preset unsupervised loss function.
As an example, the preset unsupervised loss function is as follows:

$$J_G(z_u) = -\log\!\left(\sigma\left(z_u^{\top} z_v\right)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)}\left[\log\!\left(\sigma\left(-z_u^{\top} z_{v_n}\right)\right)\right]$$

where $J_G(z_u)$ is the cost function corresponding to user node $u$ appearing on the connecting edge $(u, v)$, i.e. the preset unsupervised loss function, and the sum of the cost function values over all nodes is the unsupervised loss; $z_u$ is the target graph embedding feature (embedding) corresponding to user node $u$, and $z_v$ is the target graph embedding feature corresponding to user node $v$. The term $-\log\!\left(\sigma\left(z_u^{\top} z_v\right)\right)$ is the positive sampling part, representing the contribution of the connecting edge $(u, v)$ to the cost function: if the edge $(u, v)$ exists, the embedding of user node $u$ and the embedding of user node $v$ should be close, so the inner product between them is larger and the generated cost is correspondingly smaller. The term $-Q \cdot \mathbb{E}_{v_n \sim P_n(v)}\left[\log\!\left(\sigma\left(-z_u^{\top} z_{v_n}\right)\right)\right]$ is the negative sampling part, where $Q$ is the number of negative samples, $P_n(v)$ is the negative sampling distribution over the set of nodes that are not neighbors of node $v$, $v_n \sim P_n(v)$ denotes sampling from $P_n(v)$, and $\mathbb{E}$ denotes the expectation.
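The loss for a single edge can be evaluated directly; a toy NumPy sketch (the two-dimensional embeddings, the single negative sample, and $Q = 1$ are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_edge_loss(z_u, z_v, z_negs, Q):
    """J_G(z_u) for edge (u, v): the positive term pulls connected
    embeddings together; the Q-weighted negative term (mean over the
    drawn negative samples approximates the expectation over P_n(v))
    pushes non-neighbor embeddings apart."""
    pos = -np.log(sigmoid(z_u @ z_v))                    # positive sampling part
    neg = -Q * np.mean([np.log(sigmoid(-z_u @ z_n)) for z_n in z_negs])
    return pos + neg

z_u = np.array([1.0, 0.0])
z_v = np.array([0.9, 0.1])     # neighbor: similar embedding -> low cost
z_far = np.array([-1.0, 0.0])  # non-neighbor: dissimilar -> low negative cost
loss_good = unsupervised_edge_loss(z_u, z_v, [z_far], Q=1)
loss_bad = unsupervised_edge_loss(z_u, z_far, [z_v], Q=1)  # roles swapped
```

Swapping the neighbor and non-neighbor roles raises the cost, which is exactly the gradient signal that shapes the embeddings during unsupervised training.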
As an example, the user data owned by the first party may be user portrait data, the user relationship chain data may be social relationship chain data between users, i.e. a social relationship graph, and the user intermediate features may be user portrait intermediate features, so that a federal graph embedded feature extraction model for extracting user portrait features may be generated according to steps S10 to S60. Since the federal graph embedded feature extraction model is constructed by combining the user portrait data of the first party and the social relationship chain data of the second party in a vertical federated learning manner, the user features obtained by feature extraction with the federal graph embedded feature extraction model carry richer information during sample prediction, provide more decision bases for the sample prediction, and improve its accuracy. The federal graph embedding features may be applied to the field of risk control: the federal graph embedded feature extraction model and a loan risk probability prediction model jointly form a risk control model, the sample prediction process is a loan risk detection process, and the loan risk detection accuracy can thereby be improved. The federal graph embedding features may also be applied to the field of message recommendation: the federal graph embedded feature extraction model and a conversion rate prediction model jointly form a message recommendation model, the sample prediction process is a message recommendation process, and the user conversion rate during message recommendation is predicted through the two models, so that messages are selectively recommended according to the predicted user conversion rate, improving the accuracy of message recommendation.
Compared with the prior-art technical means in which a party holding user feature data or relationship chain data shares its data so that a graph embedding feature extraction model can be built jointly, the embodiment of the application first acquires the graph embedding feature extraction model and the user data of the user nodes corresponding to the target users, wherein the target users are intersection users between a first party and a second party; then takes the user data as the input of the first-layer neural network of the graph embedding feature extraction model to generate the user intermediate features corresponding to the user nodes; requests the second party, in the encrypted state, for the neighbor node aggregation features of all neighbor nodes of each user node according to the user intermediate features; takes each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model to regenerate the user intermediate features corresponding to each user node; returns to the execution step of requesting the second party, in the encrypted state, for the neighbor node aggregation features of all neighbor nodes of each user node according to each user intermediate feature, until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained, and takes the obtained user intermediate features as target graph embedding features; and obtains a federal graph embedded feature extraction model corresponding to the graph embedding feature extraction model according to each target graph embedding feature.
The user intermediate features sent by the first party to the second party are homomorphically encrypted data, so the second party cannot learn the data privacy of the first party. The second party only needs to locally encrypt and aggregate, in the ciphertext state, the user intermediate features sent by the first party according to the user relationship chain data; the second party does not need to send the user relationship chain data to the first party, and that data is used only as a basis for determining which dense-state user intermediate features are aggregated. Consequently, after obtaining the dense-state neighbor node aggregation features sent by the second party, the first party cannot reversely deduce the user relationship chain data of the second party from them, achieving the purpose of protecting the data privacy of the second party. The federal graph embedded feature extraction model is thus established jointly by the first party and the second party on the premise of protecting the data privacy of both parties, which overcomes the technical defect that data privacy may be revealed when a party holding either user feature data or relationship chain data shares its data to jointly build a graph embedded feature extraction model (since relationship chain data is usually private data), and solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedded feature extraction model.
Example two
Further, referring to fig. 2, based on the first embodiment of the present application, in another embodiment of the present application, the same or similar contents to those of the above embodiment may be referred to the above description, and are not repeated again in the following. On this basis, after the step of obtaining the federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature, the federal learning modeling optimization method further includes:
step C10, acquiring user data to be predicted on a target user node corresponding to the user to be predicted;
step C20, extracting a neighbor node aggregation feature set corresponding to all neighbor nodes of the target user node, wherein the neighbor node aggregation feature set is generated in the training process of the target graph embedded feature extraction model;
step C30, performing feature extraction based on federal learning on the user data to be predicted by taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedding feature extraction model to obtain graph embedding features to be predicted corresponding to the target user node;
in this embodiment, it should be noted that, after the federal graph embedded feature extraction model is trained, in order to improve the efficiency of sample prediction, a neighbor node aggregation feature set obtained when the federal graph embedded feature extraction model is trained may be directly selected to participate in sample prediction, instead of requesting a second party to obtain the dense neighbor node aggregation features of all neighbor nodes corresponding to a target user node. Meanwhile, in order to ensure the accuracy of sample prediction, the federal graph embedded feature extraction model can be selected periodically, namely, the federal graph embedded feature extraction model is used as a graph embedded feature extraction model to be trained periodically, and the model training processes from the step S10 to the step S60 are executed again to ensure the effectiveness of the neighbor node aggregation feature set. The neighbor node aggregation feature set comprises neighbor node aggregation features corresponding to user intermediate features generated by the user data to be predicted in each layer of the neural network of the federal graph embedded feature extraction model. The user to be predicted is one member of each target user, and the target user node is one member of user nodes corresponding to each target user. The federal graph embedded feature extraction model comprises an input layer and hidden layers.
As an example, steps C10 to C30 include: acquiring the user data to be predicted on the target user node corresponding to the user to be predicted; extracting the neighbor node aggregation feature set corresponding to all neighbor nodes of the target user node, and determining, within the set, the neighbor node aggregation feature serving as the input of each hidden layer; and taking the user data to be predicted and each neighbor node aggregation feature as the input of the federal graph embedded feature extraction model, performing feature extraction based on federated learning on the user data to be predicted to obtain the graph embedding features to be predicted corresponding to the target user node, wherein the user data to be predicted enters the federal graph embedded feature extraction model through the input layer, and each neighbor node aggregation feature is spliced with the output of the layer preceding its corresponding hidden layer to serve as the input of that hidden layer.
The target graph embedded feature extraction model comprises an input layer and hidden layers, the neighbor node aggregation feature set at least comprises neighbor node aggregation features corresponding to one hidden layer, and the step of obtaining the graph embedded features to be predicted corresponding to the target user nodes comprises the following steps of taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedded feature extraction model, and carrying out feature extraction on the user data to be predicted based on federal learning:
step C31, passing the user data to be predicted through the input layer, and outputting the user characteristics to be predicted corresponding to the user data to be predicted;
step C32, determining neighbor node aggregation characteristics corresponding to a hidden layer behind the input layer, taking the user characteristics to be predicted and the neighbor node aggregation characteristics together as the input of the hidden layer, and outputting the intermediate characteristics of the user to be predicted corresponding to the user to be predicted;
Step C33: determining the neighbor node aggregation feature corresponding to the next hidden layer, taking the intermediate features of the user to be predicted together with the corresponding neighbor node aggregation feature as the input of the next hidden layer so as to regenerate the intermediate features of the user to be predicted, until the intermediate features of the user to be predicted output by the last hidden layer are obtained, and taking those intermediate features as the graph embedding features to be predicted.
As an example, the step C31 to the step C33 include: inputting the user data to be predicted into a first layer neural network of the federal diagram embedded feature extraction model to generate intermediate features of the user to be predicted, wherein the first layer neural network is an input layer of the federal diagram embedded feature extraction model, and the neural network behind the input layer is a hidden layer; extracting neighbor node aggregation characteristics corresponding to the intermediate characteristics of the user to be predicted, wherein the neighbor node aggregation characteristics are generated when the federal graph embedded characteristic extraction model is constructed; and using the intermediate features of the users to be predicted and the corresponding neighbor node aggregation features as the input of the neural network of the next layer of the Federal graph embedded feature extraction model, regenerating the intermediate features of the users to be predicted, extracting the neighbor node aggregation features corresponding to the regenerated intermediate features to be predicted, and returning to the execution step: and the intermediate features of the users to be predicted and the corresponding neighbor node aggregation features are jointly used as the input of the next layer of neural network of the federal graph embedded feature extraction model, the intermediate features of the users to be predicted are regenerated until the intermediate features of the users to be predicted output by the last layer of neural network of the federal graph embedded feature extraction model are obtained, and the obtained intermediate features of the users to be predicted are used as the embedded features of the graphs to be predicted corresponding to the target user nodes.
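The prediction-time forward pass of steps C31 to C33 can be sketched as follows (NumPy; the layer shapes, the tanh activation, and the per-layer normalization are assumptions, and `cached_aggs[k]` stands for the neighbor node aggregation feature saved for hidden layer k during training, so no request to the second party is needed):

```python
import numpy as np

def predict_embedding(x, input_W, hidden_Ws, cached_aggs):
    """Forward pass of the federal graph embedded feature extraction model
    at prediction time: the input layer produces the first intermediate
    feature (step C31); each hidden layer consumes that feature spliced
    with its cached neighbor aggregation feature (steps C32/C33)."""
    h = np.tanh(input_W @ x)                   # step C31: input layer
    for W, agg in zip(hidden_Ws, cached_aggs):
        z = np.concatenate([h, agg])           # splice with cached aggregation
        h = np.tanh(W @ z)
        h = h / (np.linalg.norm(h) + 1e-12)    # same normalization as training
    return h                                   # graph embedding to be predicted

rng = np.random.default_rng(1)
x = rng.normal(size=6)                              # user data to be predicted
input_W = rng.normal(size=(4, 6))                   # input layer weights
hidden_Ws = [rng.normal(size=(4, 8)), rng.normal(size=(4, 8))]
cached_aggs = [rng.normal(size=4), rng.normal(size=4)]
emb = predict_embedding(x, input_W, hidden_Ws, cached_aggs)
```

Reusing the cached aggregation features is what makes prediction a purely local computation for the first party, at the cost of the periodic retraining noted above to keep the cache valid.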
Step C40: performing sample prediction on the user features to be predicted by inputting the graph embedding features to be predicted into a preset sample prediction model, to obtain a target prediction result.
As an example, the preset sample prediction model may be a user classification model, in which case the user data may be a user profile, the target prediction result may be a user classification label, and step C40 includes: and inputting the embedding characteristics of the graph to be predicted into a user classification model, and classifying the user to be predicted to obtain a user classification label. The neighbor node aggregation characteristics are determined by combining the user data of the first party and the user relationship chain data of the second party, so that when user classification is carried out according to the user data to be predicted and the corresponding neighbor node aggregation characteristic set in the embodiment of the application, the user data to be predicted of the first party and the user relationship chain data of the second party are indirectly combined to carry out federal learning prediction, so that the user classification has more decision bases, and the accuracy of the user classification is improved.
As an example, the preset sample prediction model may be a risk detection model in the field of risk control, used to predict the loan or repayment risk of the user; in this case, the user data to be predicted may be a user portrait or the user's loan repayment records, and the target prediction result is a risk detection result, where step C40 includes: inputting the graph embedding features to be predicted into the risk detection model, and performing risk detection on the user to be predicted to obtain a risk detection result. Since the neighbor node aggregation features are determined by combining the user data of the first party and the user relationship chain data of the second party, when risk detection is performed according to the user data to be predicted and the corresponding neighbor node aggregation feature set in the embodiment of the application, federated learning prediction is performed by indirectly combining the user data to be predicted of the first party and the user relationship chain data of the second party, so that risk detection has more decision bases and its accuracy is improved.
The embodiment of the application provides a sample prediction method based on graph representation data, namely, user data to be predicted on a target user node corresponding to a user to be predicted is obtained at first; extracting a neighbor node aggregation feature set corresponding to all neighbor nodes of the target user node, wherein the neighbor node aggregation feature set is generated in the training process of the target graph embedding feature extraction model; performing feature extraction based on federal learning on the user data to be predicted by taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedding feature extraction model to obtain graph embedding features to be predicted corresponding to the target user node; and inputting the embedded features of the graph to be predicted into a preset sample prediction model, and performing sample prediction on the features of the user to be predicted to obtain a target prediction result. The neighbor node aggregation characteristics are determined by combining the user data of the first party and the user relation chain data of the second party, so that when sample prediction is performed according to the user data to be predicted and the corresponding neighbor node aggregation characteristic set in the embodiment of the application, sample prediction based on federal learning is performed by indirectly combining the user data to be predicted of the first party and the user relation chain data of the second party, so that the sample prediction has more decision bases, and the accuracy of sample prediction of graph representation data is improved.
EXAMPLE III
The embodiment of the present application provides a federated learning modeling optimization method, which is applied to a second party, and in the first embodiment of the federated learning modeling optimization method of the present application, referring to fig. 3, the federated learning modeling optimization method includes:
step D10, acquiring user relationship chain data corresponding to each target user, wherein the target user is an intersection user between the first party and the second party;
step D20, receiving the secret state user intermediate characteristics sent by the first party, wherein the secret state user intermediate characteristics are obtained by homomorphic encryption of user intermediate characteristics output by a neural network of a graph embedding characteristic extraction model;
step D30, according to the user relation chain data and the intermediate features of each secret user, respectively encrypting and aggregating the intermediate features of the secret users corresponding to all the neighbor nodes of each user node to obtain the aggregated features of the secret neighbor nodes corresponding to each user node;
and D40, sending the dense-state neighbor node aggregation features to the first party for decryption by the first party to obtain neighbor node aggregation features, taking the user intermediate features and the corresponding neighbor node aggregation features as input of a next-layer neural network of the graph-embedded feature extraction model, regenerating the user intermediate features corresponding to the user nodes until obtaining user intermediate features output by a last-layer neural network of the graph-embedded feature extraction model as target graph embedding features, and obtaining a federal graph embedded feature extraction model corresponding to the graph-embedded feature extraction model according to the target graph embedding features.
In this embodiment, it should be noted that, for a specific process of determining an intersection user between the first party and the second party, reference may be made to the explanation content part corresponding to the step S10, and details are not described herein again.
As an example, steps D10 to D40 include: acquiring the user relationship chain data jointly corresponding to all target users, wherein the target users are intersection users between the first party and the second party; receiving the dense-state user intermediate features sent by the first party, wherein the dense-state user intermediate features are obtained by homomorphic encryption of the user intermediate features output by a neural network of the graph embedding feature extraction model (for the specific process by which the first party generates them, refer to steps S10 to S60, which are not repeated here); retrieving the neighbor relations among the user nodes through the user relationship chain, determining the neighbor node set corresponding to each user node, wherein a neighbor node set is the set of all neighbor nodes corresponding to a user node, and determining the dense-state user intermediate feature set corresponding to each neighbor node set according to the dense-state user intermediate features; encrypting and aggregating all the dense-state user intermediate features in each dense-state user intermediate feature set to obtain the dense-state neighbor node aggregation feature corresponding to each user node; and sending each dense-state neighbor node aggregation feature to the first party for decryption by the first party to obtain each neighbor node aggregation feature, taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, regenerating the user intermediate features corresponding to each user node until each user intermediate feature output by the last-layer neural network of the graph embedding feature extraction model is obtained as a target graph embedding feature, and obtaining the federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedding feature. The specific implementation process by which the first party generates the federal graph embedded feature extraction model may refer to steps S10 to S60 and is not repeated here. The encryption aggregation may be performed by averaging, summing, or the like.
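Steps D10 to D40 hinge on the fact that an additively homomorphic scheme lets the second party sum ciphertexts it cannot read. The sketch below substitutes a toy masking class for real homomorphic encryption (the `ToyHE` class, node names, and adjacency are illustrative assumptions only; the patent's scheme would be a genuine cryptosystem such as Paillier, and features would be vectors rather than scalars):

```python
import random

class ToyHE:
    """Stand-in for additive homomorphic encryption, where
    Enc(a) + Enc(b) decrypts to a + b. A real scheme hides the
    plaintext cryptographically; this toy merely adds a secret
    mask per ciphertext for illustration."""
    def __init__(self):
        self.mask = random.random()      # secret known only to the first party
    def encrypt(self, x):
        return x + self.mask
    def decrypt(self, c, n_terms=1):
        return c - n_terms * self.mask   # each summed ciphertext carries one mask

# first party: homomorphically encrypt its user intermediate features
he = ToyHE()
features = {"u1": 2.0, "u2": 4.0, "u3": 6.0}
enc = {u: he.encrypt(f) for u, f in features.items()}

# second party: aggregate ciphertexts of u1's neighbors per its relation chain,
# never seeing the plaintexts and never revealing the chain itself
neighbors = {"u1": ["u2", "u3"]}
enc_agg = sum(enc[v] for v in neighbors["u1"])   # aggregation in the encrypted state

# first party: decrypt the aggregated result (two ciphertexts were summed)
agg = he.decrypt(enc_agg, n_terms=len(neighbors["u1"]))
```

The decrypted `agg` equals the plaintext neighbor sum, while neither party's private data crossed the boundary in the clear: the second party saw only ciphertexts, and the first party received only an aggregate rather than the relation chain.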
Compared with the prior-art technical means in which a party holding user feature data or relationship chain data shares its data so that multiple parties can jointly build a graph embedding feature extraction model, this embodiment first acquires the user relationship chain data jointly corresponding to the target users, wherein the target users are intersection users between the first party and the second party; receives the dense-state user intermediate features sent by the first party, which are obtained by homomorphic encryption of the user intermediate features output by a neural network of the graph embedding feature extraction model; encrypts and aggregates, according to the user relationship chain data and each dense-state user intermediate feature, the dense-state user intermediate features corresponding to all neighbor nodes of each user node to obtain the dense-state neighbor node aggregation feature corresponding to each user node; and sends each dense-state neighbor node aggregation feature to the first party for decryption into neighbor node aggregation features, which, together with the corresponding user intermediate features, serve as the input of the next-layer neural network of the graph embedding feature extraction model to regenerate the user intermediate features of each user node, until each user intermediate feature output by the last-layer neural network is obtained as a target graph embedding feature, from which the federal graph embedded feature extraction model corresponding to the graph embedding feature extraction model is obtained.
The user intermediate features sent by the first party to the second party are homomorphically encrypted data, so the second party cannot learn the data privacy of the first party. The second party only needs to locally encrypt and aggregate, in the ciphertext state, the user intermediate features sent by the first party according to the user relationship chain data; the second party does not need to send the user relationship chain data to the first party, and that data is used only as a basis for determining which dense-state user intermediate features are aggregated. Consequently, after obtaining the dense-state neighbor node aggregation features sent by the second party, the first party cannot reversely deduce the user relationship chain data of the second party from them, achieving the purpose of protecting the data privacy of the second party. The federal graph embedded feature extraction model is thus established jointly by the first party and the second party on the premise of protecting the data privacy of both parties, which overcomes the technical defect that data privacy may be revealed when a party holding either user feature data or relationship chain data shares its data to jointly build a graph embedded feature extraction model (since relationship chain data is usually private data), and solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedded feature extraction model.
Example Four
The embodiment of the present application further provides a federated learning modeling optimization device, applied to the first party, the device including:
the acquisition module is used for acquiring a graph embedding feature extraction model and user data of user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party;
the first intermediate feature generation module is used for taking the user data as the input of a first layer neural network of the graph embedding feature extraction model to generate user intermediate features corresponding to the user nodes;
a request acquisition module, configured to request the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to the user intermediate characteristics;
the second intermediate feature generation module is used for respectively taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features corresponding to each user node;
an iterative loop module for returning to the execution step: requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to each user intermediate characteristic until each user intermediate characteristic output by the last layer of neural network of the graph embedding characteristic extraction model is acquired, and taking the acquired user intermediate characteristic as a target graph embedding characteristic;
and the federal model determination module is used for obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to the embedded features of the target graphs.
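The cooperation of the modules above (first intermediate feature generation, encrypted request, second intermediate feature generation, iterative loop) can be sketched as one forward pass. All names and the toy callbacks below are hypothetical illustrations; a real deployment would use a genuine homomorphic scheme rather than the identity stand-in.

```python
# Hypothetical orchestration of the first party's forward pass.

def first_party_forward(user_data, num_layers, layer_fn,
                        encrypt, decrypt, second_party_aggregate):
    # first-layer neural network over the raw user data (no neighbor input yet)
    h = {node: layer_fn(0, feats, None) for node, feats in user_data.items()}
    for layer in range(1, num_layers):
        enc_h = {node: encrypt(f) for node, f in h.items()}      # homomorphic encryption
        enc_agg = second_party_aggregate(enc_h)                  # ciphertext-state aggregation at party B
        agg = {node: decrypt(f) for node, f in enc_agg.items()}  # decrypt neighbor aggregates
        h = {node: layer_fn(layer, h[node], agg[node]) for node in h}
    return h  # last layer's user intermediate features = target graph embedding features

# Toy demo: identity "encryption", one neighbor each, additive layers.
adj = {"u1": ["u2"], "u2": ["u1"]}
aggregate = lambda enc_h: {n: enc_h[adj[n][0]] for n in adj}
layer = lambda i, s, a: s if a is None else [sv + av for sv, av in zip(s, a)]
out = first_party_forward({"u1": [1.0], "u2": [2.0]}, 2,
                          layer, lambda f: f, lambda f: f, aggregate)
# out == {"u1": [3.0], "u2": [3.0]}
```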
Optionally, the second intermediate feature generation module is further configured to:
respectively performing feature splicing on each user intermediate feature and the corresponding neighbor node aggregation feature to obtain each splicing feature;
inputting each splicing feature into the next layer of neural network of the graph embedding feature extraction model respectively to obtain the neural network output corresponding to each user node;
and regenerating user intermediate characteristics corresponding to the user nodes by respectively normalizing the output of each neural network.
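A minimal sketch of one splice-then-regenerate step described above. The patent does not name the normalization; L2 normalization (common in GraphSAGE-style graph models) is assumed here, and the single linear layer stands in for "the next-layer neural network".

```python
import math

def next_layer_feature(user_feat, neighbor_agg, weights):
    spliced = user_feat + neighbor_agg                  # feature splicing (concatenation)
    out = [sum(w * x for w, x in zip(row, spliced))     # one linear layer; weights: out_dim rows
           for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0    # assumed: L2 normalization
    return [v / norm for v in out]

f = next_layer_feature([1.0, 0.0], [0.0, 1.0],
                       [[1, 0, 0, 0], [0, 0, 0, 1]])
# f ≈ [0.7071, 0.7071]  (unit-length output)
```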
Optionally, the request obtaining module is further configured to:
homomorphically encrypting each user intermediate feature into a secret user intermediate feature;
sending each secret-state user intermediate feature to the second party, so that the second party encrypts and aggregates the secret-state user intermediate features corresponding to all neighbor nodes of each user node into a secret-state neighbor node aggregation feature according to user relationship chain data and each secret-state user intermediate feature which are commonly corresponding to each target user;
and receiving the aggregation characteristics of the secret-state neighbor nodes sent by the second party, and decrypting the aggregation characteristics of the secret-state neighbor nodes respectively to obtain the aggregation characteristics of the neighbor nodes.
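The reason the second party can aggregate without decrypting is the additive homomorphism of schemes such as Paillier: multiplying ciphertexts yields a ciphertext of the plaintext sum. Below is a toy textbook Paillier round trip with tiny, insecure parameters; it is purely illustrative, not the patent's concrete scheme, and real-valued features would additionally need encoding into integers.

```python
import math
import random

# Toy textbook Paillier (tiny primes, NOT secure) to show the property the
# exchange relies on: Enc(a) * Enc(b) mod n^2 decrypts to a + b, so the
# second party can aggregate neighbor features entirely in ciphertext.
p, q = 293, 433                      # toy primes
n, n2 = p * q, (p * q) ** 2
g, lam = n + 1, math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                 # valid because L(g^lam mod n^2) = lam mod n

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:       # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Second party: homomorphic aggregation of two neighbors' scalar features.
c = encrypt(17)
for neighbor_feature in (5, 20):
    c = (c * encrypt(neighbor_feature)) % n2
# Only the first party, which holds lam and mu, recovers 17 + 5 + 20 = 42:
# decrypt(c) == 42
```

Production systems would use a vetted implementation (e.g. a Paillier library with 2048-bit moduli) rather than anything like this sketch.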
Optionally, the federal learning modeling optimization device is further configured to:
acquiring user data to be predicted on a target user node corresponding to a user to be predicted;
extracting a neighbor node aggregation feature set corresponding to all neighbor nodes of the target user node, wherein the neighbor node aggregation feature set is generated in the training process of the target graph embedding feature extraction model;
performing feature extraction based on federal learning on the user data to be predicted by taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedding feature extraction model to obtain graph embedding features to be predicted corresponding to the target user node;
and inputting the embedded features of the graph to be predicted into a preset sample prediction model, and performing sample prediction on the features of the user to be predicted to obtain a target prediction result.
Optionally, the target graph embedded feature extraction model includes an input layer and hidden layers, the neighbor node aggregation feature set includes at least a neighbor node aggregation feature corresponding to a hidden layer, and the federal learning modeling optimization device is further configured to:
the user data to be predicted passes through the input layer, and user characteristics to be predicted corresponding to the user data to be predicted are output;
determining neighbor node aggregation characteristics corresponding to a hidden layer behind the input layer, taking the user characteristics to be predicted and the neighbor node aggregation characteristics as the input of the hidden layer together, and outputting the intermediate characteristics of the user to be predicted corresponding to the user to be predicted;
and determining neighbor node aggregation characteristics corresponding to a next hidden layer, using the intermediate characteristics of the user to be predicted and the corresponding neighbor node aggregation characteristics as the input of the next hidden layer together, so as to regenerate the intermediate characteristics of the user to be predicted until the intermediate characteristics of the user to be predicted output by the last hidden layer are obtained, and using the intermediate characteristics of the user to be predicted as the graph embedding characteristics to be predicted.
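The layer-by-layer prediction flow above can be sketched as follows. The helper and its callbacks are hypothetical; the point is that the neighbor node aggregation features cached from training replace any fresh interaction with the second party at prediction time.

```python
# Hypothetical inference for one target user node: each hidden layer consumes
# the current feature together with the neighbor aggregation feature cached
# for that layer during training.

def predict_embedding(x, input_layer, hidden_layers, cached_neighbor_aggs):
    h = input_layer(x)                        # user feature to be predicted
    for hidden, agg in zip(hidden_layers, cached_neighbor_aggs):
        h = hidden(h, agg)                    # joint input: current feature + cached aggregate
    return h                                  # graph embedding feature to be predicted

double = lambda v: [2 * x for x in v]
add = lambda h, a: [hv + av for hv, av in zip(h, a)]
emb = predict_embedding([3.0], double, [add, add], [[1.0], [10.0]])
# emb == [17.0]  (2*3, then +1 at hidden layer 1, then +10 at hidden layer 2)
```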
Optionally, the federal model determination module is further configured to:
judging whether the model loss calculated according to the embedding characteristics of each target graph is converged;
if the model loss is converged, taking the updated graph embedding feature extraction model as the federal graph embedding feature extraction model;
if the model loss is not converged, updating the graph embedding feature extraction model according to the model loss, and returning to the execution step: and acquiring the graph embedding feature extraction model and user data of user nodes corresponding to the target users.
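The convergence loop above can be sketched as below. The tolerance-based convergence test and the callback signatures are assumptions; the patent only specifies "check whether the loss converges, otherwise update the model and repeat from the acquisition step".

```python
# Illustrative convergence loop for the federal model determination module.

def train_until_convergence(model, forward, loss_of, update,
                            tol=1e-4, max_rounds=100):
    prev = float("inf")
    for _ in range(max_rounds):
        embeddings = forward(model)          # one full pass -> target graph embedding features
        loss = loss_of(embeddings)
        if abs(prev - loss) < tol:           # converged: current model is the federated model
            return model
        model = update(model, loss)          # not converged: update and repeat
        prev = loss
    return model

# Toy demo: scalar "model" halved each round until the loss change is tiny.
final = train_until_convergence(10.0, lambda m: m, lambda e: e * e,
                                lambda m, l: m * 0.5)
```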
Optionally, the model loss includes at least one of a classification loss and an unsupervised loss, and the federal learning modeling optimization apparatus is further configured to:
inputting each target graph embedding feature into a preset classification model to obtain an output classification label; acquiring a preset classification label corresponding to the user data, and calculating the classification loss according to the preset classification label and the output classification label; and/or
calculating, according to a preset unsupervised loss function, the unsupervised loss commonly corresponding to the target graph embedding features.
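The two loss options might look as follows. Both concrete formulas are assumptions, since the patent leaves them open: binary cross-entropy stands in for the classification loss, and a negative log-sigmoid similarity over linked node pairs is a GraphSAGE-style stand-in for the preset unsupervised loss function.

```python
import math

def classification_loss(embeddings, labels, classify):
    """Assumed binary cross-entropy between preset labels and the
    classification model's output probabilities."""
    total = 0.0
    for node, emb in embeddings.items():
        p = classify(emb)                                  # predicted probability of class 1
        y = labels[node]                                   # preset classification label
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(embeddings)

def unsupervised_loss(embeddings, positive_pairs):
    """Assumed unsupervised term: pull embeddings of linked node pairs
    together via -log sigmoid(dot product)."""
    total = 0.0
    for a, b in positive_pairs:                            # e.g. node pairs linked in the graph
        dot = sum(x * y for x, y in zip(embeddings[a], embeddings[b]))
        total -= math.log(1.0 / (1.0 + math.exp(-dot)))    # -log sigmoid(similarity)
    return total / len(positive_pairs)

emb = {"u1": [1.0], "u2": [1.0]}
cl = classification_loss(emb, {"u1": 1, "u2": 0}, lambda e: 0.9)
ul = unsupervised_loss(emb, [("u1", "u2")])
```

When both are used, the model loss would simply be their (possibly weighted) sum before the convergence check.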
The federated learning modeling optimization device provided by the present application solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model. Compared with the prior art, the beneficial effects of the federated learning modeling optimization device provided by this embodiment are the same as those of the federated learning modeling optimization method provided by the above embodiment, and the other technical features of the device are the same as those disclosed by the embodiment method, which are not repeated herein.
Example Five
The embodiment of the present application further provides a federated learning modeling optimization device, applied to the second party, the device including:
the acquisition module is used for acquiring user relationship chain data which correspond to all target users together, wherein the target users are intersection users between the first party and the second party;
the receiving module is used for receiving the secret user intermediate features sent by the first party, wherein the secret user intermediate features are obtained by homomorphic encryption of user intermediate features output by a neural network of a graph embedding feature extraction model;
the encryption and aggregation module is used for respectively encrypting and aggregating the secret state user intermediate characteristics corresponding to all neighbor nodes of each user node according to the user relation chain data and the secret state user intermediate characteristics to obtain the secret state neighbor node aggregation characteristics corresponding to each user node;
and the sending module is used for sending the dense-state neighbor node aggregation features to the first party so as to enable the first party to decrypt to obtain the neighbor node aggregation features, respectively using the user intermediate features and the corresponding neighbor node aggregation features as the input of the next layer of neural network of the graph embedding feature extraction model, regenerating the user intermediate features corresponding to the user nodes until obtaining the user intermediate features output by the last layer of neural network of the graph embedding feature extraction model as target graph embedding features, and obtaining the federal graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to the target graph embedding features.
The federated learning modeling optimization device provided by the present application solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model. Compared with the prior art, the beneficial effects of the federated learning modeling optimization device provided by this embodiment are the same as those of the federated learning modeling optimization method provided by the above embodiment, and the other technical features of the device are the same as those disclosed by the embodiment method, which are not repeated herein.
Example Six
An embodiment of the present application provides an electronic device, and the electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for federated learning modeling optimization in the first embodiment.
Referring now to FIG. 4, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the electronic device may include a processing device (e.g., a central processing unit, a graphics processor, etc.) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage device into a random access memory (RAM). The RAM also stores various programs and data necessary for the operation of the electronic device. The processing device, the ROM, and the RAM are connected to one another via the bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following devices may be connected to the I/O interface: input devices including, for example, touch screens, touch pads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, and the like; output devices including, for example, liquid crystal displays (LCDs), speakers, vibrators, and the like; storage devices including, for example, magnetic tape, hard disks, and the like; and a communication device. The communication device may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While the figure illustrates an electronic device with various devices, it is to be understood that not all illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
The electronic device provided by the present application adopts the federated learning modeling optimization method in the above embodiment, and solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model. Compared with the prior art, the beneficial effects of the electronic device provided by this embodiment are the same as those of the federated learning modeling optimization method provided by the above embodiment, and the other technical features of the electronic device are the same as those disclosed by the embodiment method, which are not repeated herein.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the foregoing description of embodiments, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Example Seven
The present embodiment provides a computer readable storage medium having computer readable program instructions stored thereon for performing the method for federated learning modeling optimization in the first embodiment described above.
The computer-readable storage medium provided by the embodiments of the present application may be, for example, a USB flash disk, but is not limited thereto; it may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present embodiment, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer-readable storage medium may be embodied in an electronic device; or may be present alone without being incorporated into the electronic device.
The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring user data of a graph embedding feature extraction model and user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party; taking the user data as the input of a first-layer neural network of the graph embedding feature extraction model to generate user intermediate features corresponding to the user nodes; requesting to obtain neighbor node aggregation characteristics of all neighbor nodes of each user node from the second party in an encrypted state according to the user intermediate characteristics; respectively taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features corresponding to each user node; and returning to the execution step: requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to each user intermediate characteristic until each user intermediate characteristic output by the last layer of neural network of the graph embedding characteristic extraction model is acquired, and taking the acquired user intermediate characteristic as a target graph embedding characteristic; and obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature.
Or acquiring user relationship chain data corresponding to all target users together, wherein the target users are intersection users between the first party and the second party; receiving secret state user intermediate features sent by the first party, wherein the secret state user intermediate features are obtained by homomorphic encryption of user intermediate features output by a neural network of a graph embedding feature extraction model; according to the user relation chain data and the intermediate features of the secret-state users, respectively encrypting and aggregating the intermediate features of the secret-state users corresponding to all neighbor nodes of each user node to obtain the aggregation features of the secret-state neighbor nodes corresponding to each user node; and sending each dense-state neighbor node aggregation feature to the first party for decryption by the first party to obtain each neighbor node aggregation feature, taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next layer of neural network of the graph embedding feature extraction model, regenerating the user intermediate features corresponding to each user node until each user intermediate feature output by the last layer of neural network of the graph embedding feature extraction model is obtained as a target graph embedding feature, and obtaining the federal graph embedding feature extraction model corresponding to the graph embedding feature extraction model according to each target graph embedding feature.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The names of the modules do not, in some cases, constitute a limitation on the modules themselves.
The computer-readable storage medium provided by the present application stores computer-readable program instructions for executing the above federated learning modeling optimization method, and solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the federated learning modeling optimization method provided by the above embodiment, and are not repeated herein.
Example Eight
The present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of federated learning modeling optimization as described above.
The computer program product solves the technical problem that data privacy cannot be protected when multiple parties jointly build a graph embedding feature extraction model. Compared with the prior art, the beneficial effects of the computer program product provided by this embodiment are the same as those of the federated learning modeling optimization method provided by the above embodiment, and are not repeated herein.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (11)

1. A federated learning modeling optimization method, applied to a first party, the method comprising the following steps:
acquiring user data of a graph embedding feature extraction model and user nodes corresponding to target users, wherein the target users are intersection users between the first party and the second party;
using the user data as the input of the first-layer neural network of the graph embedding feature extraction model to generate user intermediate features corresponding to the user nodes;
requesting to obtain neighbor node aggregation characteristics of all neighbor nodes of each user node from the second party in an encrypted state according to the user intermediate characteristics;
respectively taking each user intermediate feature and the corresponding neighbor node aggregation feature as the input of the next-layer neural network of the graph embedding feature extraction model, and regenerating the user intermediate features corresponding to each user node;
and returning to the execution step: requesting to the second party to acquire neighbor node aggregation characteristics of all neighbor nodes of each user node in an encrypted state according to each user intermediate characteristic until each user intermediate characteristic output by the last layer of neural network of the graph embedding characteristic extraction model is acquired, and taking the acquired user intermediate characteristic as a target graph embedding characteristic;
and obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature.
2. The federal learning modeling optimization method of claim 1, wherein the step of regenerating the user intermediate features corresponding to each of the user nodes by using each of the user intermediate features and the corresponding neighbor node aggregate features as inputs of a next-layer neural network of the graph-embedded feature extraction model comprises:
respectively performing feature splicing on each user intermediate feature and the corresponding neighbor node aggregation feature to obtain each splicing feature;
inputting each splicing feature into the next layer of neural network of the graph embedding feature extraction model respectively to obtain the neural network output corresponding to each user node;
and regenerating user intermediate characteristics corresponding to the user nodes by respectively normalizing the output of each neural network.
3. The federal learning modeling optimization method as claimed in claim 1, wherein said step of requesting the second party to obtain neighbor node aggregation characteristics of all neighbor nodes of each of the user nodes in an encrypted state according to each of the user intermediate characteristics comprises:
homomorphically encrypting each user intermediate feature into a secret user intermediate feature;
sending each secret-state user intermediate feature to the second party, so that the second party encrypts and aggregates the secret-state user intermediate features corresponding to all neighbor nodes of each user node into a secret-state neighbor node aggregation feature according to user relationship chain data and each secret-state user intermediate feature which are commonly corresponding to each target user;
and receiving the aggregation characteristics of the dense-state neighbor nodes sent by the second party, and decrypting the aggregation characteristics of the dense-state neighbor nodes respectively to obtain the aggregation characteristics of the neighbor nodes.
4. The federal learning modeling optimization method of claim 1, wherein after the step of obtaining a federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature, the federal learning modeling optimization method further comprises:
acquiring user data to be predicted on a target user node corresponding to a user to be predicted;
extracting a neighbor node aggregation feature set corresponding to all neighbor nodes of the target user node, wherein the neighbor node aggregation feature set is generated in the training process of the target graph embedding feature extraction model;
performing feature extraction based on federal learning on the user data to be predicted by taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedding feature extraction model to obtain graph embedding features to be predicted corresponding to the target user node;
and inputting the embedded features of the graph to be predicted into a preset sample prediction model, and performing sample prediction on the features of the user to be predicted to obtain a target prediction result.
5. The federated learning modeling optimization method of claim 4, wherein the target graph embedded feature extraction model includes an input layer and hidden layers, the neighbor node aggregated feature set includes at least a neighbor node aggregated feature corresponding to a hidden layer,
the step of obtaining the embedding features of the graph to be predicted corresponding to the target user node by taking the user data to be predicted and the neighbor node aggregation feature set as the input of the target graph embedding feature extraction model and performing feature extraction based on federal learning on the user data to be predicted comprises the following steps of:
the user data to be predicted passes through the input layer, and the user characteristics to be predicted corresponding to the user data to be predicted are output;
determining neighbor node aggregation characteristics corresponding to a hidden layer behind the input layer, taking the user characteristics to be predicted and the neighbor node aggregation characteristics as the input of the hidden layer together, and outputting the intermediate characteristics of the user to be predicted corresponding to the user to be predicted;
and determining neighbor node aggregation characteristics corresponding to a next hidden layer, using the intermediate characteristics of the user to be predicted and the corresponding neighbor node aggregation characteristics as the input of the next hidden layer together, so as to regenerate the intermediate characteristics of the user to be predicted until the intermediate characteristics of the user to be predicted output by the last hidden layer are obtained, and using the intermediate characteristics of the user to be predicted as the graph embedding characteristics to be predicted.
6. The federal learning modeling optimization method of claim 1, wherein the step of obtaining the federal graph embedded feature extraction model corresponding to the graph embedded feature extraction model according to each target graph embedded feature includes:
judging whether the model loss calculated according to the embedding characteristics of each target graph is converged;
if the model loss is converged, taking the updated graph embedding feature extraction model as the federal graph embedding feature extraction model;
if the model loss is not converged, updating the graph embedding feature extraction model according to the model loss, and returning to the execution step: and acquiring the graph embedding feature extraction model and user data of user nodes corresponding to the target users.
7. The federal learning modeling optimization method of claim 6, wherein the model loss comprises at least one of a classification loss and an unsupervised loss, and
before the step of determining whether the model loss calculated according to each target graph embedding feature has converged, the federal learning modeling optimization method further comprises:
inputting each target graph embedding feature into a preset classification model to obtain output classification labels; acquiring preset classification labels corresponding to the user data, and calculating the classification loss according to the preset classification labels and the output classification labels; and/or
calculating the unsupervised loss jointly corresponding to the target graph embedding features according to a preset unsupervised loss function.
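As a sketch of how the two loss terms of claim 7 might be combined: the claim leaves both the classification loss and the unsupervised loss function unspecified, so the cross-entropy form, the neighbor-similarity term, and the weighting `alpha` below are all assumptions:

```python
import math

def classification_loss(pred_probs, true_labels):
    """Cross-entropy between output labels and preset labels (assumed form)."""
    return -sum(math.log(p[y]) for p, y in zip(pred_probs, true_labels)) / len(true_labels)

def unsupervised_loss(embeddings, neighbor_pairs):
    """Toy unsupervised term pulling embeddings of linked user nodes
    together (one common choice; the claim does not fix the function)."""
    return sum(sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
               for i, j in neighbor_pairs) / max(len(neighbor_pairs), 1)

def model_loss(pred_probs, true_labels, embeddings, neighbor_pairs, alpha=1.0):
    """Combined model loss: classification and/or unsupervised term."""
    return (classification_loss(pred_probs, true_labels)
            + alpha * unsupervised_loss(embeddings, neighbor_pairs))
```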
8. A federal learning modeling optimization method, applied to a second party and comprising the following steps:
acquiring user relationship chain data jointly corresponding to target users, wherein the target users are intersection users between a first party and the second party;
receiving encrypted user intermediate features sent by the first party, wherein the encrypted user intermediate features are obtained by homomorphically encrypting user intermediate features output by a neural network layer of a graph embedding feature extraction model;
according to the user relationship chain data and the encrypted user intermediate features, aggregating in the encrypted domain the encrypted user intermediate features corresponding to all neighbor nodes of each user node, to obtain an encrypted neighbor node aggregation feature corresponding to each user node; and
sending each encrypted neighbor node aggregation feature to the first party, for the first party to decrypt each of them into a neighbor node aggregation feature, take each user intermediate feature and the corresponding neighbor node aggregation feature together as the input of the next neural network layer of the graph embedding feature extraction model, and regenerate the user intermediate features corresponding to each user node, until each user intermediate feature output by the last neural network layer of the graph embedding feature extraction model is obtained as a target graph embedding feature, and the federal graph embedding feature extraction model corresponding to the graph embedding feature extraction model is obtained according to each target graph embedding feature.
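The second party's role in claim 8 is to sum its neighbors' encrypted intermediate features without seeing the plaintexts, which requires an additively homomorphic scheme (real deployments would use e.g. Paillier). The sketch below substitutes a deliberately trivial, insecure key-offset "encryption" purely so the aggregation-over-ciphertexts structure can be shown end to end; all names are hypothetical:

```python
# NOT a real cipher: a toy additive stand-in for a homomorphic scheme such
# as Paillier, used only to illustrate the data flow of claim 8.

KEY = 1000.0  # first party's secret (illustrative only, not secure)

def encrypt(x):
    """First party: 'encrypt' an intermediate-feature component."""
    return x + KEY

def decrypt(c, count):
    """First party: remove the key contribution of `count` summands."""
    return c - count * KEY

def aggregate_encrypted_neighbors(relation_chain, enc_features):
    """Second party: for each user node, sum the encrypted intermediate
    features of all of its neighbor nodes, never seeing plaintexts."""
    agg = {}
    dim = len(next(iter(enc_features.values())))
    for node, neighbors in relation_chain.items():
        total = [0.0] * dim
        for nb in neighbors:
            total = [t + c for t, c in zip(total, enc_features[nb])]
        agg[node] = (total, len(neighbors))  # count is needed to decrypt
    return agg
```

With a real additively homomorphic scheme, `decrypt` would not need the summand count; it appears here only because of the toy offset construction.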
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the federal learning modeling optimization method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a program for implementing a federal learning modeling optimization method, the program being executable by a processor to perform the steps of the federal learning modeling optimization method as claimed in any one of claims 1 to 8.
11. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the steps of the federal learning modeling optimization method as claimed in any of claims 1 to 8.
CN202210240863.0A 2022-03-10 2022-03-10 Federal learning modeling optimization method, electronic device, medium, and program product Pending CN114595474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210240863.0A CN114595474A (en) 2022-03-10 2022-03-10 Federal learning modeling optimization method, electronic device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210240863.0A CN114595474A (en) 2022-03-10 2022-03-10 Federal learning modeling optimization method, electronic device, medium, and program product

Publications (1)

Publication Number Publication Date
CN114595474A true CN114595474A (en) 2022-06-07

Family

ID=81818234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210240863.0A Pending CN114595474A (en) 2022-03-10 2022-03-10 Federal learning modeling optimization method, electronic device, medium, and program product

Country Status (1)

Country Link
CN (1) CN114595474A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203487A (en) * 2022-09-15 2022-10-18 深圳市洞见智慧科技有限公司 Data processing method based on multi-party security graph and related device
CN115203487B (en) * 2022-09-15 2022-12-20 深圳市洞见智慧科技有限公司 Data processing method based on multi-party security graph and related device

Similar Documents

Publication Publication Date Title
CN114091617B (en) Federal learning modeling optimization method, electronic device, storage medium, and program product
CN110245510B (en) Method and apparatus for predicting information
US20220230071A1 (en) Method and device for constructing decision tree
CN111428887B (en) Model training control method, device and system based on multiple computing nodes
CN113627085A (en) Method, apparatus, medium, and program product for optimizing horizontal federated learning modeling
CN112149706B (en) Model training method, device, equipment and medium
US20240249004A1 (en) Method and apparatus for data protection, readable medium and electronic device
CN112149174B (en) Model training method, device, equipment and medium
CN113051239A (en) Data sharing method, use method of model applying data sharing method and related equipment
CN111563267A (en) Method and device for processing federal characteristic engineering data
CN114006769A (en) Model training method and device based on horizontal federal learning
CN113722738B (en) Data protection method, device, medium and electronic equipment
CN114595474A (en) Federal learning modeling optimization method, electronic device, medium, and program product
WO2022012178A1 (en) Method for generating objective function, apparatus, electronic device and computer readable medium
CN115277197B (en) Model ownership verification method, electronic device, medium and program product
CN112149834B (en) Model training method, device, equipment and medium
CN115205089B (en) Image encryption method, training method and device of network model and electronic equipment
CN112149141A (en) Model training method, device, equipment and medium
CN112434064B (en) Data processing method, device, medium and electronic equipment
CN111709784A (en) Method, apparatus, device and medium for generating user retention time
CN115470908A (en) Model security inference method, electronic device, medium, and program product
CN115829729B (en) Three-chain architecture-based supply chain financial credit evaluation system and method
CN116758661B (en) Intelligent unlocking method, intelligent unlocking device, electronic equipment and computer readable medium
CN115311023A (en) Transverse federated model construction optimization method, electronic device, medium, and program product
CN116596092A (en) Model training method, instant pushing method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication