CN111950740B - Method and device for training federal learning model - Google Patents

Method and device for training federal learning model

Info

Publication number
CN111950740B
Authority
CN
China
Prior art keywords
training
node
secret share
model
learning model
Prior art date
Legal status
Active
Application number
CN202010651671.XA
Other languages
Chinese (zh)
Other versions
CN111950740A
Inventor
夏家骏
鲁颖
张珣
沈敏均
陈楚元
张佳辰
Current Assignee
Guangzhishu Beijing Technology Co ltd
Original Assignee
Guangzhishu Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhishu Beijing Technology Co ltd filed Critical Guangzhishu Beijing Technology Co ltd
Priority to CN202010651671.XA priority Critical patent/CN111950740B/en
Publication of CN111950740A publication Critical patent/CN111950740A/en
Application granted granted Critical
Publication of CN111950740B publication Critical patent/CN111950740B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a method and a device for training a federated learning model, wherein the execution subject is a target node, and the method comprises the following steps: in the process of training the federated learning model, a first training node in a disconnected state is identified, and a substitute intermediate result secret share of the first training node is obtained from a local cache; a first loss function secret share of the first training node on the target node is obtained according to the substitute intermediate result secret share; second intermediate result secret shares sent by the remaining participating nodes in an online state are received, and second loss function secret shares of the remaining participating nodes on the target node are obtained according to the second intermediate result secret shares; and the loss function of the federated learning model is obtained according to the first loss function secret share, the second loss function secret shares and the loss function secret share of the target node itself. Therefore, if a training node is disconnected, the target node obtains the substitute intermediate result secret share from the local cache and continues training, which reduces the impact of the node disconnection.

Description

Method and device for training federal learning model
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for training a federated learning model.
Background
At present, in the process of enterprise digital transformation, data-driven business innovation plays a crucial role. In order to break down data islands and improve the quality of data use, data cooperation between institutions has gradually become frequent. Federated learning is a feasible solution that satisfies both privacy protection and data security: through homomorphic encryption, secret sharing and other techniques, the private data of each party never leaves its local environment, while joint computation and modeling are still achieved. On the other hand, in the process of training the model, the connection state of the training nodes is also an important factor influencing the effectiveness of model training. Therefore, how to achieve effective training of the federated learning model while ensuring that a disconnected training node does not prevent the training process from completing smoothly has become an important research direction.
Disclosure of Invention
The present application is directed to solving, at least in part, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a method for training a federated learning model, which is used to solve the technical problem that existing training methods cannot both train the federated learning model effectively and ensure that the training process completes smoothly when a training node is disconnected.
The second purpose of the invention is to provide a training device for the federated learning model.
A third object of the invention is to propose an electronic device.
A fourth object of the invention is to propose a computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a method for training a federated learning model, where the execution subject is a target node, the target node being any one of the training nodes and the coordinating node that participate in the federated learning model training and are in an online state, and the method includes the following steps: in the training process of the federated learning model, obtaining a first training node in a disconnected state, and obtaining a substitute intermediate result secret share of the first training node from a local cache, wherein the substitute intermediate result secret share is the first intermediate result secret share, cached locally, that was generated when the first training node last completed training before being disconnected; obtaining a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share; receiving second intermediate result secret shares sent by the remaining participating nodes in an online state, and obtaining second loss function secret shares of the remaining participating nodes on the target node according to the second intermediate result secret shares, wherein the remaining participating nodes comprise the remaining training nodes and the coordinating node; and obtaining the loss function on each training node according to the first loss function secret share, the second loss function secret shares and a third loss function secret share of the target node itself.
In addition, the method for training the federal learning model according to the above embodiment of the present application may have the following additional technical features:
according to an embodiment of the present application, the obtaining a first training node in a dropped state and obtaining a substitute intermediate result secret share of the first training node from a local cache includes: acquiring first prompt information for prompting that the training node is disconnected, wherein the first prompt information at least comprises identification information and disconnection time of the first training node; and acquiring the substitute intermediate result secret share of the first training node from the local cache according to the offline time.
According to an embodiment of the present application, the obtaining the substitute intermediate result secret share of the first training node from the local cache according to the drop time includes: determining whether the current model training of the first training node is finished or not according to the offline time; if the disconnection time indicates that the model training is not finished, taking a first intermediate result secret share generated by the first training node in the last model training as the substitute intermediate result secret share; and if the offline time indicates that the current model training is completed, taking a first intermediate result secret share generated by the first training node during the current model training as the substitute intermediate result secret share.
According to an embodiment of the present application, when the target node is any training node, the obtaining the loss function on each training node according to the first loss function secret share, the second loss function secret shares, and the third loss function secret share of the target node itself includes: sending the first loss function secret share, the second loss function secret shares and the third loss function secret share to the coordinating node, and obtaining, by the coordinating node, the loss function of each training node according to the loss function secret shares of each training node.
According to an embodiment of the present application, when the target node is a coordinating node, the method further includes: acquiring second prompt information of the first training node going online again; sending initial model parameters to the first training node; coordinating each participating node to send the cached gradient secret share of the first training node after model training each time before the first training node is disconnected, so that the first training node obtains a target model parameter of a first learning model on the first training node according to the initial model parameter and the gradient secret share of the first training node after model training each time before the first training node is disconnected.
According to one embodiment of the application, when the first training node is disconnected and has not come back online by the time the federated learning model training is finished, whether the target node is an authorized node is identified; if the target node is the authorized node, the initial model parameters of the first training node and the gradient secret shares, cached by each participating node, of each model training before the first training node was disconnected are obtained; the target model parameters of the first training node are obtained according to the initial model parameters and the gradient secret shares of each model training before the disconnection; the authorized node is authorized by the first training node to recover the model parameters of the first learning model on the first training node, and the authorized node is a training node or the coordinating node.
According to an embodiment of the present application, after obtaining the target model parameter of the first training node, the method further includes: acquiring prediction data sent by a prediction node; and generating a second intermediate result corresponding to the first training node according to the target model parameters and the prediction data, and feeding back the second intermediate result to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
According to an embodiment of the present application, the obtaining a target model parameter of the first training node according to the initial model parameter and the gradient secret share after each model training before the disconnection includes: obtaining gradient information after each model training before the disconnection according to the gradient secret share after each model training before the disconnection of each participating node; based on a gradient descent algorithm, according to the gradient information before the disconnection, the initial model parameters are gradually updated to obtain the target model parameters.
According to an embodiment of the application, the method further comprises: if the target node is not the authorized node, obtaining a first parameter secret share of the initial model parameter of the first training node; and acquiring a second parameter secret share of the target model parameter according to the first parameter secret share and the gradient secret share after model training of each time before the first training node is disconnected, which is cached on the target node.
According to an embodiment of the present application, after obtaining the second parameter secret share of the target model parameter, the method further includes: acquiring a data secret share of prediction data sent by a prediction node and aiming at the first training node; and generating a second intermediate result secret share corresponding to the first training node according to the data secret share and the second parameter secret share corresponding to the first training node, and feeding back the second intermediate result secret share to the prediction node, so that the prediction node obtains a prediction result of the prediction data corresponding to the first training node according to a preset number of second intermediate result secret shares.
According to an embodiment of the present application, the obtaining a second parameter secret share of the target model parameter according to the first parameter secret share and a gradient secret share after each model training before the first training node is disconnected, which is cached on the target node, includes: based on a gradient descent algorithm, according to the gradient secret share after each model training before the disconnection, the first parameter secret share is updated successively to obtain the second parameter secret share.
According to an embodiment of the application, the method further comprises: until the first training node comes back online, taking the first intermediate result secret share cached most recently before the disconnection as the target intermediate result secret share.
According to an embodiment of the application, the method further comprises: and acquiring the number of the training nodes in an online state, and terminating the training of the federal learning model if the number of the training nodes is less than or equal to a preset number.
The embodiment of the first aspect of the application provides a method for training a federated learning model, which can train the federated learning model based on a secret sharing algorithm under the semi-honest assumption. When a training node drops offline during training of the federated learning model, the intermediate result that this node should generate is missing; at this time the other online participating nodes can obtain the substitute intermediate result secret share of the first training node from the local cache and continue the subsequent training process of the federated learning model according to this substitute intermediate result secret share, so that the loss function of the federated learning model can still be obtained. This reduces the impact of a disconnected training node on the effectiveness of the model training process and ensures that, even when a training node is in a disconnected state, training of the federated learning model can still proceed normally and smoothly.
In order to achieve the above object, an embodiment of a second aspect of the present application provides a training apparatus for a federated learning model, where the training apparatus for the federated learning model is disposed on a target node, where the target node is any one of a training node and a coordination node that participate in the federated learning model training and are in an online state; the federated learning model is trained based on a secret sharing algorithm; the training device of the federal learning model comprises: a first obtaining module, configured to obtain, during a training process of the federal learning model, a first training node in a dropped state, and obtain, from a local cache, a substitute intermediate result secret share of the first training node, where the substitute intermediate result secret share is a first intermediate secret share generated when training is completed before the first training node is dropped, and is cached in the local cache; a second obtaining module, configured to obtain a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share; a third obtaining module, configured to receive a second intermediate result secret share sent by remaining participating nodes in an online state, and obtain a second loss function secret share of the remaining participating nodes on the target node according to the second intermediate result secret share, where the remaining participating nodes include remaining training nodes and the coordinating node; and the fourth obtaining module is used for obtaining the loss function of the federal learning model according to the first loss function secret share, the second loss function secret share and the third loss function secret share of the target node.
In addition, the training device of the federal learning model according to the above embodiment of the present application may also have the following additional technical features:
according to an embodiment of the present application, the first obtaining module includes: a prompt information obtaining unit, configured to obtain first prompt information used for prompting that the training node is disconnected, where the first prompt information at least includes identification information of the first training node and disconnection time; a cache obtaining unit, configured to obtain the substitute intermediate result secret share of the first training node from the local cache according to the offline time.
According to an embodiment of the present application, the cache retrieving unit includes: a determining subunit, configured to determine, according to the offline time, whether the current model training of the first training node is finished; and a cache obtaining subunit, configured to, when the offline time indicates that the current model training is not completed, use the first intermediate result secret share generated by the first training node in the previous model training as the substitute intermediate result secret share, and, when the offline time indicates that the current model training is completed, use the first intermediate result secret share generated by the first training node in the current model training as the substitute intermediate result secret share.
According to an embodiment of the present application, when the target node is any training node, the fourth obtaining module is further configured to: send the first loss function secret share, the second loss function secret share and the third loss function secret share to the coordinating node, so that the coordinating node obtains the loss function according to these loss function secret shares.
According to an embodiment of the present application, when the target node is a coordinating node, the apparatus for training the federal learning model further includes: a fifth obtaining module, configured to obtain a second prompt message that the first training node is on-line again; a sending module, configured to send an initial model parameter to the first training node; and the coordination module is used for coordinating the cached gradient secret share of the first training node after each model training before the first training node is disconnected to the first training node, so that the first training node obtains the target model parameter of the first learning model on the first training node according to the initial model parameter and the gradient secret share of the first training node after each model training before the disconnection.
According to an embodiment of the present application, further comprising: the authorization identification module is used for identifying whether the target node is an authorized node or not when the first training node is disconnected and is not on-line again all the time after the Federal learning model training is finished; a sixth obtaining module, configured to obtain, if the target node is the authorized node, an initial model parameter of the first training node and a gradient secret share after each model training before the first training node is disconnected, where the gradient secret share is cached by each participating node; a seventh obtaining module, configured to obtain a target model parameter of the first training node according to the initial model parameter and the gradient secret share after each model training before the disconnection; the authorized node is authorized by the first training node and is used for recovering model parameters of a first learning model on the first training node, and the authorized node is the training node or the coordinating node.
According to an embodiment of the present application, further comprising: the eighth obtaining module is used for obtaining the prediction data sent by the prediction node after obtaining the target model parameter of the first training node; and the first prediction module is used for generating a second intermediate result corresponding to the first training node according to the target model parameter and the prediction data and feeding the second intermediate result back to the prediction node so that the prediction node can obtain the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
According to an embodiment of the present application, the seventh obtaining module includes: the gradient information acquisition unit is used for acquiring gradient information after each model training before the disconnection according to the gradient secret share of each participating node after each model training before the disconnection; and the parameter updating unit is used for gradually updating the initial model parameters according to the gradient information before disconnection based on a gradient descent algorithm so as to obtain the target model parameters.
According to an embodiment of the present application, further comprising: a ninth obtaining module, configured to obtain a first parameter secret share of an initial model parameter of the first training node if the target node is not the authorized node; a tenth obtaining module, configured to obtain a second parameter secret share of the target model parameter according to the first parameter secret share and a gradient secret share of the target node after each model training before the first training node is disconnected.
According to an embodiment of the present application, further comprising: an eleventh obtaining module, configured to obtain a data secret share of the prediction data sent by the prediction node for the first training node after obtaining the second parameter secret share of the target model parameter; and the second prediction module is used for generating a second intermediate result secret share corresponding to the first training node according to the data secret share corresponding to the first training node and the second parameter secret share, and feeding back the second intermediate result secret share to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to a preset number of the second intermediate result secret shares.
According to an embodiment of the present application, the tenth obtaining module is further configured to: based on a gradient descent algorithm, according to the gradient secret shares after each model training before the disconnection, the first parameter secret shares are updated successively to obtain the second parameter secret shares.
According to an embodiment of the application, the second obtaining module is further configured to: until the first training node comes back online, take the first intermediate result secret share cached most recently before the disconnection as the target intermediate result secret share.
According to an embodiment of the present application, further comprising: and the training termination module is used for acquiring the number of the training nodes in an online state, and terminating the training of the federal learning model if the number of the training nodes is less than or equal to a preset number.
The embodiment of the second aspect of the application provides a device for training a federated learning model, which can train the federated learning model based on a secret sharing algorithm under the semi-honest assumption. When a training node drops offline during training of the federated learning model, the intermediate result that this node should generate is missing; at this time the other online participating nodes can obtain the substitute intermediate result secret share of the first training node from the local cache and continue the subsequent training process of the federated learning model according to this substitute intermediate result secret share, so that the loss function of the federated learning model can still be obtained. This reduces the impact of a disconnected training node on the effectiveness of the model training process and ensures that training of the federated learning model can proceed normally and smoothly even when a training node is in a disconnected state.
In order to achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for training a federated learning model as described in any of the embodiments of the first aspect of the present application when executing the program.
In order to achieve the above object, a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing a method for training a federated learning model as described in any one of the embodiments of the first aspect of the present application.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in one embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 7 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 9 is a schematic flow chart diagram illustrating a method for training a federated learning model as disclosed in another embodiment of the present application;
FIG. 10 is a schematic flow chart diagram illustrating a secret sharing algorithm disclosed in one embodiment of the present application;
FIG. 11 is a schematic flow chart diagram illustrating a secret sharing algorithm disclosed in another embodiment of the present application;
FIG. 12 is a schematic flow chart diagram illustrating a secret sharing algorithm disclosed in another embodiment of the present application;
FIG. 13 is a schematic flow chart diagram illustrating a secret sharing algorithm disclosed in another embodiment of the present application;
FIG. 14 is a schematic flow chart diagram illustrating a secret sharing algorithm disclosed in another embodiment of the present application;
FIG. 15 is a schematic flow chart diagram illustrating a secret sharing algorithm according to another embodiment of the present disclosure;
FIG. 16 is an architecture diagram of federated learning model training as disclosed in one embodiment of the present application;
FIG. 17 is a schematic illustration of a training apparatus for the federated learning model disclosed in one embodiment of the present application;
FIG. 18 is a schematic diagram of a training apparatus for a Federal learning model according to another embodiment disclosed herein;
FIG. 19 is an architecture diagram of a cloud platform disclosed in one embodiment of the present application;
FIG. 20 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to better understand the above technical solution, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The following describes a method and an apparatus for training a federal learning model according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flow chart illustrating a method for training a federal learning model according to an embodiment of the present application. And taking a target node participating in the training of the federal learning model as an execution subject, wherein the target node is any one of a training node and a coordination node participating in the training of the federal learning model and in an online state.
The federal learning model training method provided by the application trains the federated learning model based on a Secret Sharing algorithm, such as the Shamir algorithm, under the semi-honest assumption. It should be noted that, because the federated learning model is trained based on a secret sharing algorithm, each training node participating in the training treats its local sample data and the intermediate results generated in each round of training as secrets and shares these secrets with all participating nodes that take part in the training of the federated learning model; the training nodes themselves also obtain secret shares. That is, each participating node obtains a data secret share of the sample data secretly shared by the training nodes and an intermediate result secret share. Among the training nodes of the federated learning model there is a labeled training node; when sharing its sample data, this labeled training node also performs secret sharing on the label data of its samples and shares the resulting shares with all participating nodes.
The semi-honest assumption means that in the process of model training, all the participating nodes can accurately calculate according to the protocol, but all intermediate results can be recorded at the same time to derive additional information.
A secret sharing algorithm distributes a shared secret among a group of participants in such a way that the secret is managed jointly by all of them and can only be recovered when a sufficient number of shares are combined.
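As a concrete illustration of such an algorithm, a minimal Shamir-style (t, n) secret sharing over a prime field is sketched below in Python. The prime, the function names (split_secret, reconstruct_secret) and the parameter values are hypothetical choices made only for this sketch; the application does not prescribe a particular implementation.

import random

PRIME = 2_147_483_647  # a prime large enough for small demo secrets

def split_secret(secret, n_shares, threshold):
    # Split `secret` into n_shares points on a random polynomial of degree threshold-1.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares

def reconstruct_secret(shares):
    # Recover the secret (the polynomial value at x = 0) by Lagrange interpolation;
    # any `threshold` distinct shares are sufficient.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

# Usage: a training node would share an intermediate result this way, one share per participant.
shares = split_secret(secret=42, n_shares=5, threshold=3)
assert reconstruct_secret(shares[:3]) == 42

In such a scheme any three of the five participants can jointly recover the value, while one or two shares alone reveal nothing about it.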
The federated learning model can be a vertical (longitudinal) federated learning model, that is, a federated learning model (hereinafter simply referred to as the federated learning model) in which different participating nodes jointly perform machine learning in scenarios where the data dimensions of the participating nodes are inconsistent.
It should be noted that the method for training the federated learning model provided by the application is suitable for the federated learning model itself, and is also suitable for federated learning systems in which machine learning models such as a logistic regression model, a tree-structured model or a neural network model are established. In particular, when a logistic regression model (namely a vertical federated learning model using logistic regression) is established on the federated learning system, the effect of the method provided by the application is more pronounced.
It should be noted that, in the process of training the federated learning model, the participating nodes can be roughly divided into two types: an assisting node (which does not own data) and training nodes (which own data). Each training node has a corresponding local learning model and trains it by inputting its own samples into that model. The local learning model is a part of the federated learning model; the local learning models of all training nodes together constitute the overall federated learning model.
The private user data owned by each training node concerns different aspects of the same users. For example, one training node is company A, which has user data such as the salary, years of service and promotion record of 100 employees of company A, and another training node is bank B, which has user data such as the consumption records, fixed assets and personal credit lines of those 100 employees. The user data owned by company A and bank B come from the same group of users; that is, the 100 employees in company A's data are the same 100 employees as in bank B's data.
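A minimal sketch of this vertical partitioning is given below; the employee identifiers and feature names are invented for illustration only.

# Vertically partitioned data: the same users, but different features on each training node.
company_a_data = {  # held by training node "company A"
    "emp_001": {"salary": 12000, "years_of_service": 5, "promotions": 1},
    "emp_002": {"salary": 18000, "years_of_service": 9, "promotions": 2},
}
bank_b_data = {     # held by training node "bank B"
    "emp_001": {"monthly_spend": 4300, "fixed_assets": 800000, "credit_line": 50000},
    "emp_002": {"monthly_spend": 6100, "fixed_assets": 1500000, "credit_line": 90000},
}
# The user sets coincide, so the two nodes can jointly train one federated model
# without either node revealing its raw feature values to the other.
shared_users = company_a_data.keys() & bank_b_data.keys()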
As shown in fig. 1, the method for training the federal learning model proposed in the embodiment of the present application is explained, which specifically includes the following steps:
s101, in the process of training a federal learning model, a first training node in a disconnection state is obtained, and a substituted intermediate result secret share of the first training node is obtained from a local cache, wherein the substituted intermediate result secret share is a first intermediate secret share generated when training is completed before the first training node cached in the local cache is disconnected.
The target node can be any one of a training node which participates in the training of the federal learning model and is in an online state, and can also be a coordination node which participates in the training of the federal learning model and is in the online state.
In the embodiment of the application, in the process of model training, the assisting node can detect the connection state of each training node in real time or periodically, and when the training node is identified to be disconnected, each training node is notified, so that the target node can acquire the first training node in the disconnected state after acquiring the notification of the coordinating node. Optionally, the training nodes may also monitor each other to determine whether other training nodes are online, and when a training node is offline, the other training nodes may also directly monitor the offline condition of the training node.
Before the first training node is disconnected, the intermediate result secret share generated when the first training node last completed training before the disconnection is cached on the target node. After the first training node is disconnected, the intermediate result it would generate during training is missing. Therefore, to ensure that the disconnection of the first training node does not affect the training of the subsequent federated learning model, the application obtains the last cached intermediate result secret share of the first training node from the local cache and uses it in place of the intermediate result secret share of each training round until the first training node comes back online; the intermediate result secret share obtained from the cache is referred to herein as the substitute intermediate result secret share. In the application, the substitute intermediate result secret share participates in the calculation of the gradient information and the loss function, so that the model parameters on the other training nodes can still be updated and a converged federated learning model can finally be trained.
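A minimal sketch of such a local cache is shown below, assuming a simple in-memory mapping from node identifier to the shares cached round by round; the class and method names are hypothetical.

class IntermediateResultCache:
    def __init__(self):
        # node_id -> list of (round_index, secret_share), appended in arrival order
        self._cache = {}

    def store(self, node_id, round_index, secret_share):
        self._cache.setdefault(node_id, []).append((round_index, secret_share))

    def latest_share(self, node_id):
        # The most recently cached intermediate result secret share of a node;
        # used as the substitute share while that node stays offline.
        history = self._cache.get(node_id)
        if not history:
            raise KeyError(f"no cached share for node {node_id}")
        return history[-1][1]

# Usage on the target node: when node "C" drops offline, keep training with its last cached share.
cache = IntermediateResultCache()
cache.store("C", round_index=7, secret_share=123456)
substitute_share = cache.latest_share("C")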
S102, obtaining a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share.
In learning model training, the loss function of the model is related to the intermediate results generated during training and to the label data. In the embodiment of the present application, after obtaining the substitute intermediate result secret share, the target node may obtain the first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share and the label secret share that the labeled training node has secretly shared with the target node.
S103, receiving second intermediate result secret shares sent by the remaining participating nodes in the online state, and acquiring second loss function secret shares of the remaining participating nodes on the target node according to the second intermediate result secret shares.
In the embodiment of the application, the second intermediate result secret shares sent by the remaining participating nodes in an online state may be received, and after the second intermediate result secret shares are obtained, the second loss function secret shares of the remaining participating nodes may be obtained according to the second intermediate result secret shares and the label secret share that the labeled training node has secretly shared with the target node.
S104, obtaining the loss function of each training node according to the first loss function secret share, the second loss function secret share and the third loss function secret share on the target node.
In this embodiment, if the target node is a training node in an online state, the target node may send the first loss function secret share, the second loss function secret share, and the third loss function secret share to the coordination node. Further, the assisting nodes receive the secret shares of the loss functions of the same learning model sent by the training nodes. For the same learning model, the assisting node recovers the loss function of the same learning model according to the received loss function secret share of the same learning model, namely obtains the loss function of the training node corresponding to the same learning model.
If the target node is an assisting node, aiming at the same learning model, the coordination node acquires the secret share of the loss function of the same learning model from each training node, and further recovers the loss function of the same learning model from the secret share of the loss function of the same learning model.
It should be noted that, in the secret sharing algorithm, a secret can only be recovered when a certain number of secret shares are obtained at the same time. The target node therefore needs to obtain a certain number of loss function secret shares in order to recover the loss function.
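The collection step on the node that recovers the loss can be sketched as follows; the threshold value and the additive reconstruction are illustrative stand-ins only (with a Shamir-style scheme the reconstruction would use interpolation, as in the earlier sketch).

class LossShareCollector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.shares = {}  # sender node_id -> loss function secret share

    def receive(self, sender_id, loss_share):
        self.shares[sender_id] = loss_share
        if len(self.shares) >= self.threshold:
            return self._reconstruct()
        return None  # not enough shares yet, keep waiting

    def _reconstruct(self):
        # Stand-in reconstruction: with additive sharing the loss is the sum of all shares.
        return sum(self.shares.values())

collector = LossShareCollector(threshold=3)
collector.receive("training node A", 0.12)           # first loss function secret share
collector.receive("training node B", -0.05)          # second loss function secret share
loss = collector.receive("coordinating node", 0.31)  # third share: loss is recovered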
Therefore, under the semi-honest assumption, the federated learning model can be trained on the basis of the secret sharing algorithm. When a training node drops offline during training of the federated learning model, the intermediate result that the dropped training node would generate is missing; at this time, the other online participating nodes can obtain the substitute intermediate result secret share of the first training node from the local cache and continue the subsequent training process of the federated learning model according to it, so that the loss function of the federated learning model can still finally be obtained. This reduces the impact of a disconnected training node on the effectiveness of the model training process and ensures that, even when a training node is in a disconnected state, training of the federated learning model can be carried out normally and smoothly.
It should be noted that, for any training node (including a training node serving as the target node) that is online, the gradient information of its local learning model may be recovered from the gradient secret shares of that local learning model obtained from the other participating nodes. In general, the gradient of the learning model is a vector: its direction is the direction in which the function value increases fastest at a point of its domain, and its modulus is the maximum value of the directional derivative. In learning model training, minimizing the loss function is the optimization target, and by acquiring the gradient information of the local learning model, the model parameter updating strategy (including the adjustment direction and the adjustment step size) can be determined so that the loss function decreases as quickly as possible. Therefore, after the gradient information is recovered, the model parameters of the local learning model can be updated according to the gradient information.
The following explains the update process of the model parameters, taking the target node as an example. First, after the target node obtains a set number of gradient secret shares of its local learning model, the gradient information of the local learning model can be recovered based on a recovery algorithm. The set number of gradient secret shares may come from part of the remaining training nodes, or from part of the remaining training nodes and the assisting node, or from part of the remaining training nodes, the assisting node and the target node itself; it is only necessary to ensure that the number of received gradient secret shares reaches the set number. The recovery algorithm is not limited in the present application and may be selected according to the actual situation; for example, a super-convergence cell block gradient recovery algorithm or a gradient recovery algorithm based on solving a unilateral optimal problem may be selected.
After obtaining the gradient information of the local learning model, the target node may determine an adjustment direction and an adjustment step size of a model parameter in the local learning model according to the gradient information, and update the model parameter in the local learning model according to the adjustment direction and the adjustment step size. Further, after the local learning model is updated, the local learning model may continue to be trained to generate a converged local learning model. It should be noted that each training node can recover the gradient information of its own local learning model in the manner of the target training node.
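A minimal sketch of this update on a training node is given below, assuming the gradient has already been recovered from the collected gradient secret shares; the fixed learning rate is an assumption of the sketch.

def update_parameters(params, gradient, learning_rate=0.01):
    # Move each parameter against its gradient component (adjustment direction)
    # by a step proportional to the gradient magnitude (adjustment step size).
    return [w - learning_rate * g for w, g in zip(params, gradient)]

local_params = [0.5, -1.2, 0.8]
recovered_gradient = [0.1, -0.4, 0.05]  # recovered from the gradient secret shares
local_params = update_parameters(local_params, recovered_gradient)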
Therefore, under the semi-honest assumption, the method and the device can train the federated learning model based on the secret sharing algorithm, so that an attacker (whether external or among the remaining training nodes) can recover gradient information only by obtaining a certain number of gradient secret shares at the same time, which reduces the risk of attackers colluding to steal private data and improves the security of the model training process. Meanwhile, when part of the gradient secret shares are lost or destroyed, the remaining gradient secret shares can still be used to recover the gradient information, improving the reliability of the model training process. Furthermore, training the federated learning model based on a secret sharing algorithm allows the operations on gradient secret shares to be completed quickly, greatly shortening the computation time. Furthermore, only the assisting node can recover the loss function during the whole model training process; it identifies whether the first learning model has converged according to the loss function and sends indication information to the target node according to the identification result, so the training nodes do not need to participate in this step, the reliability of the indication information that each training node receives from the assisting node is ensured, and the effectiveness of model training is further improved.
As a possible implementation manner, as shown in fig. 2, the process in step S101 of acquiring the first training node in the disconnected state and acquiring the substitute intermediate result secret share of the first training node from the local cache specifically includes the following steps:
s201, first prompt information used for prompting that the training node is disconnected is obtained, wherein the first prompt information at least comprises identification information of the first training node in the disconnected state and the disconnected time.
In the embodiment of the application, in the process of model training, the assisting node can detect the connection state of each training node in real time or periodically, and when the training node is identified to be disconnected, first prompt information for prompting that the training node is disconnected is sent to the target node. Accordingly, the target node may obtain the first hint information.
Optionally, if the target node is a training node in an online state, when the assisting node detects that any training node is offline, the assisting node may send first prompt information prompting that the training node is offline to the target node, and correspondingly, the target node may obtain the first prompt information.
Optionally, if the target node is an assisting node, when the assisting node detects that any training node is disconnected, the assisting node may directly generate first prompt information for prompting that the training node is disconnected.
It should be noted that the first prompt message at least includes identification information of the first training node in the dropped state and the dropped time of the first training node. The identification information is used for representing a specific training node in a dropped state, and the dropped time is used for representing the specific dropped time of the first training node.
It should be noted that the identification information may be text information, such as "training node A", or number information, such as "1001" or "1002"; the disconnection time may be a specific step in the current training period, for example "obtaining the loss function of the federated learning model according to the gradient secret shares", or the sequence number of a step in the current training period, such as "third step" or "fourth step".
For example, during training of the federated model, five nodes participate in the training: training node A to training node D and the assisting node. In the current training period, before the loss function is obtained according to the gradient secret shares, the assisting node detects the connection states of training node A to training node D and determines that training node C is disconnected. At this time, the assisting node can send the first prompt information to training node A (namely, the target node), which is in an online state, and correspondingly training node A can receive the first prompt information, wherein the first prompt information comprises: "training node C" as the identification information of the first training node, and "obtaining the loss function of the federated learning model according to the gradient secret shares" as the disconnection time of the first training node in the current period.
S202, according to the offline time, the secret share of the substitute intermediate result of the first training node is obtained from the local cache.
As a possible implementation manner, whether the first training node has completed the current model training may be determined according to the disconnection time. Whether the first training node has completed the current model training can be judged by identifying whether the first training node has already sent its intermediate result secret share to the target node.
Alternatively, if it is identified that the first training node has already sent its intermediate result secret share to the target node, indicating that the first training node had completed the current model training before being disconnected, the first intermediate result secret share generated by the first training node in the current model training may be used as the substitute intermediate result secret share.
Alternatively, if it is identified that the first training node has not sent its intermediate result secret share to the target node, indicating that the first training node had not yet completed the current model training, the first intermediate result secret share generated by the first training node in the previous model training may be used as the substitute intermediate result secret share.
In this way, whether the first training node has completed the current model training can be identified, and according to the identification result the corresponding cached first intermediate result secret share of the first training node is read from the local cache of the target node and used as the substitute intermediate result secret share, so that the target node can still normally and smoothly obtain the loss function of the federated learning model even when the first training node is in an offline state, which improves the reliability and efficiency of the model training process.
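The decision rule above can be sketched as follows, assuming the target node records whether the dropped node's share for the current round has already arrived; the helper name and cache layout are hypothetical.

def select_substitute_share(cache, node_id, current_round, received_current_round):
    # Use the current round's cached share if it arrived before the drop,
    # otherwise fall back to the previous round's cached share.
    wanted_round = current_round if received_current_round else current_round - 1
    for round_index, share in reversed(cache.get(node_id, [])):
        if round_index == wanted_round:
            return share
    raise LookupError(f"no cached share of node {node_id} for round {wanted_round}")

local_cache = {"C": [(6, 111), (7, 222)]}
share = select_substitute_share(local_cache, "C", current_round=7, received_current_round=True)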
Therefore, the stored first intermediate result secret share generated by the last model training of the first training node is read from the local cache of the target node to serve as the substitute intermediate result secret share, so that after the target node is disconnected from the first training node, the target node can replace the operation of receiving the intermediate result secret share sent by the first training node with the read substitute intermediate result secret share, and the training of the federated learning model in the subsequent training period is carried out until the federated learning model converges.
Further, in practical applications, since the dropped connection of the first training node may be temporary, that is, the first training node may be on-line again, in this application, the connection state of the first training node may be continuously detected by the assisting node, and after the first training node is recognized to be on-line again, the assisting node assists the first training node to recover the training of the federal learning model.
As a possible implementation manner, as shown in fig. 3, on the basis of the foregoing embodiment, a process of assisting a first training node to acquire target model parameters specifically includes the following steps:
s301, second prompt information of the first training node which is on line again is obtained.
In the embodiment of the application, after the gradient secret share of the first training node on the target node is obtained, the assisting node can continue to detect the connection state of the first training node, and when the first training node is identified to be on-line again, second prompt information for prompting the first training node to be on-line again is sent to the target node. Accordingly, the target node may obtain the second prompt message.
Optionally, if the target node is a training node in an online state, when the assisting node detects that the first training node is online again, the assisting node may send second prompt information prompting that the first training node is online again to the target node, and correspondingly, the target node may obtain the second prompt information.
Optionally, if the target node is an assisting node, when the assisting node detects that the first training node is on-line again, the assisting node may directly generate second prompt information for prompting that the first training node is on-line again.
S302, sending the initial model parameters to the first training node.
In this embodiment of the application, after the first training node is reconnected, a request for recovering the initial model parameters may be sent to the coordinating node, and accordingly, the coordinating node may respond to the request, obtain the initial model parameters of the first training node from a local cache of the coordinating node, and send the encrypted initial model parameters to the first training node after encrypting the initial model parameters.
It should be noted that, in the present application, an encryption manner of the initial model parameter is not limited, and optionally, the first interference information generated by the first training node may be added to a request for recovering the initial model parameter, which is sent by the first training node to the assisting node, and accordingly, the assisting node may encrypt the initial model parameter of the first training node according to the first interference information and send the encrypted initial model parameter to the training node. Further, the first training node may obtain a decrypted initial model parameter through decryption processing, and remove the first interference information to obtain the initial model parameter.
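One simple way to realize such interference-based protection is an additive mask, sketched below: the re-connected node generates random interference values, the coordinating node adds them to the cached initial parameters before sending, and the node subtracts them again on receipt. The application leaves the concrete encryption method open, so this is only an illustrative choice with hypothetical names.

import random

def make_interference(dim):                          # generated by the first training node
    return [random.uniform(-1e6, 1e6) for _ in range(dim)]

def mask_parameters(initial_params, interference):   # performed by the coordinating node
    return [p + r for p, r in zip(initial_params, interference)]

def unmask_parameters(masked_params, interference):  # performed by the first training node
    return [m - r for m, r in zip(masked_params, interference)]

interference = make_interference(dim=3)
masked = mask_parameters([0.1, 0.2, 0.3], interference)  # what actually travels over the network
recovered = unmask_parameters(masked, interference)      # equals [0.1, 0.2, 0.3] up to float error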
Optionally, if the target node is a training node, the coordinating node may send the encrypted initial model parameter to the target node, and the target node sends the encrypted initial model parameter to the first training node.
Optionally, if the target node is a coordinating node, the coordinating node may directly send the encrypted initial model parameters to the first training node.
S303, coordinating each participating node to send the cached gradient secret share of the first training node after each model training before the first training node is disconnected, so that the first training node obtains a target model parameter of a first learning model on the first training node according to the initial model parameter and the gradient secret share of the first training node after each model training before the disconnection; the participating nodes comprise training nodes and coordination nodes.
In the embodiment of the application, each participating node can be coordinated to send the gradient secret share after each model training before the first training node cached by the participating node is disconnected to the first training node, correspondingly, the first training node can receive the gradient secret share sent by each participating node and the initial model parameters, and the model parameters of the first learning model on the first training node are restored according to the set restoration algorithm, so that the target model parameters of the first learning model can be restored.
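A minimal sketch of this recovery is shown below, assuming each round's gradient has already been reconstructed from the gradient secret shares sent by the participating nodes and that a fixed learning rate was used during training; the function name and values are hypothetical.

def recover_target_parameters(initial_params, per_round_gradients, learning_rate=0.01):
    # Replay, round by round, the gradient-descent updates the dropped node missed.
    params = list(initial_params)
    for gradient in per_round_gradients:
        params = [w - learning_rate * g for w, g in zip(params, gradient)]
    return params

initial = [0.0, 0.0]
gradients = [[0.5, -0.2], [0.3, -0.1], [0.1, 0.0]]  # reconstructed for each pre-drop training round
target_params = recover_target_parameters(initial, gradients)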
Therefore, the method continuously detects the connection state of the first training node through the assisting node. After the first training node is identified as being online again, the initial model parameters are sent to the first training node, and each participating node is coordinated to send the cached gradient secret shares of the first training node for each model training before the disconnection, so that the first training node can obtain the target model parameters of the first learning model according to the initial model parameters and these gradient secret shares. In this way, after the first training node comes back online, the training information lost during the disconnection can be safely recovered for it, allowing it to take part in the subsequent model training. This prevents a transient disconnection from causing the first training node to lose all of its training information and to be unable to keep participating after it comes back online, which further improves the reliability and efficiency of the model training process.
Further, the disconnection of the first training node may be temporary or long-lasting; that is, the first training node may not come back online before the federal learning model training ends. Therefore, in the present application, the connection state of the first training node may be continuously detected by the assisting node, and when it is identified that the first training node has still not come back online after the federal learning model training ends, the authorized node authorized by the first training node is assisted in acquiring the target model parameters of the first training node.
As a possible implementation manner, as shown in fig. 4, on the basis of the foregoing embodiment, a process of assisting an authorized node authorized by a first training node to acquire a target model parameter of the first training node specifically includes the following steps:
S401, when the first training node is disconnected and is not on line again after the federal learning model training is finished, whether the target node is an authorized node or not is identified.
In the embodiment of the application, after the gradient secret share of the first training node on the target node is obtained, the assisting node can continue to detect the connection state of the first training node, and when it is identified that the first training node has been disconnected and has not come back online after the federal learning model training is finished, whether the target node is an authorized node can be further identified. Optionally, the assisting node may identify whether the target node carries authorization information; when the target node carries the authorization information, the target node is an authorized node, and step S402 is performed. The authorized node is authorized by the first training node and is used for recovering the model parameters of the first learning model on the first training node, and the authorized node is a training node or a coordination node.
It should be noted that, before each training node participates in the model training, an authorized node may be preset for each training node, so that after a training node is disconnected, its authorized node can recover the model parameters of the first learning model on that training node.
For example, before the training node a participates in model training, the assisting node may be preset as an authorized node, so that after the training node a is disconnected, the assisting node may recover the model parameters of the first learning model on the training node a.
S402, if the target node is an authorized node, obtaining the initial model parameters of the first training node and the gradient secret shares, cached by each participating node, from each model training before the first training node was disconnected, wherein the participating nodes comprise training nodes and coordination nodes.
In the embodiment of the application, after the first training node has been disconnected, has still not come back online once the federal learning model training is finished, and the target node has been identified as an authorized node, the target node can send to the coordinating node a request for recovering the initial model parameters of the first training node and the gradient secret shares cached by each participating node for each model training before the disconnection. Correspondingly, the coordinating node can respond to the request, obtain the initial model parameters of the first training node from its local cache, encrypt them, send the encrypted initial model parameters to the target node, and coordinate each participating node to send its cached gradient secret shares for each model training before the disconnection to the target node.
S403, acquiring target model parameters of the first training node according to the initial model parameters and the gradient secret share after each model training before disconnection; the authorized nodes are authorized by the first training nodes and used for recovering model parameters of the first learning model on the first training nodes, and the authorized nodes are training nodes or coordination nodes.
In the embodiment of the application, after the target node acquires the initial model parameters of the first training node and the gradient secret shares from each model training before the disconnection, it can restore the model parameters of the first learning model on the first training node according to a set restoration algorithm, thereby recovering the target model parameters of the first learning model.
As a possible implementation manner, as shown in fig. 5, on the basis of the foregoing embodiment, the process of obtaining the target model parameter of the first training node in step S403 specifically includes the following steps:
S501, obtaining gradient information for each model training before the disconnection according to the gradient secret shares of each participating node for each model training before the disconnection.
In the embodiment of the application, after the target node obtains the gradient secret shares cached by each participating node for each model training before the disconnection, and because each such gradient secret share is a secret fragment of the gradient information of the first training node before the disconnection, the target node can restore the gradient information for each model training before the disconnection according to a set restoration algorithm.
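The "set restoration algorithm" is not specified here. Under the assumption that the shares are Shamir shares over a prime field (the Shamir algorithm is named later in connection with fig. 16), a standard restoration step is Lagrange interpolation at zero, as in the following sketch; the prime and the (x, y) share representation are assumptions.

```python
# Sketch of restoring a secret (e.g. gradient information) from its secret
# shares, assuming Shamir sharing over a prime field; Lagrange interpolation
# at x = 0 is used as the standard restoration choice.
PRIME = 2 ** 61 - 1  # Mersenne prime used as the field modulus in this sketch

def restore_secret(shares):
    # shares: list of (x_j, y_j) pairs, at least the threshold number of them
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```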
S502, based on a gradient descent algorithm, successively updating the initial model parameters according to the gradient information from before the disconnection to obtain the target model parameters.
In the embodiment of the application, the target node can determine the adjustment direction and the adjustment step length of the model parameters in the local federal learning model based on a gradient descent algorithm according to the gradient information before the disconnection, and successively update the initial model parameters according to the adjustment direction and the adjustment step length to obtain the target model parameters.
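As a minimal sketch of this successive update, the following assumes the adjustment direction is the negative gradient and the adjustment step is a fixed learning rate; the patent only requires that both be derived from the gradient information before the disconnection.

```python
import numpy as np

# Minimal sketch (assumptions: negative-gradient direction, fixed learning rate).
def replay_updates(initial_params, gradients_before_disconnection, learning_rate=0.1):
    params = np.asarray(initial_params, dtype=float)
    for grad in gradients_before_disconnection:
        # one successive update per model training that took place before the disconnection
        params = params - learning_rate * np.asarray(grad, dtype=float)
    return params  # target model parameters

target_params = replay_updates([0.0, 0.0], [[0.5, -0.2], [0.3, 0.1]])
```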
Therefore, the method can continuously detect the connection state of the first training node through the assisting node and, after identifying that the first training node has been disconnected and has not come back online once the federal learning model training is finished, identify whether the target node is an authorized node. When the target node is determined to be an authorized node, the target node obtains the initial model parameters of the first training node and the gradient secret shares cached by each participating node for each model training before the disconnection, and obtains the target model parameters of the first training node according to the initial model parameters and those gradient secret shares. In this way, when the first training node has authorized the target node, the target node can safely recover the training information that the first training node lost due to the disconnection, which avoids the technical problem that all training information accumulated before the disconnection is wasted, and further improves the reliability and efficiency of the model training process.
It should be noted that, if the target node is an unauthorized node, the target node may participate in the calculation to obtain the secret share of the second parameter of the target model parameter, so that the authorized node may recover to obtain the target model parameter of the first training node according to the secret share of the second parameter fed back by each participating node.
As a possible implementation manner, as shown in fig. 6, on the basis of the foregoing embodiment, the process of obtaining the secret share of the second parameter of the target model parameter specifically includes the following steps:
S601, if the target node is not an authorized node, obtaining a first parameter secret share of the initial model parameter of the first training node.
The first parameter secret share is generated and sent by the coordination node after performing secret sharing processing on the initial model parameters.
In this embodiment of the application, if it is determined that the target node is an unauthorized node, the coordinating node may use the initial model parameter of the first training node as a secret, perform secret sharing to generate a secret share of the first parameter, and send the secret share of the first parameter to each training node (including the target node) in the on-line state.
Alternatively, if the target node is a training node, the coordinating node may send the first parameter secret share to the target node and the remaining online training nodes.
Optionally, if the target node is an assisting node, the assisting node may send the first parameter secret share to the remaining online training nodes.
S602, obtaining a second parameter secret share of the target model parameter according to the first parameter secret share and the gradient secret share after model training each time before the first training node is disconnected and cached on the target node.
In the embodiment of the application, after the target node obtains the first parameter secret share, an adjustment direction and an adjustment step length of the first parameter secret share may be determined based on a gradient descent algorithm according to the first parameter secret share and a gradient secret share after model training each time before the first training node is disconnected, which is cached on the target node, and the first parameter secret share is successively updated according to the adjustment direction and the adjustment step length to obtain a second parameter secret share of the target model parameter.
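A minimal sketch of this share-wise update is given below. It assumes the sharing scheme is linear, so that scaling a cached gradient secret share by a public learning rate and subtracting it from the parameter secret share held at the same evaluation point yields a secret share of the updated parameter; the fixed learning rate and the fixed-point/finite-field encoding of a real deployment are glossed over.

```python
# Sketch (assumptions: linear sharing scheme, public fixed learning rate).
def update_parameter_share(first_param_share, cached_gradient_shares, lr=0.1):
    share = first_param_share
    for grad_share in cached_gradient_shares:
        # share-wise successive update mirroring the plaintext gradient descent step
        share = share - lr * grad_share
    return share  # second parameter secret share of the target model parameter

second_param_share = update_parameter_share(3.2, [0.4, -0.1, 0.25])
```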
Therefore, when the target node is identified as an unauthorized node, it can obtain the first parameter secret share of the initial model parameters of the first training node and, according to the first parameter secret share and the gradient secret shares cached on the target node for each model training before the first training node was disconnected, obtain the second parameter secret share of the target model parameters. In this way, even when the first training node has not authorized the target node, the target node can safely assist the authorized node in recovering the training information that the first training node lost due to the disconnection, which avoids the technical problem that all training information before the disconnection is wasted and further improves the reliability and efficiency of the model training process.
Further, after the training of the federal learning model is completed, a prediction stage can be entered to obtain a prediction result of prediction data corresponding to the first training node. The following explains the specific process of the prediction stage with respect to the target node being an authorized node and an unauthorized node, respectively.
For a target node being an authorized node, as shown in fig. 7, on the basis of the foregoing embodiment, a specific process after obtaining target model parameters of a first training node includes the following steps:
S701, acquiring the prediction data sent by the prediction node.
The prediction data may be any data that is intended to be predicted, such as user data, behavioral data, and item data, among others.
In the embodiment of the application, after the federal learning model is trained, the prediction node may send prediction data to the target node, and correspondingly, the target node may receive the prediction data sent by the prediction node.
S702, according to the target model parameters and the prediction data, generating a second intermediate result corresponding to the first training node and feeding the second intermediate result back to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
In this embodiment, the target node may generate a second intermediate result corresponding to the first training node according to the target model parameter and the prediction data, and feed back the second intermediate result to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
It should be noted that, if the target node is a training node, the target node may further generate a third intermediate result corresponding to the local federal learning model based on the local federal learning model and the prediction data, and feed back the third intermediate result to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the target node according to the third intermediate result.
For the unauthorized target node, as shown in fig. 8, on the basis of the foregoing embodiment, a specific process after obtaining the target model parameters of the first training node includes the following steps:
S801, obtaining a data secret share of the prediction data aiming at the first training node and sent by the prediction node.
In the embodiment of the application, the prediction data of the first training node can be used as a secret, and a secret share of the data is generated through secret sharing processing and is sent to each participating node for sharing.
Optionally, after the training of the federal learning model is completed, the prediction node may send the generated data secret shares of the prediction data to the target node, and accordingly, the target node may receive the data secret shares of the prediction data sent by the prediction node.
S802, according to the secret shares of the data and the secret shares of the second parameters corresponding to the first training node, second secret shares of intermediate results corresponding to the first training node are generated and fed back to the prediction node, so that the prediction node can obtain the prediction result of the prediction data corresponding to the first training node according to the preset number of the secret shares of the second intermediate results.
In this embodiment of the application, the target node may generate a second intermediate result secret share corresponding to the first training node according to the data secret share corresponding to the first training node and the second parameter secret share, and feed back the second intermediate result secret share to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the preset number of second intermediate result secret shares.
It should be noted that the secret shares of the second intermediate result fed back to the prediction node by each participating node are the secret fragments of the second intermediate result, so that the prediction node can recover the second intermediate result according to the set recovery algorithm, and can recover the prediction result of the prediction data corresponding to the first training node.
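For illustration only, the sketch below shows the prediction node's final step once the second intermediate result has been recovered from the preset number of secret shares (e.g. with the Lagrange restoration sketched earlier); mapping the intermediate result through a sigmoid assumes a logistic-regression-style first learning model, which the patent does not mandate.

```python
import math

# Illustrative final step on the prediction node (assumption: logistic-style model).
def prediction_from_intermediate(u):
    # u: second intermediate result recovered from the secret shares
    return 1.0 / (1.0 + math.exp(-u))

score = prediction_from_intermediate(0.83)  # probability-like prediction result
```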
Therefore, after the federal learning model training is completed, the prediction result of the prediction data corresponding to the first training node can be obtained using the recovered target model parameters of the first training node. Optionally, if the target node is an authorized node, the target node may directly receive the prediction data sent by the prediction node and, based on the target model parameters and the prediction data, generate a second intermediate result corresponding to the first training node and feed it back to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result. Optionally, if the target node is an unauthorized node, in order to ensure data transmission security in the prediction stage, the target node may receive a data secret share of the prediction data sent by the prediction node, generate a second intermediate result secret share corresponding to the first training node according to the data secret share and the second parameter secret share, and feed the second intermediate result secret share back to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to a preset number of second intermediate result secret shares. In this way, the prediction node can acquire the prediction result of the prediction data corresponding to the first training node while preventing the prediction data from being stolen.

In order to implement the foregoing embodiments, an embodiment of the present invention further provides another method for training a federal learning model. As shown in fig. 9, a process of training the federal learning model based on a secret sharing algorithm is explained, which specifically includes the following steps:
S901, any training node respectively carries out secret sharing processing on sample data corresponding to the local first learning model to obtain data secret shares and other data secret shares on the training nodes, and the other data secret shares are respectively sent to the rest training nodes and the assisting node.
It should be noted that the training nodes participating in the model training include labeled training nodes and unlabeled training nodes. Therefore, if a training node is an unlabeled training node, secret sharing processing may be performed only on the sample data corresponding to its local learning model, to generate the data secret share and the other data secret shares for that sample data; if the training node is a labeled training node, secret sharing processing may be performed on both the sample data corresponding to its local learning model and the label data of the samples, to generate the data secret shares of the sample data and the label secret shares corresponding to the label data of the samples.
For example, as shown in fig. 10, taking the labeled training node A as an example, the sample data X and the label data y of the samples may be used as a secret A, the secret A is divided into secret shares A1 to A6, the secret share A1 is stored locally, and the secret shares A2 to A6 are sent to the remaining training nodes S1 to S4 and the assisting node S5, so that the training node A carries the data secret share A1. A1 to A6 are all different from one another.
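A sketch of this splitting step is given below, assuming Shamir sharing over a prime field with a threshold (consistent with the Shamir algorithm named in connection with fig. 16); how real sample data and labels are encoded into field elements is glossed over.

```python
import random

# Sketch of splitting one secret into six shares A1..A6 (one kept by training
# node A, the rest sent to training nodes S1..S4 and assisting node S5),
# assuming Shamir threshold sharing over a prime field.
PRIME = 2 ** 61 - 1

def shamir_split(secret, n_shares=6, threshold=4):
    coeffs = [secret % PRIME] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = sum(c * pow(x, power, PRIME) for power, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares  # share (1, y1) plays the role of A1 kept locally

shares_A1_to_A6 = shamir_split(secret=123456789)
```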
Accordingly, each of the training nodes may receive other data secret shares transmitted by the remaining training nodes, and the assisting node may receive other data secret shares transmitted by each of the training nodes.
For example, as shown in fig. 11, taking the labeled training node A as an example, the other data secret shares G2, H2, I2 and J2 respectively sent by the remaining training nodes S1 to S4 may be received, so that the training node A carries the data secret share A1 and the other data secret shares G2, H2, I2 and J2.
For another example, as shown in fig. 12, taking the assisting node as an example, the assisting node may receive the other data secret shares A2, G2, H2, I2, and J2 respectively sent by the training node A and the training nodes S1 to S4, so that the assisting node carries the other data secret shares A2, G2, H2, I2, and J2.
Further, in the embodiment of the present application, a secret share of a tag of the first training node on the target node may be obtained. The label secret share is generated after the label data of the sample is subjected to secret sharing processing by the labeled training node.
It should be noted that, because the secret sharing system has a homomorphic characteristic, when trying to acquire the label secret share of the first training node on the target node, the sample data corresponding to the local learning model corresponding to the first training node and the label data of the sample may be used as one secret to perform the secret sharing process, or the sample data corresponding to the local learning model corresponding to the first training node and the label data of the sample may be used as two secrets to perform the secret sharing process, so that the label secret shares of the first training node on the target node acquired by the foregoing two methods are identical.
For example, for sample data X and label data y of the sample corresponding to the first training node, X and y may be used as one secret C to perform secret sharing processing, so that the secret share of the data of the first training node on the target node is obtained as C1. Further, the corresponding tag secret share y1 may be extracted from C1; or y can be used as a secret D to perform secret sharing processing, so that the label secret share D1 of the first training node on the target node can be directly obtained. At this time, y1 is identical to D1.
It should be noted that, if the assisting node identifies a dropped training node while step S901 is being executed, the current model training process may be ended and the next model training task restarted, so that no additional loss is caused to the federal learning model.
S902, any training node obtains the intermediate result of the first learning model, performs secret sharing processing on the intermediate result to obtain an intermediate result secret share and other intermediate result secret shares on the respective training node, and sends the other intermediate result secret shares to the remaining training nodes and the assisting node respectively.
In the embodiment of the application, each training node may obtain the intermediate result u = θᵀX of its local learning model based on the local learning model parameter θ and the local sample data X, and then perform secret sharing processing on the intermediate result u to obtain an intermediate result secret share and other intermediate result secret shares. Further, each training node may store the intermediate result secret share locally and send the other intermediate result secret shares to the remaining training nodes and the assisting node, respectively.
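A small sketch of this intermediate-result computation, with purely illustrative shapes and values (one intermediate result per local sample), might look as follows:

```python
import numpy as np

# Illustrative computation of the intermediate result u = theta^T x for each
# local sample, before it is secret-shared; shapes and values are made up.
theta = np.array([0.2, -0.5, 0.1])          # local first learning model parameters
X = np.array([[1.0, 0.3, -2.0],
              [0.5, 1.1,  0.4]])            # two local samples, three features
u = X @ theta                                # intermediate result, one value per sample
```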
For example, as shown in fig. 13, taking training node A as an example, the intermediate result u may be used as a secret B, the secret B is divided into intermediate result secret shares B1 to B6, the secret share B1 is stored locally, and the secret shares B2 to B6 are sent to the remaining training nodes S1 to S4 and the assisting node S5, so that the training node A carries the intermediate result secret share B1. B1 to B6 are all different from one another.
Accordingly, any one training node may receive other intermediate result secret shares sent by the remaining training nodes, and the assisting nodes may receive other intermediate result secret shares sent by each training node.
For example, as shown in fig. 14, taking training node A as an example, training node A may receive the other intermediate result secret shares C2, D2, E2, and F2 respectively sent by the remaining training nodes S1 to S4, so that training node A carries the intermediate result secret share B1 and the other intermediate result secret shares C2, D2, E2, and F2.
For another example, as shown in fig. 15, taking the assisting node as an example, the assisting node may receive the other intermediate result secret shares B2, C2, D2, E2 and F2 respectively sent by the training node A and the training nodes S1 to S4, so that the assisting node carries the other intermediate result secret shares B2, C2, D2, E2 and F2.
It should be noted that, when performing the secret sharing processing, the intermediate result may be divided as the secret and shared among n participants (including the training nodes and the assisting node), so that the secret can be recovered only when no fewer than a preset number of participants cooperate, and cannot be recovered when fewer than the preset number of participants cooperate. The preset number can be set according to actual conditions. For example, the preset number may be set to 2/3 × n.
For example, the intermediate result of training node A is divided as a secret and shared among 6 participants, namely training nodes A to E and the assisting node. If the preset number is 4, the intermediate result of training node A can be recovered only when the number of cooperating participants reaches 4, and cannot be recovered otherwise.
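Written out under the Shamir-style assumption, the threshold construction behind this example can be summarized as follows, where s is the secret, p is the field prime, n is the number of participants and t is the preset number; any t shares determine the degree-(t − 1) polynomial f and hence f(0) = s, while fewer than t shares reveal nothing about s. For the example above, n = 6 and t = 4.

```latex
% Shamir-style threshold construction (assumption); the concrete field and
% polynomial form are not fixed by the patent.
t = \left\lceil \tfrac{2}{3}\, n \right\rceil, \qquad
f(x) = s + a_1 x + a_2 x^{2} + \cdots + a_{t-1} x^{t-1} \pmod{p}, \qquad
\text{share}_j = f(j), \quad j = 1, \dots, n
```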
S903, any training node obtains gradient secret shares related to the local learning model according to the data secret shares, the intermediate result secret shares and the label secret shares.
S904, the remaining training nodes and the assisting node respectively acquire the gradient secret shares related to any training node according to the other data secret shares, other intermediate result secret shares and other label secret shares that the training node has secret-shared.
Optionally, the second gradient secret share is calculated from the data secret share, the intermediate result secret share, and the tag secret share using the following formula:
(Formula image RE-GDA0002672941870000201 in the original.)
where θ is a first learning model parameter, ui is an intermediate result secret share, xip is a data secret share, and yi is a label secret share.
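The gradient formula itself appears only as an image in the original. Under the assumption of a logistic-regression-style first learning model, one commonly used secret-sharing-friendly form replaces the sigmoid with its first-order Taylor approximation; the reconstruction below is offered only as a plausible reading of the symbols defined above, not as the patent's exact formula, and products of shared values are understood to be evaluated with the secret-sharing multiplication protocol.

```latex
% Plausible reconstruction (assumption): averaged, Taylor-approximated logistic gradient.
\frac{\partial \mathcal{L}}{\partial \theta_p}
  \approx \frac{1}{n} \sum_{i=1}^{n}
    \left( \tfrac{1}{2} + \tfrac{1}{4}\, u_i - y_i \right) x_{ip},
\qquad u_i = \theta^{\mathsf{T}} x_i
```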
S905, any training node acquires the gradient secret shares related to itself from the remaining training nodes and the assisting node, and restores the gradient information of its local learning model by combining these with the gradient secret share related to itself that it holds locally.
Optionally, after a sufficient number of gradient secret shares of the local learning model have been acquired, the gradient information of the local learning model can be restored according to a set restoration algorithm.
S906, any training node obtains the secret share of the loss function of the local learning model according to the secret share of the intermediate result and the secret share of the label respectively, and sends the secret share of the loss function to the assisting node.
Alternatively, the loss function secret share of the local learning model may be calculated from the intermediate result secret share and the tag secret share using the following formula:
(Formula image RE-GDA0002672941870000202 in the original.)
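This formula is likewise only an image in the original. Under the same logistic-regression assumption, a second-order Taylor approximation of the logistic loss over the intermediate results and labels is one plausible reconstruction; each training node would evaluate it on its intermediate result secret shares and label secret shares (using the multiplication protocol for the nonlinear terms) to obtain its loss function secret share.

```latex
% Plausible reconstruction (assumption): averaged, second-order approximation of
% the logistic loss, with labels y_i in {0, 1}.
\mathcal{L} \approx \frac{1}{n} \sum_{i=1}^{n}
  \left( \log 2 + \tfrac{1}{2}\, u_i + \tfrac{1}{8}\, u_i^{2} - y_i\, u_i \right)
```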
and S907, the assisting node receives the secret share of the loss function of the same learning model sent by any training node.
Optionally, the loss function secret share of the same learning model is obtained according to the intermediate result secret share and the tag secret share, and is sent to the assisting node. Accordingly, the assisting node may receive the loss function secret shares of the same learning model sent by any of the training nodes.
S908, the assisting node recovers the loss function of the same learning model according to the received loss function secret shares.
Optionally, the assisting node may recover the loss function of the same learning model from the received loss function secret shares according to a set recovery algorithm.
Therefore, under the semi-honest assumption, the method and the device can train the federated learning model based on the secret sharing algorithm, so that an attacker (whether external or one of the remaining training nodes) can recover gradient information only by obtaining a sufficient number of gradient secret shares at the same time, which reduces the risk of attackers colluding to steal private data and improves security in the model training process. Meanwhile, when some gradient secret shares are lost or destroyed, the gradient information can still be recovered from the other gradient secret shares, which improves reliability in the model training process. Furthermore, because the federated learning model is trained based on a secret sharing algorithm, the operations on gradient secret shares can be completed quickly, greatly shortening the computation time. Furthermore, only the assisting node can recover the loss function during the whole model training process; it identifies whether the first learning model has converged according to the loss function and sends indication information to the target training node according to the identification result. This avoids requiring every training node to take part in the convergence decision, ensures the reliability of the indication information each training node receives from the assisting node, and further improves the effectiveness of model training.
It should be noted that, as shown in fig. 16, under the semi-honest assumption, the present application has each training node (including labeled training nodes and unlabeled training nodes) and the assisting node participate in the computation, and trains a longitudinal federated learning model by means of the Shamir secret sharing algorithm, the loss function calculation formula, the secret share calculation formula, a symbolic mathematical system based on dataflow programming (TensorFlow), and the like. Whether the model has converged is determined according to the loss function calculated by the coordinating node, and an instruction is sent to the corresponding training node according to the identification result until the models corresponding to all the training nodes complete training, thereby ensuring data security, improving the effectiveness of model training and shortening the computation time.
Based on the same application concept, the embodiment of the application also provides a device corresponding to the method for training the federated learning model.
Fig. 17 is a schematic structural diagram of a training apparatus of a federal learning model according to an embodiment of the present application. As shown in fig. 17, the training device is disposed on a target node, where the target node is any one of a training node and a coordinating node that participate in the training of the federal learning model and are in an online state, and the federal learning model is trained based on a secret sharing algorithm.
This federal learning model's trainer 1000 includes: a first acquisition module 110, a second acquisition module 120, a third acquisition module 130, and a fourth acquisition module 140.
The first obtaining module 110 is configured to, in a training process of the federal learning model, obtain a first training node in a dropped state, and obtain a substitute intermediate result secret share of the first training node from a local cache, where the substitute intermediate result secret share is a first intermediate secret share generated when training is completed before the first training node cached in the local cache is dropped; a second obtaining module 120, configured to obtain a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share; a third obtaining module 130, configured to receive a second intermediate result secret share sent by the remaining participating nodes in an online state, and obtain a second loss function secret share of the remaining participating nodes on a target node according to the second intermediate result secret share, where the remaining participating nodes include the remaining training nodes and the coordinating node; a fourth obtaining module 140, configured to obtain a loss function of each training node according to the first loss function secret share, the second loss function secret share, and a third loss function secret share of the target node itself.
According to an embodiment of the present application, as shown in fig. 18, the first obtaining module 110 in fig. 17 includes: a prompt information obtaining unit 111, configured to obtain first prompt information used for prompting that the training node is disconnected, where the first prompt information at least includes identification information of the first training node and disconnection time; a cache obtaining unit 112, configured to obtain the substitute intermediate result secret share of the first training node from the local cache according to the offline time.
According to an embodiment of the present application, as shown in fig. 18, the cache obtaining unit 112 in fig. 17 includes: a determining subunit 113, configured to determine, according to the offline time, whether the model training of the first training node is completed at this time; a cache obtaining subunit 114, configured to, when the offline time indicates that the secondary model training is not completed, use a first intermediate result secret share generated by the first training node in the last model training as the substitute intermediate result secret share, and when the offline time indicates that the secondary model training is completed, use a first intermediate result secret share generated by the first training node in the secondary model training as the substitute intermediate result secret share.
According to an embodiment of the present application, the fourth obtaining module 140 in fig. 17 is further configured to: respectively obtain the loss function secret shares corresponding to the first gradient secret share, the second gradient secret share and the third gradient secret share, and send the loss function secret shares to the coordination node, where the coordination node obtains the loss function according to the loss function secret shares.
According to an embodiment of the present application, as shown in fig. 18, when the target node is a coordinating node, the training apparatus 1000 of the federal learning model in fig. 17 further includes: a fifth obtaining module 150, configured to obtain second prompt information that the first training node is on-line again; a sending module 160, configured to send initial model parameters to the first training node; a coordination module 170, configured to coordinate that each participating node sends, to the first training node, the cached gradient secret share of the first training node after each model training before the first training node is disconnected, so that the first training node obtains a target model parameter of a first learning model on the first training node according to the initial model parameter and the gradient secret share of the first training node after each model training before the first training node is disconnected.
According to an embodiment of the present application, as shown in fig. 18, the training apparatus 1000 of the federal learning model in fig. 17 further includes: the authorization identification module 180 is configured to identify whether the target node is an authorized node or not when the first training node is disconnected and is not on-line again all the time after the federal learning model training is finished; a sixth obtaining module 190, configured to, if the target node is the authorized node, obtain an initial model parameter of the first training node and a gradient secret share after each model training before the first training node cached by each participating node drops; a seventh obtaining module 200, configured to obtain a target model parameter of the first training node according to the initial model parameter and the gradient secret share after each model training before the disconnection; the authorized node is authorized by the first training node and used for recovering model parameters of a first learning model on the first training node, and the authorized node is the training node or the coordinating node.
According to an embodiment of the present application, as shown in fig. 18, the training apparatus 1000 of the federal learning model in fig. 17 further includes: an eighth obtaining module 210, configured to obtain, after obtaining the target model parameter of the first training node, prediction data sent by a prediction node; the first prediction module 220 is configured to generate a second intermediate result corresponding to the first training node according to the target model parameter and the prediction data, and feed back the second intermediate result to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
According to an embodiment of the present application, as shown in fig. 18, the seventh obtaining module 200 includes: the gradient information acquisition unit 203 is configured to acquire gradient information after each model training before a disconnection according to the gradient secret share after each model training before the disconnection of each participating node; a parameter updating unit 204, configured to successively update the initial model parameters according to the gradient information before the disconnection based on a gradient descent algorithm, so as to obtain the target model parameters.
According to an embodiment of the present application, as shown in fig. 18, the training apparatus 1000 of the federal learning model in fig. 17 further includes: a ninth obtaining module 230, configured to obtain a first parameter secret share of the initial model parameter of the first training node if the target node is not the authorized node; a tenth obtaining module 240, configured to obtain a second parameter secret share of the target model parameter according to the first parameter secret share and a gradient secret share after each model training before the first training node is disconnected, which is cached on the target node.
According to an embodiment of the present application, as shown in fig. 18, the training apparatus 1000 of the federal learning model in fig. 17 further includes: an eleventh obtaining module 250, configured to, after obtaining the second parameter secret share of the target model parameter, obtain a data secret share of the prediction data sent by the prediction node for the first training node; a second prediction module 260, configured to generate a second intermediate result secret share corresponding to the first training node according to the data secret share and the second parameter secret share corresponding to the first training node, and feed back the second intermediate result secret share to the prediction node, so that the prediction node obtains a prediction result of the prediction data corresponding to the first training node according to a preset number of the second intermediate result secret shares.
According to an embodiment of the present application, as shown in fig. 18, the tenth obtaining module 240 in fig. 17 is further configured to: based on a gradient descent algorithm, according to the gradient secret shares after each model training before the disconnection, the first parameter secret shares are updated successively to obtain the second parameter secret shares.
According to an embodiment of the present application, as shown in fig. 18, the second obtaining module 120 in fig. 17 is further configured to: while the first training node has not come back online, use the first intermediate result secret share most recently cached before the disconnection as the target intermediate result secret share.
According to an embodiment of the present application, as shown in fig. 18, the training apparatus 1000 of the federal learning model in fig. 17 further includes: and a training termination module 270, configured to obtain the number of the training nodes in an online state, and terminate training of the federated learning model if the number of the training nodes is less than or equal to a preset number.
Therefore, the federated learning model can be trained based on the secret sharing algorithm under the semi-honest assumption. When a training node drops offline during the training of the federal learning model, the intermediate result generated by that training node's training is lost; at this point, the other online participating nodes can obtain the substitute intermediate result secret share of the first training node from the local cache and continue the subsequent federal learning model training process according to the substitute intermediate result secret share, so that the loss function of the federal learning model can finally be obtained. This reduces the impact of a training node dropping offline on the effectiveness of the model training process and ensures that, even if a training node is in a dropped state, the training of the federal learning model can still proceed normally and smoothly.
It should be noted that, as shown in fig. 19, a training system composed of the training apparatus of the federal learning model provided in the present application, at least one data management system and an auxiliary system can form the service application layer of a cloud platform; an application program is then built in combination with the data layer and the basic support layer, so that the functions of the application program are implemented while eliminating the risk of intermediate result leakage, preventing the final calculation result from being acquired by nodes that should not obtain it, and ensuring data security.
MySQL is a relational database management system, and Remote Dictionary Server (Redis for short) is a database; the inter-cloud federated learning compute engine comprises: the encryption algorithm, the Federated Learning Application Programming Interface (Federated Learning API), the Federated Core Application Programming Interface (Federated Core API), and the Compiler.
Based on the same application concept, the embodiment of the application also provides the electronic equipment.
Fig. 20 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 20, the electronic device 2000 includes a memory 201, a processor 202, and a computer program stored in the memory 201 and executable on the processor 202, and when the processor executes the computer program, the processor implements the above-mentioned method for training the federal learning model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Obviously, various modifications and alterations to this application will become apparent to those skilled in the art without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (26)

1. The method for training the federated learning model is characterized in that an execution subject is a target node, wherein the target node is any one of a training node and a coordination node which participate in the federated learning model training and are in an online state, the federated learning model is trained on the basis of a secret sharing algorithm, and the method comprises the following steps:
in the process of training of the federal learning model, a first training node in a dropped state is obtained, and a substitute intermediate result secret share of the first training node is obtained from a local cache, wherein the substitute intermediate result secret share is a first intermediate secret share generated when training is completed before the first training node cached in the local cache is dropped;
Obtaining a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share;
receiving a second intermediate result secret share sent by the remaining participating nodes in an online state, and acquiring a second loss function secret share of the remaining participating nodes on the target node according to the second intermediate result secret share, wherein the remaining participating nodes comprise remaining training nodes and the coordinating node;
obtaining a loss function on each training node according to the first loss function secret share, the second loss function secret share and a third loss function secret share of the target node;
the method further comprises the following steps:
when the first training node is not on-line again after being disconnected and the Federal learning model training is finished, identifying whether the target node is an authorized node;
if the target node is the authorized node, obtaining initial model parameters of the first training node and gradient secret shares cached by each participating node after model training each time before the first training node is disconnected;
Acquiring target model parameters of the first training node according to the initial model parameters and the gradient secret share after each model training before disconnection;
the authorized node is authorized by the first training node and used for recovering model parameters of a first learning model on the first training node, and the authorized node is the training node or the coordinating node.
2. A method for training a federal learning model as claimed in claim 1, wherein said obtaining a first training node in a dropped state and a substitute intermediate result secret share of the first training node from a local cache comprises:
acquiring first prompt information for prompting that the training node is disconnected, wherein the first prompt information at least comprises identification information and disconnection time of the first training node;
and acquiring the substitute intermediate result secret share of the first training node from the local cache according to the offline time.
3. A method for training a federal learning model as claimed in claim 2, wherein said obtaining the substitute intermediate result secret share of the first training node from the local cache in accordance with the time-to-disconnect comprises:
Determining whether the current model training of the first training node is finished or not according to the offline time;
if the offline time indicates that the current model training is not finished, taking a first intermediate result secret share generated by the first training node in the last model training as the substitute intermediate result secret share;
and if the off-line time indicates that the current model training is finished, taking a first intermediate result secret share generated by the first training node in the current model training as the substitute intermediate result secret share.
4. A training method of a federal learning model as claimed in any one of claims 1-3, wherein when the target node is any one of the training nodes, the obtaining of the loss function of the federal learning model according to the first loss function secret share, the second loss function secret share and the third loss function secret share of the target node itself comprises:
and sending the first loss function secret share, the second loss function secret share and the third loss function secret share to the coordination nodes, and obtaining the loss function of each training node by the coordination nodes according to the loss function secret share of each training node.
5. A method for training a federal learning model as defined in any of claims 1-3, wherein when said target node is a coordinator node, the method further comprises:
acquiring second prompt information of the first training node which is on-line again;
sending initial model parameters to the first training node;
coordinating each participating node to send the cached gradient secret share of the first training node after model training each time before the first training node is disconnected, so that the first training node obtains a target model parameter of a first learning model on the first training node according to the initial model parameter and the gradient secret share of the first training node after model training each time before the first training node is disconnected.
6. The method for training a federal learning model as claimed in claim 1, wherein said obtaining target model parameters of said first training node further comprises:
acquiring prediction data sent by a prediction node;
and generating a second intermediate result corresponding to the first training node according to the target model parameters and the prediction data, and feeding back the second intermediate result to the prediction node, so that the prediction node obtains the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
7. A training method of a federal learning model as claimed in claim 1, wherein the obtaining target model parameters of the first training node according to the initial model parameters and the gradient secret shares after each model training before the disconnection comprises:
obtaining gradient information after each model training before the disconnection according to the gradient secret share of each participating node after each model training before the disconnection;
based on a gradient descent algorithm, according to the gradient information before the disconnection, the initial model parameters are gradually updated to obtain the target model parameters.
8. A method of training a federal learning model as defined in claim 1, further comprising:
if the target node is not the authorized node, obtaining a first parameter secret share of the initial model parameter of the first training node;
and acquiring a second parameter secret share of the target model parameter according to the first parameter secret share and the gradient secret share after model training of each time before the first training node is disconnected, which is cached on the target node.
9. A method for training a federal learning model as claimed in claim 8, wherein said obtaining a secret share of a second parameter of the target model parameters further comprises:
Acquiring a data secret share of prediction data which is sent by a prediction node and aims at the first training node;
and generating a second intermediate result secret share corresponding to the first training node according to the data secret share and the second parameter secret share corresponding to the first training node, and feeding back to the prediction node, so that the prediction node obtains a prediction result of the prediction data corresponding to the first training node according to a preset number of second intermediate result secret shares.
10. A method for training a federal learning model as claimed in claim 8, wherein said obtaining a second parameter secret share of the target model parameters based on the first parameter secret share and a gradient secret share after each model training before the first training node is disconnected, which is cached on the target node, comprises:
updating the first parameter secret share successively according to the gradient secret shares from each model training before the disconnection, based on a gradient descent algorithm, to obtain the second parameter secret share.
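Because a gradient-descent step is linear in the gradient, each participant can apply its cached gradient secret shares directly to its parameter secret share, as claim 10 describes; summing the resulting shares across participants would then yield the recovered parameters. A sketch under an additive-sharing assumption:

```python
import numpy as np

def update_parameter_share(first_param_share, gradient_shares_by_round, learning_rate=0.01):
    """Apply the locally cached gradient secret shares, round by round, to the
    first parameter secret share; the result is the second parameter secret share.
    Linearity of w - lr * g means the shared update matches the plaintext update."""
    share = np.asarray(first_param_share, dtype=float)
    for round_idx in sorted(gradient_shares_by_round):
        share = share - learning_rate * np.asarray(gradient_shares_by_round[round_idx], dtype=float)
    return share
```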
11. A method for training a federal learning model as claimed in claim 1, further comprising:
while the first training node has not come back online, taking the first intermediate result secret share cached last before the disconnection as the target intermediate result secret share.
12. A method for training a federal learning model as defined in any of claims 1-3, further comprising:
acquiring the number of training nodes in an online state, and terminating the training of the federal learning model if the number of training nodes is less than or equal to a preset number.
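The termination rule of claim 12 is a simple count-and-compare; a sketch, where `is_online()` is an assumed node interface:

```python
def should_terminate(training_nodes, preset_number):
    """Stop the federated training once too few training nodes remain online."""
    online_count = sum(1 for node in training_nodes if node.is_online())
    return online_count <= preset_number
```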
13. A training apparatus for a federated learning model, wherein the training apparatus is arranged on a target node, the target node being any one of a training node and a coordination node that participate in the federated learning model training and are in an online state, and the federated learning model is trained based on a secret sharing algorithm;
wherein the training apparatus for the federated learning model comprises:
a first obtaining module, configured to obtain, during the training of the federated learning model, a first training node in a dropped state, and obtain, from a local cache, a substitute intermediate result secret share of the first training node, where the substitute intermediate result secret share is the first intermediate result secret share, cached in the local cache, that was generated in the model training completed before the first training node was disconnected;
a second obtaining module, configured to obtain a first loss function secret share of the first training node on the target node according to the substitute intermediate result secret share;
a third obtaining module, configured to receive a second intermediate result secret share sent by a remaining participant node in an online state, and obtain, according to the second intermediate result secret share, a second loss function secret share of the remaining participant node on the target node, where the remaining participant node includes a remaining training node and the coordinating node;
a fourth obtaining module, configured to obtain a loss function of the federated learning model according to the first loss function secret share, the second loss function secret share, and a third loss function secret share of the target node itself;
an authorization identification module, configured to identify whether the target node is an authorized node when the first training node is disconnected and has still not come back online after the federated learning model training is finished;
a sixth obtaining module, configured to obtain, if the target node is the authorized node, the initial model parameters of the first training node and the gradient secret shares, cached by each participating node, of the first training node after each model training before the disconnection;
a seventh obtaining module, configured to obtain the target model parameters of the first training node according to the initial model parameters and the gradient secret shares after each model training before the disconnection;
the authorized node is authorized by the first training node and used for recovering model parameters of a first learning model on the first training node, and the authorized node is the training node or the coordinating node.
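Under an additive secret-sharing assumption, the fourth obtaining module's job in claim 13 reduces to adding the loss function secret shares held for the dropped node, the remaining participants, and the target node itself. A minimal sketch (the claim does not prescribe this arithmetic beyond the secret-sharing setting):

```python
import numpy as np

def combine_loss_shares(first_loss_share, second_loss_shares, third_loss_share):
    """Recover the loss of the federated learning model from its secret shares:
    first share (dropped node), second shares (remaining participants),
    third share (target node itself), assuming additive sharing."""
    return float(first_loss_share + np.sum(second_loss_shares) + third_loss_share)
```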
14. The apparatus for training a federal learning model as claimed in claim 13, wherein said first acquisition module comprises:
a prompt information obtaining unit, configured to obtain first prompt information used for prompting that the training node is disconnected, where the first prompt information at least includes identification information of the first training node and disconnection time;
a cache obtaining unit, configured to obtain the substitute intermediate result secret share of the first training node from the local cache according to the offline time.
15. The apparatus for training the federal learning model as claimed in claim 14, wherein the cache obtaining unit comprises:
the determining subunit is configured to determine, according to the offline time, whether the current model training of the first training node is completed;
and a cache obtaining subunit, configured to use the first intermediate result secret share generated by the first training node in the previous model training as the substitute intermediate result secret share when the offline time indicates that the current model training is not completed, and to use the first intermediate result secret share generated by the first training node in the current model training as the substitute intermediate result secret share when the offline time indicates that the current model training is completed.
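The cache selection in claims 14-15 keys off the offline time: if the node dropped before the current model training finished, the subunit falls back to the share cached for the previous training; otherwise it uses the share from the training that just completed. A sketch, where the `{(node_id, round): share}` cache layout and the round end time are assumptions:

```python
def select_substitute_share(cache, first_node_id, offline_time, current_round, current_round_end_time):
    """Pick the substitute intermediate result secret share from the local cache
    according to the offline time of the first training node."""
    current_round_completed = offline_time >= current_round_end_time
    round_to_use = current_round if current_round_completed else current_round - 1
    return cache[(first_node_id, round_to_use)]
```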
16. A training apparatus for a federal learning model as claimed in any one of claims 13-15, wherein when the target node is any one of the training nodes, the fourth obtaining module is further configured to:
respectively obtain the loss function secret shares corresponding to the first gradient secret share, the second gradient secret share, and the third gradient secret share, and send the loss function secret shares to the coordination node, so that the coordination node obtains the loss function according to the loss function secret shares.
17. A training apparatus for a federal learning model as claimed in any one of claims 13 to 15, wherein when the target node is a coordinating node, the training apparatus for a federal learning model further comprises:
a fifth obtaining module, configured to obtain second prompt information indicating that the first training node is online again;
a sending module, configured to send an initial model parameter to the first training node;
and a coordination module, configured to coordinate each participating node to send, to the first training node, the cached gradient secret share of the first training node from each model training before the first training node was disconnected, so that the first training node obtains the target model parameters of the first learning model on the first training node according to the initial model parameters and the gradient secret shares from each model training before the disconnection.
18. The apparatus for training a federal learning model as in claim 13, further comprising:
the eighth obtaining module is used for obtaining the prediction data sent by the prediction node after obtaining the target model parameter of the first training node;
and the first prediction module is used for generating a second intermediate result corresponding to the first training node according to the target model parameters and the prediction data and feeding back the second intermediate result to the prediction node so that the prediction node can obtain the prediction result of the prediction data corresponding to the first training node according to the second intermediate result.
19. The apparatus for training a federal learning model as claimed in claim 13, wherein said seventh obtaining module comprises:
the gradient information acquisition unit is used for acquiring gradient information after each model training before the disconnection according to the gradient secret share of each participating node after each model training before the disconnection;
and the parameter updating unit is used for gradually updating the initial model parameters according to the gradient information before the disconnection based on a gradient descent algorithm so as to obtain the target model parameters.
20. A training apparatus for a federal learning model as claimed in claim 13, further comprising:
a ninth obtaining module, configured to obtain a first parameter secret share of an initial model parameter of the first training node if the target node is not the authorized node;
a tenth obtaining module, configured to obtain a second parameter secret share of the target model parameter according to the first parameter secret share and a gradient secret share after each model training before the first training node is disconnected, which is cached on the target node.
21. A training apparatus for a federal learning model as claimed in claim 20, further comprising:
an eleventh obtaining module, configured to obtain, after the second parameter secret share of the target model parameters is obtained, a data secret share of the prediction data for the first training node sent by the prediction node;
and the second prediction module is used for generating a second intermediate result secret share corresponding to the first training node according to the data secret share corresponding to the first training node and the second parameter secret share, and feeding back the second intermediate result secret share to the prediction node, so that the prediction node obtains a prediction result of the prediction data corresponding to the first training node according to a preset number of the second intermediate result secret shares.
22. The apparatus for training a federal learning model as claimed in claim 20, wherein the tenth obtaining module is further configured to:
update the first parameter secret share successively according to the gradient secret shares from each model training before the disconnection, based on a gradient descent algorithm, to obtain the second parameter secret share.
23. The apparatus for training a federal learning model as claimed in claim 13, wherein said second obtaining module is further configured to:
use, while the first training node has not come back online, the first intermediate result secret share cached last before the disconnection as the target intermediate result secret share.
24. A training apparatus for a federal learning model as claimed in any of claims 13-15, further comprising:
and the training termination module is used for acquiring the number of the training nodes in an online state, and terminating the training of the federal learning model if the number of the training nodes is less than or equal to a preset number.
25. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for training a federal learning model as claimed in any one of claims 1-12.
26. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of training a federal learning model as claimed in any one of claims 1 to 12.
CN202010651671.XA 2020-07-08 2020-07-08 Method and device for training federal learning model Active CN111950740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010651671.XA CN111950740B (en) 2020-07-08 2020-07-08 Method and device for training federal learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010651671.XA CN111950740B (en) 2020-07-08 2020-07-08 Method and device for training federal learning model

Publications (2)

Publication Number Publication Date
CN111950740A CN111950740A (en) 2020-11-17
CN111950740B true CN111950740B (en) 2022-05-24

Family

ID=73340315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010651671.XA Active CN111950740B (en) 2020-07-08 2020-07-08 Method and device for training federal learning model

Country Status (1)

Country Link
CN (1) CN111950740B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113133768A (en) * 2021-04-21 2021-07-20 东南大学 Cardiovascular disease auxiliary diagnosis model and training method based on federal learning
TW202319935A (en) * 2021-11-05 2023-05-16 財團法人資訊工業策進會 Federated learning method and federated learning system based on mediation process
CN114006769B (en) * 2021-11-25 2024-02-06 中国银行股份有限公司 Model training method and device based on transverse federal learning
CN114648130B (en) * 2022-02-07 2024-04-16 北京航空航天大学 Longitudinal federal learning method, device, electronic equipment and storage medium
TWI800303B * 2022-03-16 2023-04-21 英業達股份有限公司 Federated learning method using synonym
CN114742233A (en) * 2022-04-02 2022-07-12 支付宝(杭州)信息技术有限公司 Method and device for joint training of logistic regression model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263921A (en) * 2019-06-28 2019-09-20 深圳前海微众银行股份有限公司 A kind of training method and device of federation's learning model
CN110288094A (en) * 2019-06-10 2019-09-27 深圳前海微众银行股份有限公司 Model parameter training method and device based on federation's study
CN110490330A (en) * 2019-08-16 2019-11-22 安徽航天信息有限公司 A kind of distributed machines learning system based on block chain
CN111241567A (en) * 2020-01-16 2020-06-05 深圳前海微众银行股份有限公司 Longitudinal federal learning method, system and storage medium based on secret sharing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970402B2 (en) * 2018-10-19 2021-04-06 International Business Machines Corporation Distributed learning preserving model security

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288094A (en) * 2019-06-10 2019-09-27 深圳前海微众银行股份有限公司 Model parameter training method and device based on federation's study
CN110263921A (en) * 2019-06-28 2019-09-20 深圳前海微众银行股份有限公司 A kind of training method and device of federation's learning model
CN110490330A (en) * 2019-08-16 2019-11-22 安徽航天信息有限公司 A kind of distributed machines learning system based on block chain
CN111241567A (en) * 2020-01-16 2020-06-05 深圳前海微众银行股份有限公司 Longitudinal federal learning method, system and storage medium based on secret sharing

Also Published As

Publication number Publication date
CN111950740A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN111950740B (en) Method and device for training federal learning model
CN111860829A (en) Method and device for training federal learning model
US11947680B2 (en) Model parameter training method, terminal, and system based on federation learning, and medium
CN110288094B (en) Model parameter training method and device based on federal learning
CN110110229B (en) Information recommendation method and device
CN113159327B (en) Model training method and device based on federal learning system and electronic equipment
CN109213900B (en) Data modification method, device, equipment and medium for block chain
CN104335214B (en) Electronic equipment and system and server for certification electronic equipment
CN112162959B (en) Medical data sharing method and device
CN112288094B (en) Federal network representation learning method and system
CN111858955B (en) Knowledge graph representation learning enhancement method and device based on encryption federal learning
CN112394974B (en) Annotation generation method and device for code change, electronic equipment and storage medium
CN112199709A (en) Multi-party based privacy data joint training model method and device
CN113992360A (en) Block chain cross-chain-based federated learning method and equipment
CN112633146B (en) Multi-pose face gender detection training optimization method, device and related equipment
CN104158655A (en) POS master key generation and distribution management system and control method
CN111046857A (en) Face recognition method, device, equipment, medium and system based on knowledge federation
CN110599384B (en) Organization relation transferring method, device, equipment and storage medium
CN115130121A (en) Method for training longitudinal logistic regression model under privacy calculation of third party
CN116502732B (en) Federal learning method and system based on trusted execution environment
CN112948883A (en) Multi-party combined modeling method, device and system for protecting private data
CN113240461A (en) Method, system and medium for identifying potential customers based on longitudinal federal learning
CN108399544A (en) The method and apparatus that auxiliary based on Internet of Things signs block chain contract
CN111092935B (en) Data sharing method and virtual training device for machine learning
CN112100145A (en) Digital model sharing learning system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant