CN116029392A - Joint training method and system based on federated learning - Google Patents

Joint training method and system based on federated learning

Info

Publication number
CN116029392A
CN116029392A (application CN202310065357.7A)
Authority
CN
China
Prior art keywords
party
active
passive
ciphertext
derivative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310065357.7A
Other languages
Chinese (zh)
Inventor
彭胜波
彭宇
周宏�
王克敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Corp Guizhou Provincial Co
Original Assignee
China Tobacco Corp Guizhou Provincial Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tobacco Corp Guizhou Provincial Co filed Critical China Tobacco Corp Guizhou Provincial Co
Priority to CN202310065357.7A
Publication of CN116029392A
Legal status: Pending


Abstract

The invention discloses a joint training method and system based on federated learning. The method comprises the following steps: each active party calculates the first and second derivatives of each sample, homomorphically encrypts them, and sends the ciphertexts to the parameter server of the passive party; based on a secure multi-party computation protocol, the parameter server of the passive party sums, over the sample dimension, the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] from each active party, obtaining the aggregated first-derivative ciphertext [g_i] and second-derivative ciphertext [h_i] of each sample; based on a precision-lossless privacy-preserving tree boosting algorithm, a coordinator coordinates the active and passive parties to train a boosted tree model from [g_i] and [h_i], so as to support federated learning scenarios in which horizontal and vertical partitioning coexist.

Description

Joint training method and system based on federated learning
Technical Field
The invention relates to the field of federated learning, and in particular to a joint training method and system based on federated learning.
Background
Federated learning (Federated Learning) is an emerging basic artificial-intelligence technology, first proposed by Google in 2016 and originally used to update local models for Android mobile phone users. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while guaranteeing information security during big-data exchange, protecting terminal-data and personal-data privacy, and ensuring legal compliance.
Horizontal federated learning and vertical federated learning are two categories of federated learning. In horizontal federated learning, multiple participants holding rows of samples with the same feature space jointly perform federated learning. In vertical federated learning, multiple participants holding common samples but different feature sets jointly perform federated learning. Building on vertical federated learning, Cheng et al. proposed a precision-lossless privacy-preserving tree boosting algorithm in the paper "SecureBoost: A Lossless Federated Learning Framework" (arXiv, 2019) to train a high-quality boosted tree model in a vertical federated environment. The method requires the data of the different parties to have the following properties: the sample data is vertically partitioned, with one data provider supplying the label data and the other data providers supplying the feature data.
However, in a vertical federated learning scenario, label data may be distributed among different participants. Existing vertical federated learning algorithms cannot handle scenarios in which horizontal and vertical partitioning coexist. For example, party A is a bank holding user label data such as "has a lending crisis" or "has no lending crisis", and may or may not hold user features; however, the labels of party A are distributed among different banks, e.g., the Agricultural Bank holds part of the labels and the Construction Bank holds part of the labels. Party B is an insurance company holding the users' feature data (e.g., age, income). The banks and the insurance company are in a vertical relationship, while the labels held by the Agricultural Bank and those held by the Construction Bank are in a horizontal relationship.
Disclosure of Invention
The invention provides a joint training method and system based on federated learning, to support federated learning scenarios in which horizontal and vertical partitioning coexist.
In a first aspect, an embodiment of the present invention provides a joint training method based on federated learning, applied to a joint training system based on federated learning, where the system includes at least two active parties and at least one passive party, the passive party owns feature data, each active party owns part of the label data, any one of the at least two active parties is selected as the coordinator, and all active and passive parties have completed sample alignment. The method comprises:
each active party calculates the first and second derivatives of each sample, homomorphically encrypts them, and sends the ciphertexts to the parameter server of the passive party;
based on a secure multi-party computation protocol, the parameter server of the passive party sums, over the sample dimension, the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] from each active party, obtaining the aggregated first-derivative ciphertext [g_i] and second-derivative ciphertext [h_i] of each sample;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train a boosted tree model from [g_i] and [h_i].
Further, training the boosted tree model from [g_i] and [h_i] under the coordination of the coordinator, based on the precision-lossless privacy-preserving tree boosting algorithm, comprises:
the parameter server of the passive party computes, from [g_i] and [h_i], the ciphertexts [g_l] and [h_l] of the left-subtree gradient sums of the current node;
the active party decrypts [g_l] and [h_l], calculates the split information of the current node from g_l and h_l (see the gain-computation sketch after this list), and sends the split information to the coordinator;
the coordinator calculates the globally optimal split information from the received split information and sends it to the corresponding passive party;
the passive party partitions the sample space according to the split information, adds a record of the node split information to a lookup table, and then broadcasts the record id and the sample space of the record to the active parties;
the active party splits the current node according to the received sample space and associates the current node with the passive party and the record id;
the child nodes produced by splitting the current node are then taken as parent nodes, and the above steps are repeated until a preset termination condition is reached.
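To make the split-information computation concrete, the following is a minimal sketch of the standard XGBoost split-gain formula evaluated by an active party after decrypting the left-subtree sums g_l and h_l; the regularization parameters lam and gamma, their default values, and all function names are illustrative assumptions rather than part of the claimed method.

    def split_gain(g_l, h_l, g, h, lam=1.0, gamma=0.0):
        """XGBoost gain of one candidate split.

        g_l, h_l: decrypted sums of first/second derivatives in the left child
        g, h:     sums over all samples at the current node
        lam:      L2 regularization on leaf weights (assumed)
        gamma:    complexity penalty per added leaf (assumed)
        """
        g_r, h_r = g - g_l, h - h_l                  # right child by complement
        score = lambda gs, hs: gs * gs / (hs + lam)  # structure score of a leaf
        return 0.5 * (score(g_l, h_l) + score(g_r, h_r) - score(g, h)) - gamma

    def best_local_split(candidates, g, h):
        """Return the locally optimal (feature, threshold, gain) triple.

        candidates: iterable of (feature, threshold, g_l, h_l), one entry per
                    candidate split position evaluated at this active party.
        """
        return max(((f, t, split_gain(gl, hl, g, h)) for f, t, gl, hl in candidates),
                   key=lambda item: item[2])

Under this sketch, the coordinator obtains the globally optimal split simply by taking the maximum-gain triple among those reported by all parties.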
Further, the preset termination condition comprises:
the maximum split gain of the node is smaller than or equal to a set gain threshold;
or,
the number of samples contained in a leaf node is smaller than a set number threshold;
or,
the tree depth of the boosted tree equals the set depth threshold.
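As a sketch only, the three stopping rules above can be checked with a single predicate during tree construction; the threshold names and default values are illustrative assumptions.

    def should_stop(max_gain, n_leaf_samples, depth,
                    gain_threshold=1e-6, min_samples=20, max_depth=6):
        """True if any preset termination condition is met (assumed defaults)."""
        return (max_gain <= gain_threshold       # split gain too small
                or n_leaf_samples < min_samples  # leaf node too small
                or depth >= max_depth)           # boosted tree at maximum depth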
Further, training the boosted tree model from [g_i] and [h_i] under the coordination of the coordinator, based on the precision-lossless privacy-preserving tree boosting algorithm, comprises:
the passive party calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node, and broadcasts [g_l] and [h_l] to different active parties according to the computing-power resources of each active party;
each active party calculates split information of the current node from g_l and h_l and sends it to the coordinator, and the coordinator calculates the globally optimal split information;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model according to the globally optimal split information.
Further, after the coordinator coordinates the active and passive parties to train the boosted tree model from [g_i] and [h_i] based on the precision-lossless privacy-preserving tree boosting algorithm, the method further comprises:
when calculating an evaluation index on the validation set, calculating a local evaluation index value based on the labels owned by each active party;
and performing a statistical operation on the corresponding index values based on the secure multi-party computation protocol, thereby obtaining evaluation index information based on all labels.
Further, training the boosted tree model from [g_i] and [h_i] under the coordination of the coordinator, based on the precision-lossless privacy-preserving tree boosting algorithm, comprises:
the parameter server of the passive party partitions [g_i] and [h_i] across at least two worker servers;
each worker server calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the parameter server of the passive party aggregates the [g_l] and [h_l] of each worker server;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model from [g_l] and [h_l].
Further, the secure multi-party computation protocol comprises: the SPDZ protocol supporting two-party secure computation, or the NPDZ protocol supporting multi-party secure computation.
Further, the parameter server of the passive party summing, over the sample dimension, the first and second derivatives from the different active parties based on the secure multi-party computation protocol comprises:
acquiring the number of participants, and selecting a target protocol from the SPDZ protocol and the NPDZ protocol according to the number of participants;
based on the target protocol, summing, over the sample dimension, the first and second derivatives from the different active parties, respectively.
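A minimal sketch of this selection rule, under the assumption implied by the preceding paragraph that SPDZ serves the two-participant case and NPDZ the multi-participant case; the function name and string labels are placeholders for whatever MPC backend implements these protocols.

    def select_protocol(n_participants):
        """Choose the aggregation protocol from the participant count."""
        if n_participants <= 2:
            return "SPDZ"  # two-party secure computation
        return "NPDZ"      # multi-party secure computation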
In a second aspect, an embodiment of the present invention further provides a joint training system based on federated learning. The system includes at least two active parties and at least one passive party; the passive party owns feature data, each active party owns part of the label data, any one of the at least two active parties is selected as the coordinator, and all active and passive parties have completed sample alignment. The system applies the joint training method based on federated learning according to any one of the embodiments of the present invention.
According to the invention, the ciphertext gradients of all samples are summed on the passive party based on the SPDZ or NPDZ protocol, no third-party node is needed, the security of the participants' data is guaranteed to the greatest extent, and federated learning scenarios in which horizontal and vertical partitioning coexist are supported.
When the sample size of the participants' federated training is large, the computational overhead of comparing the split gains of different features is considerable. When selecting the global split point, in an embodiment of the invention the passive party calculates the left-subtree gradient ciphertexts [g_l] and [h_l] of the current node and broadcasts [g_l] and [h_l] to different active parties according to their computing-power resources, so that all parties jointly compute the globally optimal split point, which improves the efficiency of the algorithm.
When evaluation indices such as the confusion matrix are calculated on the validation set, the label information of all active parties is needed, but each active party holds only part of the labels, and these cannot be combined in the clear. In an embodiment of the invention, when an evaluation index is calculated, each active party first calculates a local evaluation index value based on the labels it owns, and a statistical operation on the corresponding index values is then performed based on the secure multi-party computation protocol, thereby obtaining evaluation index information based on all labels.
In joint modeling scenarios such as advertisement delivery, financial credit, and knowledge graphs, the data volume of the participants is generally large, and a federated learning algorithm cannot read all the data into memory for computation, which raises the question of how to use such massive data for joint modeling. In an embodiment of the invention, the parameter server of the passive party partitions [g_i] and [h_i] across at least two worker servers; each worker server calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node; and the parameter server of the passive party aggregates the [g_l] and [h_l] of each worker server, thereby solving the massive-data problem.
Drawings
FIG. 1 is a flowchart of a joint training method based on federated learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model evaluation method according to an embodiment of the present invention;
FIG. 3 is an interaction diagram of a joint training method based on federated learning according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, only the parts related to the present invention, rather than the entire structure, are shown in the drawings.
FIG. 1 is a flowchart of a joint training method based on federated learning according to an embodiment of the present invention. The embodiment is applicable to the situation in which, in a vertical federated learning scenario, label data may be distributed among different participants. The method may be performed by a joint training system based on federated learning that comprises at least two active parties and at least one passive party; the passive party owns feature data, each active party owns part of the label data, and any one of the at least two active parties is selected as the coordinator, the coordinator also being an active party. All active and passive parties have completed sample alignment.
Referring to FIG. 1, the joint training method based on federated learning specifically includes the following steps:
s110, each active party calculates the first derivative and the second derivative of each sample, and sends the ciphertext to the parameter server of the passive party after homomorphic encryption.
Each sample includes the feature data and/or the label data of that sample.
The first and second derivatives are the terms obtained from the second-order Taylor expansion of the loss function around the current prediction.
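As an illustrative sketch of S110: for a logistic loss, the per-sample derivatives are g_i = p_i − y_i and h_i = p_i(1 − p_i), and an additively homomorphic scheme can encrypt them element-wise. The sketch below uses the open-source python-paillier package (phe); the choice of loss, library, key length, and variable names are assumptions, not prescribed by the patent.

    import numpy as np
    from phe import paillier  # python-paillier: additively homomorphic encryption

    def local_derivatives(y_true, y_margin):
        """First/second derivatives of the logistic loss for locally held labels."""
        p = 1.0 / (1.0 + np.exp(-y_margin))  # predicted probability
        return p - y_true, p * (1.0 - p)     # g, h

    # Key pair; the public key s is shared with the parties that only encrypt.
    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

    y = np.array([1.0, 0.0, 1.0])            # toy labels held by one active party
    g, h = local_derivatives(y, np.zeros_like(y))

    # Element-wise ciphertexts [g_ji], [h_ji] sent to the passive party's server.
    enc_g = [public_key.encrypt(float(v)) for v in g]
    enc_h = [public_key.encrypt(float(v)) for v in h]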
For example, party A is a bank holding user label data such as "has a lending crisis" or "has no lending crisis", and may or may not hold user features; however, the labels of party A are distributed among different banks, e.g., the Agricultural Bank holds part of the labels and the Construction Bank holds part of the labels. Party B is an insurance company holding the users' feature data (e.g., age, income). The Construction Bank can serve as both the coordinator and an active party, the Agricultural Bank is an active party, and the insurance company is the passive party.
S120, based on the secure multi-party computation protocol, the parameter server of the passive party sums, over the sample dimension, the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] from each active party, obtaining the aggregated first-derivative ciphertext [g_i] and second-derivative ciphertext [h_i] of each sample.
Here j indexes the different active parties and i indexes the different samples.
The secure multi-party computation protocol includes: the SPDZ protocol supporting two-party secure computation, or the NPDZ protocol supporting multi-party secure computation.
Optionally, the secure multi-party computation protocol may further include ABY (Arithmetic, Boolean, Yao) and ABY3.
Specifically, the choice between the SPDZ protocol and the NPDZ protocol is determined by the number of participants.
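A minimal sketch of the per-sample aggregation in S120: with an additively homomorphic scheme such as Paillier, the ciphertexts from all active parties can be added without decryption, standing in here for the SPDZ/NPDZ-based summation. It reuses the enc_g lists from the sketch above; all names are illustrative.

    def aggregate_ciphertexts(per_party_enc):
        """Sum ciphertexts over the active-party index j, per sample i.

        per_party_enc: list over active parties, each a list over samples of
                       Paillier ciphertexts; returns the per-sample sums [g_i].
        """
        n_samples = len(per_party_enc[0])
        return [sum((party[i] for party in per_party_enc[1:]), per_party_enc[0][i])
                for i in range(n_samples)]

    # e.g., enc_g_total = aggregate_ciphertexts([enc_g_of_A1, enc_g_of_A2])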
S130, based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train a boosted tree model from [g_i] and [h_i].
The precision-lossless privacy-preserving tree boosting algorithm was proposed by Cheng et al. in the paper "SecureBoost: A Lossless Federated Learning Framework" (arXiv, 2019).
According to the invention, the ciphertext gradients of all samples are summed on the passive party based on the SPDZ or NPDZ protocol, no third-party node is needed, the security of the participants' data is guaranteed to the greatest extent, and federated learning scenarios in which horizontal and vertical partitioning coexist are supported.
Further, training the boosted tree model from [g_i] and [h_i] under the coordination of the coordinator, based on the precision-lossless privacy-preserving tree boosting algorithm, comprises:
the passive party calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node, and broadcasts [g_l] and [h_l] to different active parties according to the computing-power resources of each active party;
each active party calculates split information of the current node from g_l and h_l and sends it to the coordinator, and the coordinator calculates the globally optimal split information;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model according to the globally optimal split information.
In this embodiment of the invention, the computation is distributed according to the computing-power resources of the parties, which improves the training efficiency of the model.
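A sketch of one way the candidate-split workload could be sharded across active parties in proportion to their computing-power resources; the proportional-floor scheme and all names are assumptions made for illustration, since the patent does not fix a concrete distribution policy.

    def assign_candidates(candidate_ids, party_power):
        """Assign candidate-split ids to parties proportionally to power weights.

        candidate_ids: list of candidate split identifiers to evaluate
        party_power:   dict mapping party name -> relative computing power
        """
        total = sum(party_power.values())
        parties = list(party_power.items())
        assignment, start = {}, 0
        for idx, (party, power) in enumerate(parties):
            if idx == len(parties) - 1:
                count = len(candidate_ids) - start  # last party takes the rest
            else:
                count = int(len(candidate_ids) * power / total)
            assignment[party] = candidate_ids[start:start + count]
            start += count
        return assignment

    # e.g., assign_candidates(list(range(100)), {"A1": 2.0, "A2": 1.0})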
FIG. 2 is a schematic diagram of a model evaluation method according to an embodiment of the present invention. Referring to FIG. 2, after the coordinator coordinates the active and passive parties to train the boosted tree model from [g_i] and [h_i] based on the precision-lossless privacy-preserving tree boosting algorithm, the method further comprises:
when calculating an evaluation index on the validation set, calculating a local evaluation index value based on the labels owned by each active party;
and performing a statistical operation on the corresponding index values based on the secure multi-party computation protocol, thereby obtaining evaluation index information based on all labels.
In this embodiment of the invention, when an evaluation index is calculated, each active party first calculates a local evaluation index value based on the labels it owns, and a statistical operation on the corresponding index values is then performed based on the secure multi-party computation protocol, thereby obtaining evaluation index information based on all labels.
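To make the evaluation step concrete: each active party can compute a confusion matrix over the validation samples whose labels it owns, and only the counts are then combined. The plain secure_sum helper below stands in for the SPDZ/NPDZ secure aggregation; all names and the toy data are illustrative assumptions.

    import numpy as np

    def local_confusion_matrix(y_true, y_pred):
        """2x2 matrix [[TN, FP], [FN, TP]] over locally labeled samples."""
        cm = np.zeros((2, 2), dtype=int)
        for t, p in zip(y_true, y_pred):
            cm[int(t), int(p)] += 1
        return cm

    def secure_sum(matrices):
        """Placeholder for the secure multi-party summation of local counts."""
        return sum(matrices)

    cm_total = secure_sum([
        local_confusion_matrix([1, 0, 1], [1, 0, 0]),  # active party A1 (toy)
        local_confusion_matrix([0, 0, 1], [0, 1, 1]),  # active party A2 (toy)
    ])
    accuracy = np.trace(cm_total) / cm_total.sum()     # metric over all labels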
Further, training the boosted tree model from [g_i] and [h_i] under the coordination of the coordinator, based on the precision-lossless privacy-preserving tree boosting algorithm, comprises:
the parameter server of the passive party partitions [g_i] and [h_i] across at least two worker servers;
each worker server calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the parameter server of the passive party aggregates the [g_l] and [h_l] of each worker server;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model from [g_l] and [h_l].
In this embodiment of the invention, the parameter server of the passive party partitions [g_i] and [h_i] across at least two worker servers; each worker server calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node; and the parameter server of the passive party aggregates the [g_l] and [h_l] of each worker server, thereby solving the massive-data problem.
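A minimal single-process sketch of the parameter-server pattern just described: the per-sample ciphertext vector is sharded across workers, each worker homomorphically sums the ciphertexts of its shard that fall into the left subtree, and the server adds the partial sums into [g_l]. The shard layout and names are illustrative assumptions.

    def shard(values, n_workers):
        """Partition a list of ciphertexts into n_workers contiguous shards."""
        k, r = divmod(len(values), n_workers)
        shards, start = [], 0
        for w in range(n_workers):
            end = start + k + (1 if w < r else 0)
            shards.append(values[start:end])
            start = end
        return shards

    def worker_left_sum(shard_enc, shard_left_mask):
        """One worker's partial sum over its shard's left-subtree ciphertexts."""
        picked = [c for c, is_left in zip(shard_enc, shard_left_mask) if is_left]
        return sum(picked[1:], picked[0]) if picked else None

    def server_aggregate(partials):
        """Parameter server adds the non-empty worker partials into [g_l]."""
        partials = [p for p in partials if p is not None]
        return sum(partials[1:], partials[0])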
FIG. 3 is an interaction diagram of a joint training method based on federated learning according to an embodiment of the present invention. The training method is illustrated with vertical federated learning in (X, X, Y) mode, where X denotes feature data and Y denotes label data. Assume that all participants have completed the sample alignment process before federated learning. Participants A_j (j = 1, 2, ..., |A|), the active parties, each own part of the label data Y; participants B_m (m = 1, 2, ..., |B|), the passive parties, own the feature data X_i, where i identifies different samples, j identifies different active parties, |A| denotes the number of active parties, m identifies different passive parties, and |B| denotes the number of passive parties. In this embodiment A_1 serves as the coordinator and A_j (j ≠ 1) are the other active parties. Taking XGBoost as the model, the joint modeling steps are as follows:
(1) A_1 generates a homomorphic-encryption public/private key pair <s, d>, broadcasts the private key d to A_j (j ≠ 1), and broadcasts the public key s to B_m.
(2) A_j calculates the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] of the samples it owns, and sends [g_ji] and [h_ji] to the parameter server (PS, ParamServer) of B_m.
(3) The PS of B_m sums [g_ji] and [h_ji] based on the SPDZ protocol to obtain [g_i] and [h_i]; the PS of B_m partitions [g_i] and [h_i] across 3 workers, denoted w1, w2 and w3;
each worker calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the PS of B_m aggregates the [g_l] and [h_l] of each worker to obtain the ciphertexts [g_l] and [h_l] of the left-subtree gradient of the current node, and broadcasts the ciphertexts to A_j.
(4) A_j decrypts [g_l] and [h_l], calculates the split gain, split feature, and split threshold from g_l and h_l, and sends the split information to A_1.
(5) A_1 calculates the globally optimal split feature and split threshold from the split information and sends the split information to the corresponding B_m.
(6) B_m partitions the sample space according to the split information and adds a record to the lookup table (creating the lookup table first if it does not yet exist); B_m then broadcasts R_n and the sample space of the left subtree to A_j, where R_n is the record id in the lookup table (the data-structure sketch after this example illustrates the lookup table and the node association).
(7) A_j splits the current node according to the received sample space and associates the current node with (B_m, R_n).
(8) Repeat (1)–(7) until the tree-building termination condition is reached.
The preset termination condition comprises:
the maximum split gain of the node is smaller than or equal to a set gain threshold; or,
the number of samples contained in a leaf node is smaller than a set number threshold; or,
the tree depth of the boosted tree equals the set depth threshold.
The present embodiment does not limit these thresholds in any way; they may be set according to actual needs.
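As referenced in step (6) above, this sketch shows one plausible shape for the data structures in steps (6)–(7): the passive party's lookup table keeps the sensitive (feature, threshold) pair under a record id R_n, while the active party's tree node stores only the opaque pair (B_m, R_n). All class and field names, and the toy values, are assumptions for illustration.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class PassiveLookupTable:
        """Held by the passive party; split details never leave it."""
        records: dict = field(default_factory=dict)
        next_id: int = 0

        def add(self, feature, threshold):
            rid = self.next_id               # R_n, broadcast to the active parties
            self.records[rid] = (feature, threshold)
            self.next_id += 1
            return rid

    @dataclass
    class ActiveTreeNode:
        """Held by an active party; stores only which party/record to query."""
        owner_party: str                     # B_m
        record_id: int                       # R_n
        left: Optional["ActiveTreeNode"] = None
        right: Optional["ActiveTreeNode"] = None

    # Step (6): the passive party records the split and obtains R_n.
    table = PassiveLookupTable()
    rid = table.add(feature="income", threshold=35000.0)
    # Step (7): the active party associates the current node with (B_m, R_n).
    node = ActiveTreeNode(owner_party="B_1", record_id=rid)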
Take the Construction Bank and the Agricultural Bank as the active parties and an insurance company as the passive party. The Construction Bank owns part of the label data and the Agricultural Bank owns part of the label data, the labels being: the user has a lending crisis, or the user has no lending crisis. The insurance company owns the feature data: the user's age and the user's income. Assume the Construction Bank is both an active party and the coordinator, the Agricultural Bank is an active party, and the insurance company is the passive party.
Taking XGBoost as the model, the joint modeling steps are as follows:
(1) The Construction Bank server generates a homomorphic-encryption public/private key pair <s, d>, broadcasts the private key d to the Agricultural Bank, and broadcasts the public key s to the insurance company.
(2) The Construction Bank server and the Agricultural Bank server each calculate the first and second derivatives of the sample labels they own and homomorphically encrypt them to obtain [g_1i], [h_1i] and [g_2i], [h_2i], which they send to the insurance company server.
(3) The insurance company server sums [g_1i] and [g_2i] based on the SPDZ protocol to obtain [g_i], and sums [h_1i] and [h_2i] to obtain [h_i]; the insurance company server partitions [g_i] and [h_i] across 3 worker servers, denoted w1, w2 and w3;
each worker calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the insurance company server aggregates the [g_l] and [h_l] of each worker to obtain the left-subtree gradient ciphertexts [g_l] and [h_l] of the current node, and broadcasts [g_l] and [h_l] to the Construction Bank server and the Agricultural Bank server.
(4) The Construction Bank server and the Agricultural Bank server decrypt [g_l] and [h_l], calculate the split gain, split feature, and split threshold from g_l and h_l, and send the split information to the Construction Bank server.
(5) The Construction Bank server calculates the globally optimal split feature and split threshold from the split information and sends the split information to the insurance company server.
(6) The insurance company server partitions the sample space according to the split information and adds a record to the lookup table (creating the lookup table first if it does not yet exist); it then broadcasts R_n and the sample space of the left subtree to the Construction Bank server and the Agricultural Bank server, where R_n is the record id in the lookup table.
(7) The Construction Bank server and the Agricultural Bank server split the current node according to the received sample space and associate the current node with (insurance company, R_n).
(8) Repeat (1)–(7) until the tree-building termination condition is reached.
An embodiment of the invention further provides a joint training system based on federated learning. The system includes at least two active parties and at least one passive party; the passive party owns feature data, each active party owns part of the label data, any one of the at least two active parties is selected as the coordinator, and all active and passive parties have completed sample alignment.
Note that the above are only preferred embodiments of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, the invention is not limited to them and may include other equivalent embodiments without departing from the concept of the invention, the scope of which is determined by the appended claims.

Claims (9)

1. A joint training method based on federated learning, applied to a joint training system based on federated learning, wherein the system includes at least two active parties and at least one passive party, the passive party owns feature data, each active party owns part of the label data, any one of the at least two active parties is selected as the coordinator, and all active and passive parties have completed sample alignment, the method comprising:
each active party calculates the first and second derivatives of each sample, homomorphically encrypts them, and sends the ciphertexts to the parameter server of the passive party;
based on a secure multi-party computation protocol, the parameter server of the passive party sums, over the sample dimension, the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] from each active party, obtaining the aggregated first-derivative ciphertext [g_i] and second-derivative ciphertext [h_i] of each sample;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train a boosted tree model from [g_i] and [h_i].
2. The method according to claim 1, wherein, based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinating the active and passive parties to train the boosted tree model from [g_i] and [h_i] comprises:
the parameter server of the passive party computes, from [g_i] and [h_i], the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the active party decrypts [g_l] and [h_l], calculates the split information of the current node from g_l and h_l, and sends the split information to the coordinator;
the coordinator calculates the globally optimal split information from the received split information and sends it to the corresponding passive party;
the passive party partitions the sample space according to the split information, adds a record of the node split information to a lookup table, and then broadcasts the record id and the sample space of the record to the active parties;
the active party splits the current node according to the received sample space and associates the current node with the passive party and the record id;
the child nodes produced by splitting the current node are then taken as parent nodes, and the above steps are repeated until a preset termination condition is reached.
3. The method of claim 2, wherein the preset termination condition comprises:
the maximum split gain of the node is smaller than or equal to a set gain threshold;
or,
the number of samples contained in a leaf node is smaller than a set number threshold;
or,
the tree depth of the boosted tree equals the set depth threshold.
4. The method according to claim 1, wherein, based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinating the active and passive parties to train the boosted tree model from [g_i] and [h_i] comprises:
the passive party calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node, and broadcasts [g_l] and [h_l] to different active parties according to the computing-power resources of each active party;
each active party calculates split information of the current node from g_l and h_l and sends it to the coordinator, and the coordinator calculates the globally optimal split information;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model according to the globally optimal split information.
5. The method according to claim 1, wherein, after the coordinator coordinates the active and passive parties to train the boosted tree model from [g_i] and [h_i] based on the precision-lossless privacy-preserving tree boosting algorithm, the method further comprises:
when calculating an evaluation index on the validation set, calculating a local evaluation index value based on the labels owned by each active party;
and performing a statistical operation on the corresponding index values based on the secure multi-party computation protocol, thereby obtaining evaluation index information based on all labels.
6. The method according to claim 1, wherein, based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinating the active and passive parties to train the boosted tree model from [g_i] and [h_i] comprises:
the parameter server of the passive party partitions [g_i] and [h_i] across at least two worker servers;
each worker server calculates the ciphertexts [g_l] and [h_l] of the left-subtree gradients of the current node;
the parameter server of the passive party aggregates the [g_l] and [h_l] of each worker server;
based on the precision-lossless privacy-preserving tree boosting algorithm, the coordinator coordinates the active and passive parties to train the boosted tree model from [g_l] and [h_l].
7. The method of claim 1, wherein the secure multi-party computation protocol comprises: the SPDZ protocol supporting two-party secure computation, or the NPDZ protocol supporting multi-party secure computation.
8. The method of claim 7, wherein, based on the secure multi-party computation protocol, the parameter server of the passive party summing, over the sample dimension, the first-derivative ciphertexts [g_ji] and second-derivative ciphertexts [h_ji] from each active party to obtain the aggregated first-derivative ciphertext [g_i] and second-derivative ciphertext [h_i] comprises:
acquiring the number of participants, and selecting a target protocol from the SPDZ protocol and the NPDZ protocol according to the number of participants;
based on the target protocol, summing, over the sample dimension, the first and second derivatives from the coordinator and the different active parties, respectively.
9. A joint training system based on federated learning, wherein the system includes at least two active parties and at least one passive party, the passive party owns feature data, each active party owns part of the label data, any one of the at least two active parties is selected as the coordinator, and all active and passive parties have completed sample alignment, and wherein the system applies the joint training method based on federated learning according to any one of claims 1 to 8.
CN202310065357.7A 2023-02-06 2023-02-06 Joint training method and system based on federated learning Pending CN116029392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310065357.7A CN116029392A (en) 2023-02-06 2023-02-06 Joint training method and system based on federated learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310065357.7A CN116029392A (en) 2023-02-06 2023-02-06 Joint training method and system based on federated learning

Publications (1)

Publication Number Publication Date
CN116029392A true CN116029392A (en) 2023-04-28

Family

ID=86079333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310065357.7A Pending CN116029392A (en) Joint training method and system based on federated learning

Country Status (1)

Country Link
CN (1) CN116029392A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117675411A (en) * 2024-01-31 2024-03-08 智慧眼科技股份有限公司 Global model acquisition method and system based on longitudinal XGBoost algorithm
CN117675411B (en) * 2024-01-31 2024-04-26 智慧眼科技股份有限公司 Global model acquisition method and system based on longitudinal XGBoost algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination