CN116886271B - Gradient aggregation method for longitudinal federal XGboost model training - Google Patents

Gradient aggregation method for longitudinal federal XGboost model training

Info

Publication number
CN116886271B
CN116886271B (Application No. CN202311150627.0A)
Authority
CN
China
Prior art keywords
enc
vector
ciphertext
gradient
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311150627.0A
Other languages
Chinese (zh)
Other versions
CN116886271A (en)
Inventor
马煜翔
冯黎明
刘文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanxiang Zhilian Hangzhou Technology Co ltd
Original Assignee
Lanxiang Zhilian Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanxiang Zhilian Hangzhou Technology Co ltd filed Critical Lanxiang Zhilian Hangzhou Technology Co ltd
Priority to CN202311150627.0A priority Critical patent/CN116886271B/en
Publication of CN116886271A publication Critical patent/CN116886271A/en
Application granted granted Critical
Publication of CN116886271B publication Critical patent/CN116886271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/085Secret sharing or secret splitting, e.g. threshold schemes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a gradient aggregation method for longitudinal federal XGboost model training. It comprises the following steps: the initiator calculates a first-order gradient vector g and a second-order gradient vector h and secret-shares them with the participant and the computing party; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant; the participant calculates the first-order gradient aggregate value vector ciphertext and the second-order gradient aggregate value vector ciphertext from its own shares and the homomorphically encrypted shares of the computing party; the participant secret-shares the first-order gradient aggregate value vector ciphertext and the second-order gradient aggregate value vector ciphertext with the initiator and the computing party; the computing party decrypts its ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H. The method introduces a computing party while protecting the data privacy of each party, effectively relieving the computing pressure on the initiator.

Description

Gradient aggregation method for longitudinal federal XGboost model training
Technical Field
The invention relates to the technical field of federal learning, in particular to a gradient aggregation method for longitudinal federal XGboost model training.
Background
The emergence of federal learning has largely solved the problem of data silos. XGBoost is a machine learning algorithm based on gradient boosting decision trees (Gradient Boosting Decision Trees); it is efficient to implement, fast to compute, and offers a degree of interpretability, making it one of the preferred algorithms in practical applications. The federal XGBoost algorithm is therefore an important and very frequently used algorithm in federal learning.
Most current federal XGBoost algorithms rely on partially homomorphic (semi-homomorphic) encryption for privacy protection, but such encryption demands substantial computing resources, and in some application scenarios a party participating in federal learning lacks sufficient computing power. For example, a data user (such as a small technology company) has limited computing resources and cannot afford the expensive homomorphic encryption and decryption computation, while the data holder (a large technology company) has ample computing resources. The data user therefore needs a third party with sufficient computing resources to assist in the computation, but this risks leaking the data user's data to the third party.
Disclosure of Invention
The invention aims to solve the above technical problems and provides a gradient aggregation method for longitudinal federal XGboost model training that introduces a computing party while protecting the data privacy of the initiator and the participant, effectively relieving the computing pressure on the initiator.
In order to solve the problems, the invention is realized by adopting the following technical scheme:
The invention discloses a gradient aggregation method for longitudinal federal XGboost model training, in which an initiator and a participant hold an intersection sample set for training; the method comprises the following steps:
S1: the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
S2: the computing party homomorphically encrypts the second shares g_2 and h_2 to obtain the ciphertext second shares enc(g_2) and enc(h_2) and sends them to the participant;
S3: the participant calculates the first-order gradient aggregate value vector ciphertext enc(G) from the first share g_1 and the ciphertext second share enc(g_2), and calculates the second-order gradient aggregate value vector ciphertext enc(H) from the first share h_1 and the ciphertext second share enc(h_2);
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates the plaintext first-order gradient aggregate value vector G from the first share G_1 and the second share G_2, and calculates the plaintext second-order gradient aggregate value vector H from the first share H_1 and the second share H_2.
In this scheme, the initiator calculates the first-order gradient and the second-order gradient corresponding to each sample according to the sample's label value; the first-order gradients of all samples form the first-order gradient vector g and the second-order gradients of all samples form the second-order gradient vector h. The initiator splits the first-order gradient vector g and the second-order gradient vector h into two shares each with a secret sharing algorithm and sends the corresponding shares to the participant and the computing party respectively; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant.
The participant calculates the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) from its own shares and the homomorphically encrypted shares of the computing party; the plaintext obtained by decrypting enc(G) is the first-order gradient aggregate value vector G, and the plaintext obtained by decrypting enc(H) is the second-order gradient aggregate value vector H.
The participant uses a secret sharing algorithm to split the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) each into a plaintext share and a ciphertext share, sends the plaintext shares to the initiator and the ciphertext shares to the computing party; the computing party decrypts the ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator then calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H.
The data of the initiator and the participant are not leaked, and the initiator does not need to perform homomorphic encryption and decryption on the data; the computing party performs the homomorphic encryption and decryption computation, which effectively relieves the computing pressure on the initiator.
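For clarity, the following is a minimal end-to-end sketch of steps S1–S6 in Python. It assumes additive secret sharing over floating-point values and the third-party `phe` Paillier library for the homomorphic encryption; the toy gradient values, the binning and all variable names are illustrative and are not taken from the patent.

```python
# Minimal sketch of steps S1-S6; assumes the third-party `phe` Paillier library.
# Toy values and names are illustrative only.
import random
from phe import paillier

def share(vec):
    """Additive secret sharing: vec = share1 + share2, share1 is a non-zero random mask."""
    s1 = [random.uniform(1.0, 100.0) for _ in vec]
    s2 = [v - r for v, r in zip(vec, s1)]
    return s1, s2

def agg(enc_vec, idx):
    """Homomorphically sum the ciphertexts of the samples in one bin."""
    total = enc_vec[idx[0]]
    for i in idx[1:]:
        total = total + enc_vec[i]
    return total

# S1: initiator computes g, h and distributes additive shares.
g = [0.3, -0.7, 1.2, 0.5, -0.1]           # first-order gradients (toy values)
h = [0.2,  0.9, 0.4, 0.8,  0.6]           # second-order gradients (toy values)
g1, g2 = share(g)                          # g1 -> participant, g2 -> computing party
h1, h2 = share(h)

# S2: computing party encrypts its shares with its Paillier public key.
pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_g2 = [pk.encrypt(x) for x in g2]
enc_h2 = [pk.encrypt(x) for x in h2]

# S3: participant rebuilds ciphertext gradients (ciphertext + plaintext = ciphertext)
# and aggregates them per bin.
enc_g = [c + p for p, c in zip(g1, enc_g2)]
enc_h = [c + p for p, c in zip(h1, enc_h2)]
bins = [[0, 1, 2], [3, 4]]                 # sample indices per bin (toy binning)
enc_G = [agg(enc_g, idx) for idx in bins]
enc_H = [agg(enc_h, idx) for idx in bins]

# S4: participant re-shares the encrypted aggregates.
r_G = [random.uniform(1.0, 100.0) for _ in enc_G]
r_H = [random.uniform(1.0, 100.0) for _ in enc_H]
enc_G1 = [c - r for c, r in zip(enc_G, r_G)]   # ciphertext shares -> computing party
enc_H1 = [c - r for c, r in zip(enc_H, r_H)]
G2, H2 = r_G, r_H                              # plaintext shares -> initiator

# S5: computing party decrypts its ciphertext shares.
G1 = [sk.decrypt(c) for c in enc_G1]
H1 = [sk.decrypt(c) for c in enc_H1]

# S6: initiator reconstructs the plaintext aggregates.
G = [a + b for a, b in zip(G1, G2)]
H = [a + b for a, b in zip(H1, H2)]
print("G =", G)   # per-bin sums of g, up to floating-point error
print("H =", H)   # per-bin sums of h, up to floating-point error
```

Note that the initiator never encrypts or decrypts anything in this sketch; only the computing party touches the Paillier keys, which is the point of the division of labour described above.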
Preferably, the step S2 comprises the following step: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant.
Preferably, the step S3 includes the steps of:
S31: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(g) = g_1 + enc(g_2) and enc(h) = h_1 + enc(h_2);
S32: the participant calculates the first-order gradient aggregate value vector ciphertext enc(G) from the first-order gradient vector ciphertext enc(g), and calculates the second-order gradient aggregate value vector ciphertext enc(H) from the second-order gradient vector ciphertext enc(h).
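Step S31 relies on the additively homomorphic property that a plaintext can be added to a ciphertext so that the result decrypts to the sum. A short sketch, assuming the third-party `phe` Paillier library and made-up values (not the patent's implementation):

```python
# Sketch of S31: plaintext share + encrypted share = encryption of the full value.
# Assumes the third-party `phe` Paillier library; values are illustrative.
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)

g_i, g1_i = 0.8, 0.3          # one gradient value and its first (plaintext) share
g2_i = g_i - g1_i             # second share held by the computing party
enc_g2_i = pk.encrypt(g2_i)   # ciphertext second share sent to the participant

enc_g_i = enc_g2_i + g1_i     # participant adds its plaintext share, no key needed
assert abs(sk.decrypt(enc_g_i) - g_i) < 1e-9   # decrypts to g(i)
```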
Preferably, the step S1 further comprises the following step: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the step S32 includes the steps of:
S321: the participant calculates the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G);
S322: the participant calculates the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H).
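As a concrete illustration of S321/S322, the per-bin aggregation is a homomorphic sum of the ciphertexts of the samples that fall into each bin. The bin assignment, gradient values and the `phe` library below are assumptions for illustration, not the patent's code:

```python
# Sketch of S321/S322: per-bin homomorphic aggregation (illustrative only).
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_g = [pk.encrypt(x) for x in [0.3, -0.7, 1.2, 0.5, -0.1]]  # enc(g(1))..enc(g(5))
bins = [[0, 1, 2], [3, 4]]          # toy bin assignment: bin 1 = samples 1-3, bin 2 = samples 4-5

enc_G = []
for idx in bins:
    total = enc_g[idx[0]]
    for i in idx[1:]:
        total = total + enc_g[i]    # ciphertext + ciphertext
    enc_G.append(total)             # enc(G(j)) for bin j

print([round(sk.decrypt(c), 6) for c in enc_G])   # [0.8, 0.4]
```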
Preferably, in the step S1, the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g whose dimension is consistent with that of the first-order gradient vector g, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2.
Preferably, in the step S1, the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h whose dimension is consistent with that of the second-order gradient vector h, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2.
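The two secret-sharing steps above are plain additive masking. A minimal sketch follows; the uniform sampling range is an assumption for illustration (the patent only requires a random vector of matching dimension with non-zero entries):

```python
# Sketch of the additive secret sharing used in S1 (illustrative only).
import random

def share_vector(vec, low=1.0, high=100.0):
    """Split vec into (r, vec - r), where r is a non-zero random vector of the same dimension."""
    r = [random.uniform(low, high) for _ in vec]   # non-zero by construction
    return r, [v - x for v, x in zip(vec, r)]

g = [0.3, -0.7, 1.2, 0.5, -0.1]
g1, g2 = share_vector(g)                 # g1 = r_g, g2 = g - r_g
assert all(abs(a + b - v) < 1e-12 for a, b, v in zip(g1, g2, g))
```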
Preferably, in the step S4, the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G whose dimension is consistent with that of the first-order gradient aggregate value vector ciphertext enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1).
Preferably, in the step S4, the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H whose dimension is consistent with that of the second-order gradient aggregate value vector ciphertext enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1).
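Because an additively homomorphic ciphertext can be shifted by a plaintext, enc(G) - r_G is still a valid ciphertext, which is what makes this ciphertext/plaintext split possible. A sketch under the same `phe` assumption (values illustrative):

```python
# Sketch of S4/S5/S6: split enc(G) into a ciphertext share and a plaintext share (illustrative).
import random
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_G = [pk.encrypt(x) for x in [0.8, 0.4]]        # encrypted per-bin aggregates

r_G = [random.uniform(1.0, 100.0) for _ in enc_G]  # non-zero random vector
G2 = r_G                                           # plaintext share -> initiator
enc_G1 = [c - r for c, r in zip(enc_G, r_G)]       # ciphertext share -> computing party

G1 = [sk.decrypt(c) for c in enc_G1]               # computing party decrypts (S5)
G = [a + b for a, b in zip(G1, G2)]                # initiator reconstructs (S6)
print([round(x, 6) for x in G])                    # [0.8, 0.4]
```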
Preferably, the random numbers in the random number vectors are not 0.
Preferably, the step S6 comprises the following step: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
The beneficial effects of the invention are as follows: by combining secret sharing with homomorphic encryption and introducing a computing party, the data privacy of the initiator and the participant is protected, data leakage from the initiator and the participant is avoided, and the computing pressure on the initiator is effectively relieved.
Drawings
FIG. 1 is a flow chart of an embodiment;
FIG. 2 is a schematic diagram of a data processing flow of an embodiment.
Detailed Description
The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.
Embodiment: as shown in FIG. 1 and FIG. 2, in the gradient aggregation method for longitudinal federal XGboost model training of this embodiment, an initiator and a participant hold an intersection sample set for training, and the method comprises the following steps:
S1: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample in the set, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2 with g = g_1 + g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2 with h = h_1 + h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
the intersection sample set is denoted d, with d = [d(1), d(2) ... d(n)], g = [g(1), g(2) ... g(n)], h = [h(1), h(2) ... h(n)], g_1 = [g_1(1), g_1(2) ... g_1(n)], g_2 = [g_2(1), g_2(2) ... g_2(n)], h_1 = [h_1(1), h_1(2) ... h_1(n)], h_2 = [h_2(1), h_2(2) ... h_2(n)];
where n is the number of samples in the intersection sample set, d(i) denotes the i-th sample, g(i) the first-order gradient value corresponding to d(i), h(i) the second-order gradient value corresponding to d(i), g_1(i) the first share of g(i), g_2(i) the second share of g(i), h_1(i) the first share of h(i), and h_2(i) the second share of h(i), with g(i) = g_1(i) + g_2(i), h(i) = h_1(i) + h_2(i), and 1 ≤ i ≤ n;
the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g = [r_g(1), r_g(2) ... r_g(n)] whose entries are all non-zero, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2, i.e. g_1 = r_g, g_2 = g - r_g;
the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h = [r_h(1), r_h(2) ... r_h(n)] whose entries are all non-zero, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2, i.e. h_1 = r_h, h_2 = h - r_h;
S2: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant, where enc(X) denotes the ciphertext obtained by homomorphically encrypting data X with the public key pk;
S3: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, with enc(g) = g_1 + enc(g_2), enc(h) = h_1 + enc(h_2), enc(g) = [enc(g(1)), enc(g(2)) ... enc(g(n))], enc(h) = [enc(h(1)), enc(h(2)) ... enc(h(n))], where enc(g(i)) denotes the ciphertext corresponding to the first-order gradient value g(i) and enc(h(i)) denotes the ciphertext corresponding to the second-order gradient value h(i);
the participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2)) ... enc(G(m))], where 1 ≤ j ≤ m and enc(G(j)) denotes the first-order gradient aggregate value ciphertext corresponding to the j-th bin;
the participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2)) ... enc(H(m))], where 1 ≤ j ≤ m and enc(H(j)) denotes the second-order gradient aggregate value ciphertext corresponding to the j-th bin;
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2 with enc(G) = enc(G_1) + G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2 with enc(H) = enc(H_1) + H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G = [r_G(1), r_G(2) ... r_G(m)] whose entries are all non-zero and whose dimension is consistent with that of enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1), i.e. G_2 = r_G, enc(G_1) = enc(G) - r_G;
the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H = [r_H(1), r_H(2) ... r_H(m)] whose entries are all non-zero and whose dimension is consistent with that of enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1), i.e. H_2 = r_H, enc(H_1) = enc(H) - r_H;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) with the private key sk to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
In this scheme, the initiator and the participant first bin the samples in the intersection sample set to obtain m bins.
Then the initiator calculates the first-order gradient and the second-order gradient corresponding to each sample according to the sample's label value; the first-order gradients of all samples form the first-order gradient vector g and the second-order gradients of all samples form the second-order gradient vector h. The initiator splits the first-order gradient vector g and the second-order gradient vector h into two shares each with a secret sharing algorithm and sends them to the participant and the computing party respectively; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant.
The participant calculates, from its own shares and the homomorphically encrypted shares of the computing party, the first-order gradient aggregate value ciphertext and the second-order gradient aggregate value ciphertext corresponding to each bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) and the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H). The plaintext obtained by decrypting enc(G) is the first-order gradient aggregate value vector G, and the plaintext obtained by decrypting enc(H) is the second-order gradient aggregate value vector H.
The participant uses a secret sharing algorithm to split the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) each into a plaintext share and a ciphertext share, sends the plaintext shares to the initiator and the ciphertext shares to the computing party; the computing party decrypts the ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator then calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H.
The data of the initiator and the participant are not leaked, and the initiator does not need to perform homomorphic encryption and decryption on the data; the computing party performs the homomorphic encryption and decryption computation, which effectively relieves the computing pressure on the initiator.
Illustrative example:
A small technology company A and a large technology company B perform the gradient aggregation calculation of longitudinal federal XGboost model training. Because its computing resources are insufficient, the small technology company A acts as the initiator and the large technology company B acts as the participant. The initiator and the participant hold an intersection sample set d = [d(1), d(2), d(3), d(4), d(5)] for training and bin the samples in the intersection sample set to obtain bins f1 and f2, where samples d(1), d(2), d(3) belong to bin f1 and samples d(4), d(5) belong to bin f2.
The initiator calculates the corresponding first-order gradient vector g = [g(1), g(2), g(3), g(4), g(5)] and second-order gradient vector h = [h(1), h(2), h(3), h(4), h(5)] according to the label value of each sample. The initiator generates a random number vector r_g = [r_g(1), r_g(2), r_g(3), r_g(4), r_g(5)] whose entries are all non-zero and secret-shares the first-order gradient vector g into a first share g_1 and a second share g_2, with g_1 = r_g and g_2 = g - r_g = [g(1)-r_g(1), g(2)-r_g(2), g(3)-r_g(3), g(4)-r_g(4), g(5)-r_g(5)]. The initiator generates a random number vector r_h = [r_h(1), r_h(2), r_h(3), r_h(4), r_h(5)] whose entries are all non-zero and secret-shares the second-order gradient vector h into a first share h_1 and a second share h_2, with h_1 = r_h and h_2 = h - r_h = [h(1)-r_h(1), h(2)-r_h(2), h(3)-r_h(3), h(4)-r_h(4), h(5)-r_h(5)]. The initiator sends the first shares g_1 and h_1 to the participant and the second shares g_2 and h_2 to the computing party;
The computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant;
The participant calculates g_1 + enc(g_2) = enc(g_1 + g_2) = enc(g) and h_1 + enc(h_2) = enc(h_1 + h_2) = enc(h), obtaining the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(X) denotes the ciphertext obtained by homomorphically encrypting data X with the public key pk.
The participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext enc(G(1)) corresponding to bin f1 from the first-order gradient ciphertext values enc(g(1)), enc(g(2)), enc(g(3)) of samples d(1), d(2), d(3) in bin f1, and the first-order gradient aggregate value ciphertext enc(G(2)) corresponding to bin f2 from the first-order gradient ciphertext values enc(g(4)), enc(g(5)) of samples d(4), d(5) in bin f2, obtaining the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2))].
The participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext enc(H(1)) corresponding to bin f1 from the second-order gradient ciphertext values enc(h(1)), enc(h(2)), enc(h(3)) of samples d(1), d(2), d(3) in bin f1, and the second-order gradient aggregate value ciphertext enc(H(2)) corresponding to bin f2 from the second-order gradient ciphertext values enc(h(4)), enc(h(5)) of samples d(4), d(5) in bin f2, obtaining the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2))].
The participant generates a random number vector r_G = [r_G(1), r_G(2)] whose entries are all non-zero and secret-shares the first-order gradient aggregate value vector ciphertext enc(G) into a ciphertext first share enc(G_1) and a plaintext second share G_2, with G_2 = r_G and enc(G_1) = enc(G) - r_G = [enc(G(1))-r_G(1), enc(G(2))-r_G(2)]. The participant generates a random number vector r_H = [r_H(1), r_H(2)] whose entries are all non-zero and secret-shares the second-order gradient aggregate value vector ciphertext enc(H) into a ciphertext first share enc(H_1) and a plaintext second share H_2, with H_2 = r_H and enc(H_1) = enc(H) - r_H = [enc(H(1))-r_H(1), enc(H(2))-r_H(2)]. The participant sends the first shares enc(G_1) and enc(H_1) to the computing party and the second shares G_2 and H_2 to the initiator;
The computing party decrypts the first shares enc(G_1) and enc(H_1) with the private key sk to obtain the plaintext first shares G_1 = [G(1)-r_G(1), G(2)-r_G(2)] and H_1 = [H(1)-r_H(1), H(2)-r_H(2)] and sends them to the initiator.
The initiator calculates G_1 + G_2 = [G(1)-r_G(1), G(2)-r_G(2)] + [r_G(1), r_G(2)] = [G(1), G(2)] = G and H_1 + H_2 = [H(1)-r_H(1), H(2)-r_H(2)] + [r_H(1), r_H(2)] = [H(1), H(2)] = H, obtaining the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
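A quick numeric check of the cancellation in the last step, using made-up aggregate values and masks (illustrative only, no encryption shown):

```python
# Numeric check that the masks cancel in S6 (toy numbers, illustrative only).
G   = [0.8, 0.4]                         # true per-bin first-order aggregates
r_G = [13.7, 42.1]                       # participant's random masks
G1  = [a - b for a, b in zip(G, r_G)]    # decrypted by the computing party, sent to the initiator
G2  = r_G                                # plaintext share already held by the initiator
rec = [a + b for a, b in zip(G1, G2)]    # initiator's reconstruction
assert all(abs(a - b) < 1e-12 for a, b in zip(rec, G))
print(rec)                               # ~[0.8, 0.4] -- the masks cancel up to floating-point error
```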

Claims (8)

1. A gradient aggregation method for longitudinal federal XGboost model training, in which an initiator and a participant hold an intersection sample set for training, characterized by comprising the following steps:
S1: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
S2: the computing party homomorphically encrypts the second shares g_2 and h_2 to obtain the ciphertext second shares enc(g_2) and enc(h_2) and sends them to the participant;
S3: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(g) = g_1 + enc(g_2), enc(h) = h_1 + enc(h_2), enc(g) = [enc(g(1)), enc(g(2)) ... enc(g(n))], enc(h) = [enc(h(1)), enc(h(2)) ... enc(h(n))], enc(g(i)) denotes the ciphertext corresponding to the first-order gradient value g(i), enc(h(i)) denotes the ciphertext corresponding to the second-order gradient value h(i), and n is the number of samples in the intersection sample set;
the participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2)) ... enc(G(m))], where 1 ≤ j ≤ m and enc(G(j)) denotes the first-order gradient aggregate value ciphertext corresponding to the j-th bin;
the participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2)) ... enc(H(m))], where 1 ≤ j ≤ m and enc(H(j)) denotes the second-order gradient aggregate value ciphertext corresponding to the j-th bin;
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates the plaintext first-order gradient aggregate value vector G from the first share G_1 and the second share G_2, and calculates the plaintext second-order gradient aggregate value vector H from the first share H_1 and the second share H_2.
2. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that the step S2 comprises the following step: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant.
3. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S1 the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g whose dimension is consistent with that of the first-order gradient vector g, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2.
4. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S1 the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h whose dimension is consistent with that of the second-order gradient vector h, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2.
5. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S4 the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G whose dimension is consistent with that of the first-order gradient aggregate value vector ciphertext enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1).
6. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S4 the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H whose dimension is consistent with that of the second-order gradient aggregate value vector ciphertext enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1).
7. The gradient aggregation method for longitudinal federal XGboost model training according to claim 3, 4, 5 or 6, characterized in that the random numbers in the random number vectors are not 0.
8. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1 or 2, characterized in that the step S6 comprises the following step: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
CN202311150627.0A 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training Active CN116886271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311150627.0A CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311150627.0A CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Publications (2)

Publication Number Publication Date
CN116886271A CN116886271A (en) 2023-10-13
CN116886271B true CN116886271B (en) 2023-11-21

Family

ID=88272176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311150627.0A Active CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Country Status (1)

Country Link
CN (1) CN116886271B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149160A (en) * 2020-08-28 2020-12-29 山东大学 Homomorphic pseudo-random number-based federated learning privacy protection method and system
CN112464287A (en) * 2020-12-12 2021-03-09 同济大学 Multi-party XGboost safety prediction model training method based on secret sharing and federal learning
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN113516256A (en) * 2021-09-14 2021-10-19 深圳市洞见智慧科技有限公司 Third-party-free federal learning method and system based on secret sharing and homomorphic encryption
CN113688999A (en) * 2021-08-23 2021-11-23 神州融安科技(北京)有限公司 Training method of transverse federated xgboost decision tree
CN114785480A (en) * 2022-04-12 2022-07-22 支付宝(杭州)信息技术有限公司 Multi-party secure computing method, device and system
CN115730333A (en) * 2022-11-11 2023-03-03 杭州博盾习言科技有限公司 Security tree model construction method and device based on secret sharing and homomorphic encryption
CN115935403A (en) * 2021-08-06 2023-04-07 中国移动通信有限公司研究院 Data detection method and device and user equipment
CN116506154A (en) * 2023-03-20 2023-07-28 湖南科技大学 Safe verifiable federal learning scheme
CN116596094A (en) * 2023-05-30 2023-08-15 湖南工商大学 Data auditing system, method, computer equipment and medium based on federal learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886417B (en) * 2019-03-01 2024-05-03 深圳前海微众银行股份有限公司 Model parameter training method, device, equipment and medium based on federal learning
JP2023008395A (en) * 2021-07-06 2023-01-19 ザ ガバニング カウンシル オブ ザ ユニバーシティ オブ トロント Secure, robust federated learning system by multi-party type homomorphic encryption and federated learning method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149160A (en) * 2020-08-28 2020-12-29 山东大学 Homomorphic pseudo-random number-based federated learning privacy protection method and system
CN112464287A (en) * 2020-12-12 2021-03-09 同济大学 Multi-party XGboost safety prediction model training method based on secret sharing and federal learning
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN115935403A (en) * 2021-08-06 2023-04-07 中国移动通信有限公司研究院 Data detection method and device and user equipment
CN113688999A (en) * 2021-08-23 2021-11-23 神州融安科技(北京)有限公司 Training method of transverse federated xgboost decision tree
CN113516256A (en) * 2021-09-14 2021-10-19 深圳市洞见智慧科技有限公司 Third-party-free federal learning method and system based on secret sharing and homomorphic encryption
CN114785480A (en) * 2022-04-12 2022-07-22 支付宝(杭州)信息技术有限公司 Multi-party secure computing method, device and system
CN115730333A (en) * 2022-11-11 2023-03-03 杭州博盾习言科技有限公司 Security tree model construction method and device based on secret sharing and homomorphic encryption
CN116506154A (en) * 2023-03-20 2023-07-28 湖南科技大学 Safe verifiable federal learning scheme
CN116596094A (en) * 2023-05-30 2023-08-15 湖南工商大学 Data auditing system, method, computer equipment and medium based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于秘密分享和梯度选择的高效安全联邦学习 (Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection); 董业; 侯炜; 陈小军; 曾帅; 计算机研究与发展 (Journal of Computer Research and Development), No. 10; full text *

Also Published As

Publication number Publication date
CN116886271A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108898025B (en) Chaotic image encryption method based on double scrambling and DNA coding
Almaiah et al. A new hybrid text encryption approach over mobile ad hoc network
CN111106936A (en) SM 9-based attribute encryption method and system
CN112134688B (en) Asymmetric image encryption method based on quantum chaotic mapping and SHA-3
JP2008301391A (en) Broadcasting encryption system, encryption communication method, decoder and decoding program
CN108566501B (en) Color image encryption method based on mixed domain and LSS type coupling mapping grid
CN110149200A (en) A kind of color image encrypting method based on dynamic DNA and 4D chaos
CN112311524A (en) Image encryption method based on new chaotic mapping and compressed sensing
Gabr et al. A combination of decimal-and bit-level secure multimedia transmission
CN112052466A (en) Support vector machine user data prediction method based on multi-party secure computing protocol
Jassem et al. Enhanced Blowfish algorithm for image encryption based on chaotic map
CN109088721B (en) Entrustable uncovering and encrypting method
CN113869499A (en) High-efficiency conversion method for unintentional neural network
US20170041133A1 (en) Encryption method, program, and system
CN115865307B (en) Data point multiplication operation method for federal learning
CN112580071A (en) Data processing method and device
CN116886271B (en) Gradient aggregation method for longitudinal federal XGboost model training
Gentry et al. How to compress (reusable) garbled circuits
CN110460442A (en) A kind of key encapsulation method based on lattice
CN115865302A (en) Multi-party matrix multiplication method with privacy protection attribute
CN113162765B (en) Trustable public key encryption system and method based on non-interactive key agreement
CN114844635A (en) Method for safely carrying out Shuffle on data
CN114826611A (en) IND-sID-CCA2 security identifier broadcast encryption method based on SM9
Karolin et al. Image encryption and decryption using RSA algorithm with share creation techniques
Rachmawati et al. Hybrid Cryptosystem Combination Algorithm Of Hill Cipher 3x3 and Elgamal To Secure Instant Messaging For Android

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant