CN116886271B - Gradient aggregation method for longitudinal federal XGboost model training - Google Patents

Gradient aggregation method for longitudinal federal XGboost model training

Info

Publication number
CN116886271B
CN116886271B (Application No. CN202311150627.0A)
Authority
CN
China
Prior art keywords
enc
vector
ciphertext
gradient
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311150627.0A
Other languages
Chinese (zh)
Other versions
CN116886271A (en)
Inventor
马煜翔
冯黎明
刘文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanxiang Zhilian Hangzhou Technology Co ltd
Original Assignee
Lanxiang Zhilian Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanxiang Zhilian Hangzhou Technology Co ltd filed Critical Lanxiang Zhilian Hangzhou Technology Co ltd
Priority to CN202311150627.0A priority Critical patent/CN116886271B/en
Publication of CN116886271A publication Critical patent/CN116886271A/en
Application granted granted Critical
Publication of CN116886271B publication Critical patent/CN116886271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/085Secret sharing or secret splitting, e.g. threshold schemes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a gradient aggregation method for longitudinal federal XGboost model training. It comprises the following steps: the initiator calculates a first-order gradient vector g and a second-order gradient vector h and secret-shares them with the participant and the computing party; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant; the participant calculates the first-order gradient aggregate value vector ciphertext and the second-order gradient aggregate value vector ciphertext from its own shares and the homomorphically encrypted shares of the computing party; the participant secret-shares the first-order gradient aggregate value vector ciphertext and the second-order gradient aggregate value vector ciphertext with the initiator and the computing party; the computing party decrypts its ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H. The method introduces a computing party while protecting the data privacy of each party, effectively relieving the computing pressure on the initiator.

Description

Gradient aggregation method for longitudinal federal XGboost model training
Technical Field
The invention relates to the technical field of federal learning, in particular to a gradient aggregation method for longitudinal federal XGboost model training.
Background
The emergence of federal learning has largely solved the problem of data silos. XGBoost is a machine learning algorithm based on gradient boosting decision trees (Gradient Boosting Decision Trees); it is efficient to implement, fast to compute, and offers a degree of interpretability, making it one of the preferred algorithms in practical applications. The federal XGBoost algorithm is therefore an important and very frequently used algorithm in federal learning.
Most current federal XGBoost algorithms rely on partially homomorphic (semi-homomorphic) encryption for privacy protection, but such encryption demands substantial computing resources, and in some application scenarios a party participating in federal learning lacks sufficient computing power. For example, a data user (such as a small technology company) has limited computing resources and cannot afford the expensive homomorphic encryption and decryption computation, while the data holder (a large technology company) has ample computing resources. The data user therefore needs a third party with sufficient computing resources to assist in the computation, but this risks leaking the data user's data to the third party.
Disclosure of Invention
The invention aims to solve the above technical problems and provides a gradient aggregation method for longitudinal federal XGboost model training that introduces a computing party while protecting the data privacy of the initiator and the participant, effectively relieving the computing pressure on the initiator.
In order to solve the problems, the invention is realized by adopting the following technical scheme:
The invention discloses a gradient aggregation method for longitudinal federal XGboost model training, in which an initiator and a participant hold an intersection sample set for training; the method comprises the following steps:
S1: the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
S2: the computing party homomorphically encrypts the second shares g_2 and h_2 to obtain the ciphertext second shares enc(g_2) and enc(h_2) and sends them to the participant;
S3: the participant calculates the first-order gradient aggregate value vector ciphertext enc(G) from the first share g_1 and the ciphertext second share enc(g_2), and calculates the second-order gradient aggregate value vector ciphertext enc(H) from the first share h_1 and the ciphertext second share enc(h_2);
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates the plaintext first-order gradient aggregate value vector G from the first share G_1 and the second share G_2, and calculates the plaintext second-order gradient aggregate value vector H from the first share H_1 and the second share H_2.
In this scheme, the initiator calculates the first-order gradient and the second-order gradient corresponding to each sample according to the sample's label value; the first-order gradients of all samples form the first-order gradient vector g and the second-order gradients of all samples form the second-order gradient vector h. The initiator splits the first-order gradient vector g and the second-order gradient vector h into two shares each with a secret sharing algorithm and sends the corresponding shares to the participant and the computing party respectively; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant.
The participant calculates the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) from its own shares and the homomorphically encrypted shares of the computing party; the plaintext obtained by decrypting enc(G) is the first-order gradient aggregate value vector G, and the plaintext obtained by decrypting enc(H) is the second-order gradient aggregate value vector H.
The participant uses a secret sharing algorithm to split the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) each into a plaintext share and a ciphertext share, sends the plaintext shares to the initiator and the ciphertext shares to the computing party; the computing party decrypts the ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator then calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H.
The data of the initiator and the participant are not leaked, and the initiator does not need to perform homomorphic encryption and decryption on the data; the computing party performs the homomorphic encryption and decryption computation, which effectively relieves the computing pressure on the initiator.
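For clarity, the following is a minimal end-to-end sketch of steps S1–S6 in Python. It assumes additive secret sharing over floating-point values and the third-party `phe` Paillier library for the homomorphic encryption; the toy gradient values, the binning and all variable names are illustrative and are not taken from the patent.

```python
# Minimal sketch of steps S1-S6; assumes the third-party `phe` Paillier library.
# Toy values and names are illustrative only.
import random
from phe import paillier

def share(vec):
    """Additive secret sharing: vec = share1 + share2, share1 is a non-zero random mask."""
    s1 = [random.uniform(1.0, 100.0) for _ in vec]
    s2 = [v - r for v, r in zip(vec, s1)]
    return s1, s2

def agg(enc_vec, idx):
    """Homomorphically sum the ciphertexts of the samples in one bin."""
    total = enc_vec[idx[0]]
    for i in idx[1:]:
        total = total + enc_vec[i]
    return total

# S1: initiator computes g, h and distributes additive shares.
g = [0.3, -0.7, 1.2, 0.5, -0.1]           # first-order gradients (toy values)
h = [0.2,  0.9, 0.4, 0.8,  0.6]           # second-order gradients (toy values)
g1, g2 = share(g)                          # g1 -> participant, g2 -> computing party
h1, h2 = share(h)

# S2: computing party encrypts its shares with its Paillier public key.
pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_g2 = [pk.encrypt(x) for x in g2]
enc_h2 = [pk.encrypt(x) for x in h2]

# S3: participant rebuilds ciphertext gradients (ciphertext + plaintext = ciphertext)
# and aggregates them per bin.
enc_g = [c + p for p, c in zip(g1, enc_g2)]
enc_h = [c + p for p, c in zip(h1, enc_h2)]
bins = [[0, 1, 2], [3, 4]]                 # sample indices per bin (toy binning)
enc_G = [agg(enc_g, idx) for idx in bins]
enc_H = [agg(enc_h, idx) for idx in bins]

# S4: participant re-shares the encrypted aggregates.
r_G = [random.uniform(1.0, 100.0) for _ in enc_G]
r_H = [random.uniform(1.0, 100.0) for _ in enc_H]
enc_G1 = [c - r for c, r in zip(enc_G, r_G)]   # ciphertext shares -> computing party
enc_H1 = [c - r for c, r in zip(enc_H, r_H)]
G2, H2 = r_G, r_H                              # plaintext shares -> initiator

# S5: computing party decrypts its ciphertext shares.
G1 = [sk.decrypt(c) for c in enc_G1]
H1 = [sk.decrypt(c) for c in enc_H1]

# S6: initiator reconstructs the plaintext aggregates.
G = [a + b for a, b in zip(G1, G2)]
H = [a + b for a, b in zip(H1, H2)]
print("G =", G)   # per-bin sums of g, up to floating-point error
print("H =", H)   # per-bin sums of h, up to floating-point error
```

Note that the initiator never encrypts or decrypts anything in this sketch; only the computing party touches the Paillier keys, which is the point of the division of labour described above.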
Preferably, the step S2 comprises the following step: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant.
Preferably, the step S3 includes the steps of:
S31: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(g) = g_1 + enc(g_2) and enc(h) = h_1 + enc(h_2);
S32: the participant calculates the first-order gradient aggregate value vector ciphertext enc(G) from the first-order gradient vector ciphertext enc(g), and calculates the second-order gradient aggregate value vector ciphertext enc(H) from the second-order gradient vector ciphertext enc(h).
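Step S31 relies on the additively homomorphic property that a plaintext can be added to a ciphertext so that the result decrypts to the sum. A short sketch, assuming the third-party `phe` Paillier library and made-up values (not the patent's implementation):

```python
# Sketch of S31: plaintext share + encrypted share = encryption of the full value.
# Assumes the third-party `phe` Paillier library; values are illustrative.
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)

g_i, g1_i = 0.8, 0.3          # one gradient value and its first (plaintext) share
g2_i = g_i - g1_i             # second share held by the computing party
enc_g2_i = pk.encrypt(g2_i)   # ciphertext second share sent to the participant

enc_g_i = enc_g2_i + g1_i     # participant adds its plaintext share, no key needed
assert abs(sk.decrypt(enc_g_i) - g_i) < 1e-9   # decrypts to g(i)
```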
Preferably, the step S1 further comprises the following step: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the step S32 includes the steps of:
S321: the participant calculates the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G);
S322: the participant calculates the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H).
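As a concrete illustration of S321/S322, the per-bin aggregation is a homomorphic sum of the ciphertexts of the samples that fall into each bin. The bin assignment, gradient values and the `phe` library below are assumptions for illustration, not the patent's code:

```python
# Sketch of S321/S322: per-bin homomorphic aggregation (illustrative only).
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_g = [pk.encrypt(x) for x in [0.3, -0.7, 1.2, 0.5, -0.1]]  # enc(g(1))..enc(g(5))
bins = [[0, 1, 2], [3, 4]]          # toy bin assignment: bin 1 = samples 1-3, bin 2 = samples 4-5

enc_G = []
for idx in bins:
    total = enc_g[idx[0]]
    for i in idx[1:]:
        total = total + enc_g[i]    # ciphertext + ciphertext
    enc_G.append(total)             # enc(G(j)) for bin j

print([round(sk.decrypt(c), 6) for c in enc_G])   # [0.8, 0.4]
```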
Preferably, in the step S1, the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g whose dimension is consistent with that of the first-order gradient vector g, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2.
Preferably, in the step S1, the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h whose dimension is consistent with that of the second-order gradient vector h, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2.
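The two secret-sharing steps above are plain additive masking. A minimal sketch follows; the uniform sampling range is an assumption for illustration (the patent only requires a random vector of matching dimension with non-zero entries):

```python
# Sketch of the additive secret sharing used in S1 (illustrative only).
import random

def share_vector(vec, low=1.0, high=100.0):
    """Split vec into (r, vec - r), where r is a non-zero random vector of the same dimension."""
    r = [random.uniform(low, high) for _ in vec]   # non-zero by construction
    return r, [v - x for v, x in zip(vec, r)]

g = [0.3, -0.7, 1.2, 0.5, -0.1]
g1, g2 = share_vector(g)                 # g1 = r_g, g2 = g - r_g
assert all(abs(a + b - v) < 1e-12 for a, b, v in zip(g1, g2, g))
```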
Preferably, in the step S4, the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G whose dimension is consistent with that of the first-order gradient aggregate value vector ciphertext enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1).
Preferably, in the step S4, the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H whose dimension is consistent with that of the second-order gradient aggregate value vector ciphertext enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1).
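Because an additively homomorphic ciphertext can be shifted by a plaintext, enc(G) - r_G is still a valid ciphertext, which is what makes this ciphertext/plaintext split possible. A sketch under the same `phe` assumption (values illustrative):

```python
# Sketch of S4/S5/S6: split enc(G) into a ciphertext share and a plaintext share (illustrative).
import random
from phe import paillier

pk, sk = paillier.generate_paillier_keypair(n_length=1024)
enc_G = [pk.encrypt(x) for x in [0.8, 0.4]]        # encrypted per-bin aggregates

r_G = [random.uniform(1.0, 100.0) for _ in enc_G]  # non-zero random vector
G2 = r_G                                           # plaintext share -> initiator
enc_G1 = [c - r for c, r in zip(enc_G, r_G)]       # ciphertext share -> computing party

G1 = [sk.decrypt(c) for c in enc_G1]               # computing party decrypts (S5)
G = [a + b for a, b in zip(G1, G2)]                # initiator reconstructs (S6)
print([round(x, 6) for x in G])                    # [0.8, 0.4]
```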
Preferably, the random numbers in the random number vectors are not 0.
Preferably, the step S6 comprises the following step: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
The beneficial effects of the invention are as follows: by combining secret sharing with homomorphic encryption and introducing a computing party, the data privacy of the initiator and the participant is protected, data leakage from the initiator and the participant is avoided, and the computing pressure on the initiator is effectively relieved.
Drawings
FIG. 1 is a flow chart of an embodiment;
FIG. 2 is a schematic diagram of a data processing flow of an embodiment.
Detailed Description
The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.
Embodiment: as shown in FIG. 1 and FIG. 2, in the gradient aggregation method for longitudinal federal XGboost model training of this embodiment, an initiator and a participant hold an intersection sample set for training, and the method comprises the following steps:
S1: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample in the set, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2 with g = g_1 + g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2 with h = h_1 + h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
the intersection sample set is denoted d, with d = [d(1), d(2) ... d(n)], g = [g(1), g(2) ... g(n)], h = [h(1), h(2) ... h(n)], g_1 = [g_1(1), g_1(2) ... g_1(n)], g_2 = [g_2(1), g_2(2) ... g_2(n)], h_1 = [h_1(1), h_1(2) ... h_1(n)], h_2 = [h_2(1), h_2(2) ... h_2(n)];
where n is the number of samples in the intersection sample set, d(i) denotes the i-th sample, g(i) the first-order gradient value corresponding to d(i), h(i) the second-order gradient value corresponding to d(i), g_1(i) the first share of g(i), g_2(i) the second share of g(i), h_1(i) the first share of h(i), and h_2(i) the second share of h(i), with g(i) = g_1(i) + g_2(i), h(i) = h_1(i) + h_2(i), and 1 ≤ i ≤ n;
the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g = [r_g(1), r_g(2) ... r_g(n)] whose entries are all non-zero, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2, i.e. g_1 = r_g, g_2 = g - r_g;
the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h = [r_h(1), r_h(2) ... r_h(n)] whose entries are all non-zero, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2, i.e. h_1 = r_h, h_2 = h - r_h;
S2: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant, where enc(X) denotes the ciphertext obtained by homomorphically encrypting data X with the public key pk;
S3: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, with enc(g) = g_1 + enc(g_2), enc(h) = h_1 + enc(h_2), enc(g) = [enc(g(1)), enc(g(2)) ... enc(g(n))], enc(h) = [enc(h(1)), enc(h(2)) ... enc(h(n))], where enc(g(i)) denotes the ciphertext corresponding to the first-order gradient value g(i) and enc(h(i)) denotes the ciphertext corresponding to the second-order gradient value h(i);
the participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2)) ... enc(G(m))], where 1 ≤ j ≤ m and enc(G(j)) denotes the first-order gradient aggregate value ciphertext corresponding to the j-th bin;
the participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2)) ... enc(H(m))], where 1 ≤ j ≤ m and enc(H(j)) denotes the second-order gradient aggregate value ciphertext corresponding to the j-th bin;
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2 with enc(G) = enc(G_1) + G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2 with enc(H) = enc(H_1) + H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G = [r_G(1), r_G(2) ... r_G(m)] whose entries are all non-zero and whose dimension is consistent with that of enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1), i.e. G_2 = r_G, enc(G_1) = enc(G) - r_G;
the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H = [r_H(1), r_H(2) ... r_H(m)] whose entries are all non-zero and whose dimension is consistent with that of enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1), i.e. H_2 = r_H, enc(H_1) = enc(H) - r_H;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) with the private key sk to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
In this scheme, the initiator and the participant first bin the samples in the intersection sample set to obtain m bins.
Then the initiator calculates the first-order gradient and the second-order gradient corresponding to each sample according to the sample's label value; the first-order gradients of all samples form the first-order gradient vector g and the second-order gradients of all samples form the second-order gradient vector h. The initiator splits the first-order gradient vector g and the second-order gradient vector h into two shares each with a secret sharing algorithm and sends them to the participant and the computing party respectively; the computing party homomorphically encrypts its shares and sends the ciphertexts to the participant.
The participant calculates, from its own shares and the homomorphically encrypted shares of the computing party, the first-order gradient aggregate value ciphertext and the second-order gradient aggregate value ciphertext corresponding to each bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) and the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H). The plaintext obtained by decrypting enc(G) is the first-order gradient aggregate value vector G, and the plaintext obtained by decrypting enc(H) is the second-order gradient aggregate value vector H.
The participant uses a secret sharing algorithm to split the first-order gradient aggregate value vector ciphertext enc(G) and the second-order gradient aggregate value vector ciphertext enc(H) each into a plaintext share and a ciphertext share, sends the plaintext shares to the initiator and the ciphertext shares to the computing party; the computing party decrypts the ciphertext shares to obtain the corresponding plaintext values and sends them to the initiator; the initiator then calculates the plaintext first-order gradient aggregate value vector G and second-order gradient aggregate value vector H.
The data of the initiator and the participant are not leaked, and the initiator does not need to perform homomorphic encryption and decryption on the data; the computing party performs the homomorphic encryption and decryption computation, which effectively relieves the computing pressure on the initiator.
Illustrative example:
A small technology company A and a large technology company B perform the gradient aggregation calculation of longitudinal federal XGboost model training. Because its computing resources are insufficient, the small technology company A acts as the initiator and the large technology company B acts as the participant. The initiator and the participant hold an intersection sample set d = [d(1), d(2), d(3), d(4), d(5)] for training and bin the samples in the intersection sample set to obtain bins f1 and f2, where samples d(1), d(2), d(3) belong to bin f1 and samples d(4), d(5) belong to bin f2.
The initiator calculates the corresponding first-order gradient vector g = [g(1), g(2), g(3), g(4), g(5)] and second-order gradient vector h = [h(1), h(2), h(3), h(4), h(5)] according to the label value of each sample. The initiator generates a random number vector r_g = [r_g(1), r_g(2), r_g(3), r_g(4), r_g(5)] whose entries are all non-zero and secret-shares the first-order gradient vector g into a first share g_1 and a second share g_2, with g_1 = r_g and g_2 = g - r_g = [g(1)-r_g(1), g(2)-r_g(2), g(3)-r_g(3), g(4)-r_g(4), g(5)-r_g(5)]. The initiator generates a random number vector r_h = [r_h(1), r_h(2), r_h(3), r_h(4), r_h(5)] whose entries are all non-zero and secret-shares the second-order gradient vector h into a first share h_1 and a second share h_2, with h_1 = r_h and h_2 = h - r_h = [h(1)-r_h(1), h(2)-r_h(2), h(3)-r_h(3), h(4)-r_h(4), h(5)-r_h(5)]. The initiator sends the first shares g_1 and h_1 to the participant and the second shares g_2 and h_2 to the computing party;
The computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant;
The participant calculates g_1 + enc(g_2) = enc(g_1 + g_2) = enc(g) and h_1 + enc(h_2) = enc(h_1 + h_2) = enc(h), obtaining the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(X) denotes the ciphertext obtained by homomorphically encrypting data X with the public key pk.
The participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext enc(G(1)) corresponding to bin f1 from the first-order gradient ciphertext values enc(g(1)), enc(g(2)), enc(g(3)) of samples d(1), d(2), d(3) in bin f1, and the first-order gradient aggregate value ciphertext enc(G(2)) corresponding to bin f2 from the first-order gradient ciphertext values enc(g(4)), enc(g(5)) of samples d(4), d(5) in bin f2, obtaining the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2))].
The participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext enc(H(1)) corresponding to bin f1 from the second-order gradient ciphertext values enc(h(1)), enc(h(2)), enc(h(3)) of samples d(1), d(2), d(3) in bin f1, and the second-order gradient aggregate value ciphertext enc(H(2)) corresponding to bin f2 from the second-order gradient ciphertext values enc(h(4)), enc(h(5)) of samples d(4), d(5) in bin f2, obtaining the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2))].
The participant generates a random number vector r_G = [r_G(1), r_G(2)] whose entries are all non-zero and secret-shares the first-order gradient aggregate value vector ciphertext enc(G) into a ciphertext first share enc(G_1) and a plaintext second share G_2, with G_2 = r_G and enc(G_1) = enc(G) - r_G = [enc(G(1))-r_G(1), enc(G(2))-r_G(2)]. The participant generates a random number vector r_H = [r_H(1), r_H(2)] whose entries are all non-zero and secret-shares the second-order gradient aggregate value vector ciphertext enc(H) into a ciphertext first share enc(H_1) and a plaintext second share H_2, with H_2 = r_H and enc(H_1) = enc(H) - r_H = [enc(H(1))-r_H(1), enc(H(2))-r_H(2)]. The participant sends the first shares enc(G_1) and enc(H_1) to the computing party and the second shares G_2 and H_2 to the initiator;
The computing party decrypts the first shares enc(G_1) and enc(H_1) with the private key sk to obtain the plaintext first shares G_1 = [G(1)-r_G(1), G(2)-r_G(2)] and H_1 = [H(1)-r_H(1), H(2)-r_H(2)] and sends them to the initiator.
The initiator calculates G_1 + G_2 = [G(1)-r_G(1), G(2)-r_G(2)] + [r_G(1), r_G(2)] = [G(1), G(2)] = G and H_1 + H_2 = [H(1)-r_H(1), H(2)-r_H(2)] + [r_H(1), r_H(2)] = [H(1), H(2)] = H, obtaining the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
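A quick numeric check of the cancellation in the last step, using made-up aggregate values and masks (illustrative only, no encryption shown):

```python
# Numeric check that the masks cancel in S6 (toy numbers, illustrative only).
G   = [0.8, 0.4]                         # true per-bin first-order aggregates
r_G = [13.7, 42.1]                       # participant's random masks
G1  = [a - b for a, b in zip(G, r_G)]    # decrypted by the computing party, sent to the initiator
G2  = r_G                                # plaintext share already held by the initiator
rec = [a + b for a, b in zip(G1, G2)]    # initiator's reconstruction
assert all(abs(a - b) < 1e-12 for a, b in zip(rec, G))
print(rec)                               # ~[0.8, 0.4] -- the masks cancel up to floating-point error
```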

Claims (8)

1. A gradient aggregation method for longitudinal federal XGboost model training, in which an initiator and a participant hold an intersection sample set for training, characterized by comprising the following steps:
S1: the initiator and the participant bin the samples in the intersection sample set to obtain m bins, where m ≥ 2;
the initiator calculates the first-order gradient vector g and the second-order gradient vector h corresponding to the intersection sample set according to the label value of each sample, secret-shares the first-order gradient vector g to obtain a first share g_1 and a second share g_2, secret-shares the second-order gradient vector h to obtain a first share h_1 and a second share h_2, sends the first shares g_1 and h_1 to the participant, and sends the second shares g_2 and h_2 to the computing party;
S2: the computing party homomorphically encrypts the second shares g_2 and h_2 to obtain the ciphertext second shares enc(g_2) and enc(h_2) and sends them to the participant;
S3: the participant calculates the first-order gradient vector ciphertext enc(g) corresponding to the first-order gradient vector g and the second-order gradient vector ciphertext enc(h) corresponding to the second-order gradient vector h, where enc(g) = g_1 + enc(g_2), enc(h) = h_1 + enc(h_2), enc(g) = [enc(g(1)), enc(g(2)) ... enc(g(n))], enc(h) = [enc(h(1)), enc(h(2)) ... enc(h(n))], enc(g(i)) denotes the ciphertext corresponding to the first-order gradient value g(i), enc(h(i)) denotes the ciphertext corresponding to the second-order gradient value h(i), and n is the number of samples in the intersection sample set;
the participant uses the XGboost algorithm to calculate the first-order gradient aggregate value ciphertext corresponding to each bin from the first-order gradient ciphertext values of the samples in that bin; the m first-order gradient aggregate value ciphertexts form the first-order gradient aggregate value vector ciphertext enc(G) = [enc(G(1)), enc(G(2)) ... enc(G(m))], where 1 ≤ j ≤ m and enc(G(j)) denotes the first-order gradient aggregate value ciphertext corresponding to the j-th bin;
the participant uses the XGboost algorithm to calculate the second-order gradient aggregate value ciphertext corresponding to each bin from the second-order gradient ciphertext values of the samples in that bin; the m second-order gradient aggregate value ciphertexts form the second-order gradient aggregate value vector ciphertext enc(H) = [enc(H(1)), enc(H(2)) ... enc(H(m))], where 1 ≤ j ≤ m and enc(H(j)) denotes the second-order gradient aggregate value ciphertext corresponding to the j-th bin;
S4: the participant secret-shares the first-order gradient aggregate value vector ciphertext enc(G) to obtain a ciphertext first share enc(G_1) and a plaintext second share G_2, secret-shares the second-order gradient aggregate value vector ciphertext enc(H) to obtain a ciphertext first share enc(H_1) and a plaintext second share H_2, sends the first shares enc(G_1) and enc(H_1) to the computing party, and sends the second shares G_2 and H_2 to the initiator;
S5: the computing party decrypts the first shares enc(G_1) and enc(H_1) to obtain the plaintext first shares G_1 and H_1 and sends them to the initiator;
S6: the initiator calculates the plaintext first-order gradient aggregate value vector G from the first share G_1 and the second share G_2, and calculates the plaintext second-order gradient aggregate value vector H from the first share H_1 and the second share H_2.
2. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that the step S2 comprises the following step: the computing party generates a public key pk and a private key sk, homomorphically encrypts the second shares g_2 and h_2 with the public key pk to obtain the ciphertext second shares enc(g_2) and enc(h_2), and sends them to the participant.
3. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S1 the first-order gradient vector g is secret-shared into the first share g_1 and the second share g_2 as follows:
the initiator generates a random number vector r_g whose dimension is consistent with that of the first-order gradient vector g, takes the random number vector r_g as the first share g_1, and takes g - r_g as the second share g_2.
4. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S1 the second-order gradient vector h is secret-shared into the first share h_1 and the second share h_2 as follows:
the initiator generates a random number vector r_h whose dimension is consistent with that of the second-order gradient vector h, takes the random number vector r_h as the first share h_1, and takes h - r_h as the second share h_2.
5. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S4 the first-order gradient aggregate value vector ciphertext enc(G) is secret-shared into the ciphertext first share enc(G_1) and the plaintext second share G_2 as follows:
the participant generates a random number vector r_G whose dimension is consistent with that of the first-order gradient aggregate value vector ciphertext enc(G), takes the random number vector r_G as the plaintext second share G_2, and takes enc(G) - r_G as the ciphertext first share enc(G_1).
6. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1, characterized in that in the step S4 the second-order gradient aggregate value vector ciphertext enc(H) is secret-shared into the ciphertext first share enc(H_1) and the plaintext second share H_2 as follows:
the participant generates a random number vector r_H whose dimension is consistent with that of the second-order gradient aggregate value vector ciphertext enc(H), takes the random number vector r_H as the plaintext second share H_2, and takes enc(H) - r_H as the ciphertext first share enc(H_1).
7. The gradient aggregation method for longitudinal federal XGboost model training according to claim 3, 4, 5 or 6, characterized in that the random numbers in the random number vectors are not 0.
8. The gradient aggregation method for longitudinal federal XGboost model training according to claim 1 or 2, characterized in that the step S6 comprises the following step: the initiator calculates G = G_1 + G_2 and H = H_1 + H_2 to obtain the plaintext first-order gradient aggregate value vector G and the plaintext second-order gradient aggregate value vector H.
CN202311150627.0A 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training Active CN116886271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311150627.0A CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311150627.0A CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Publications (2)

Publication Number Publication Date
CN116886271A CN116886271A (en) 2023-10-13
CN116886271B true CN116886271B (en) 2023-11-21

Family

ID=88272176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311150627.0A Active CN116886271B (en) 2023-09-07 2023-09-07 Gradient aggregation method for longitudinal federal XGboost model training

Country Status (1)

Country Link
CN (1) CN116886271B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149160A (en) * 2020-08-28 2020-12-29 山东大学 Homomorphic pseudo-random number-based federated learning privacy protection method and system
CN112464287A (en) * 2020-12-12 2021-03-09 同济大学 Multi-party XGboost safety prediction model training method based on secret sharing and federal learning
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN113516256A (en) * 2021-09-14 2021-10-19 深圳市洞见智慧科技有限公司 Third-party-free federal learning method and system based on secret sharing and homomorphic encryption
CN113688999A (en) * 2021-08-23 2021-11-23 神州融安科技(北京)有限公司 Training method of transverse federated xgboost decision tree
CN114785480A (en) * 2022-04-12 2022-07-22 支付宝(杭州)信息技术有限公司 Multi-party secure computing method, device and system
CN115730333A (en) * 2022-11-11 2023-03-03 杭州博盾习言科技有限公司 Security tree model construction method and device based on secret sharing and homomorphic encryption
CN115935403A (en) * 2021-08-06 2023-04-07 中国移动通信有限公司研究院 Data detection method and device and user equipment
CN116506154A (en) * 2023-03-20 2023-07-28 湖南科技大学 Safe verifiable federal learning scheme
CN116596094A (en) * 2023-05-30 2023-08-15 湖南工商大学 Data auditing system, method, computer equipment and medium based on federal learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886417B (en) * 2019-03-01 2024-05-03 深圳前海微众银行股份有限公司 Model parameter training method, device, equipment and medium based on federal learning
JP2023008395A (en) * 2021-07-06 2023-01-19 ザ ガバニング カウンシル オブ ザ ユニバーシティ オブ トロント Secure, robust federated learning system by multi-party type homomorphic encryption and federated learning method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149160A (en) * 2020-08-28 2020-12-29 山东大学 Homomorphic pseudo-random number-based federated learning privacy protection method and system
CN112464287A (en) * 2020-12-12 2021-03-09 同济大学 Multi-party XGboost safety prediction model training method based on secret sharing and federal learning
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN115935403A (en) * 2021-08-06 2023-04-07 中国移动通信有限公司研究院 Data detection method and device and user equipment
CN113688999A (en) * 2021-08-23 2021-11-23 神州融安科技(北京)有限公司 Training method of transverse federated xgboost decision tree
CN113516256A (en) * 2021-09-14 2021-10-19 深圳市洞见智慧科技有限公司 Third-party-free federal learning method and system based on secret sharing and homomorphic encryption
CN114785480A (en) * 2022-04-12 2022-07-22 支付宝(杭州)信息技术有限公司 Multi-party secure computing method, device and system
CN115730333A (en) * 2022-11-11 2023-03-03 杭州博盾习言科技有限公司 Security tree model construction method and device based on secret sharing and homomorphic encryption
CN116506154A (en) * 2023-03-20 2023-07-28 湖南科技大学 Safe verifiable federal learning scheme
CN116596094A (en) * 2023-05-30 2023-08-15 湖南工商大学 Data auditing system, method, computer equipment and medium based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于秘密分享和梯度选择的高效安全联邦学习 (Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection); 董业; 侯炜; 陈小军; 曾帅; 计算机研究与发展 (Journal of Computer Research and Development), No. 10; full text *

Also Published As

Publication number Publication date
CN116886271A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108898025B (en) Chaotic image encryption method based on double scrambling and DNA coding
Almaiah et al. A new hybrid text encryption approach over mobile ad hoc network
CN111106936A (en) SM 9-based attribute encryption method and system
CN112134688B (en) Asymmetric image encryption method based on quantum chaotic mapping and SHA-3
JP2008301391A (en) Broadcasting encryption system, encryption communication method, decoder and decoding program
CN108566501B (en) Color image encryption method based on mixed domain and LSS type coupling mapping grid
CN110149200A (en) A kind of color image encrypting method based on dynamic DNA and 4D chaos
CN112311524A (en) Image encryption method based on new chaotic mapping and compressed sensing
Gabr et al. A combination of decimal-and bit-level secure multimedia transmission
CN112052466A (en) Support vector machine user data prediction method based on multi-party secure computing protocol
Jassem et al. Enhanced Blowfish algorithm for image encryption based on chaotic map
CN109088721B (en) Entrustable uncovering and encrypting method
CN113869499A (en) High-efficiency conversion method for unintentional neural network
US20170041133A1 (en) Encryption method, program, and system
CN115865307B (en) Data point multiplication operation method for federal learning
CN112580071A (en) Data processing method and device
CN116886271B (en) Gradient aggregation method for longitudinal federal XGboost model training
Gentry et al. How to compress (reusable) garbled circuits
CN110460442A (en) A kind of key encapsulation method based on lattice
CN115865302A (en) Multi-party matrix multiplication method with privacy protection attribute
CN113162765B (en) Trustable public key encryption system and method based on non-interactive key agreement
CN114844635A (en) Method for safely carrying out Shuffle on data
CN114826611A (en) IND-sID-CCA2 security identifier broadcast encryption method based on SM9
Karolin et al. Image encryption and decryption using RSA algorithm with share creation techniques
Rachmawati et al. Hybrid Cryptosystem Combination Algorithm Of Hill Cipher 3x3 and Elgamal To Secure Instant Messaging For Android

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant