CN113722739A

CN113722739A - Gradient lifting tree model generation method and device, electronic equipment and storage medium

Info

Publication number: CN113722739A
Application number: CN202111038483.0A
Authority: CN
Inventors: 杨恺 (Yang Kai); 彭南博 (Peng Nanbo); 王虎 (Wang Hu); 黄志翔 (Huang Zhixiang); 陈晓霖 (Chen Xiaolin)
Original assignee: Jingdong Technology Holding Co Ltd
Current assignee: Jingdong Technology Holding Co Ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2021-11-30
Anticipated expiration: 2041-09-06
Also published as: CN113722739B

Abstract

The application provides a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium. The method includes: generating a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the public key of a data party, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, and sending the service party encrypted aggregate value to a first data party for decryption; determining a target optimal splitting feature and a corresponding target optimal split point according to the maximum value of the service party information gain, the corresponding service party splitting feature number and the service party split point number, which the first data party sends back after decryption; acquiring the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node; calculating a target weight for each leaf node according to the encrypted code of its sample set; and calculating the gradient boosting tree model according to the target weights.

Description

Gradient lifting tree model generation method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of federated learning, and in particular to a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium.

Background

At present, data-driven artificial intelligence plays a major role in many industries and creates considerable value. As a result, privacy protection and data security are receiving increasing attention. Federated learning was proposed so that multiple participants can build a model jointly while sharing only intermediate results rather than raw data, such that no participant's data can be inferred from what is shared. Models based on vertical (longitudinal) federated learning are widely used in risk-control scenarios.

In the related art, models built with federated learning for risk-control scenarios often have low interpretability, and it is difficult to build such a model losslessly while also guaranteeing data security.

Disclosure of Invention

The application provides a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium.

An embodiment of a first aspect of the present application provides a method for generating a gradient boosting tree model, applied to a service party, including: generating a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to a data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M; sending the service party encrypted aggregate value to a first data party among the data parties; determining a target optimal splitting feature and a corresponding target optimal split point according to the maximum value of the service party information gain, together with the corresponding service party splitting feature number and service party split point number, which the first data party sends after decrypting the service party encrypted aggregate value; acquiring the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point; calculating a target weight for each leaf node according to the encrypted code of its sample set; and calculating the gradient boosting tree model according to the target weights.
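The service party's aggregation step can be sketched in Python under several assumptions the text does not fix: an additively homomorphic Paillier-type scheme (the text only says "homomorphic encryption"), a binary log-loss so that g_i = p_i - y_i and h_i = p_i(1 - p_i), a fixed-point encoding of the real-valued derivatives, and an "encrypted code" of the sample set consisting of one encrypted 0/1 membership bit per sample. All function names are hypothetical, the keys are toy-sized and insecure, and `math.lcm` needs Python 3.9+.

```python
import math
import random

# --- Toy Paillier scheme (additively homomorphic). Tiny keys: NOT secure. ---
def _next_prime(k: int) -> int:
    """Smallest prime >= k, found by trial division (fine at toy sizes)."""
    while k < 2 or any(k % d == 0 for d in range(2, math.isqrt(k) + 1)):
        k += 1
    return k

def keygen(start: int = 10**6):
    """Return (public key n, private key (n, lambda, mu)) with generator g = n + 1."""
    p = _next_prime(start)
    q = _next_prime(p + 1)
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(sk, c: int) -> int:
    n, lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

SCALE = 10**6  # fixed-point scale for real-valued derivatives (assumption)

def encode(x: float, n: int) -> int:
    return round(x * SCALE) % n          # negative values wrap around modulo n

def decode(v: int, n: int) -> float:
    return (v if v <= n // 2 else v - n) / SCALE

# --- Service party side ---
def logistic_grad_hess(y, f_prev):
    """g_i, h_i of the log-loss at the scores after m-1 trees (assumed loss)."""
    p = [1.0 / (1.0 + math.exp(-f)) for f in f_prev]
    return [pi - yi for pi, yi in zip(p, y)], [pi * (1.0 - pi) for pi in p]

def service_party_bin_aggregates(n, enc_in_node, g, h, bin_of_sample, n_bins):
    """Per bin k: E(G_k) = prod_i E(b_i)^enc(g_i), likewise E(H_k); b_i stays hidden."""
    n2 = n * n
    enc_G = [encrypt(n, 0) for _ in range(n_bins)]
    enc_H = [encrypt(n, 0) for _ in range(n_bins)]
    for i, k in enumerate(bin_of_sample):
        enc_G[k] = enc_G[k] * pow(enc_in_node[i], encode(g[i], n), n2) % n2
        enc_H[k] = enc_H[k] * pow(enc_in_node[i], encode(h[i], n), n2) % n2
    return enc_G, enc_H

# --- Example run ---
pk, sk = keygen()                          # key pair of the first data party
y = [1, 0, 1, 0, 1]                        # risk-control labels (service party)
f_prev = [0.2, -0.4, 0.0, 0.7, -1.1]       # scores after m-1 trees (service party)
in_node = [1, 1, 0, 1, 1]                  # current-leaf membership (data party only)
enc_in_node = [encrypt(pk, b) for b in in_node]   # the shared "encrypted code"
bins = [0, 1, 1, 0, 1]                     # service party's own feature, pre-binned
g, h = logistic_grad_hess(y, f_prev)
enc_G, enc_H = service_party_bin_aggregates(pk, enc_in_node, g, h, bins, n_bins=2)
G = [decode(decrypt(sk, c), pk) for c in enc_G]   # decrypted by the first data party
```

Raising an encrypted membership bit to the power of an encoded derivative yields an encryption of either that derivative or zero, so the service party can build every bin sum without learning which samples are in the current leaf.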

In the method for generating a gradient boosting tree model of this embodiment, the service party generates a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M; sends the service party encrypted aggregate value to the first data party; determines the target optimal splitting feature and the corresponding target optimal split point according to the maximum value of the service party information gain, the corresponding service party splitting feature number and the service party split point number sent by the first data party after decryption; acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point; calculates the target weight of each leaf node according to the encrypted code of its sample set; and calculates the gradient boosting tree model according to the target weights. Homomorphic encryption keeps the sample set held by the data party secret from the service party while still allowing the service party to correctly calculate, under encryption, the information gain for every split point of every feature, so that the model can be built.
With this method, the splitting features of the data party and their meanings can be disclosed to the service party without revealing the data party's private data, the gradient boosting tree model is generated losslessly, and its interpretability is improved.
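The claim that the service party computes correct per-bin sums without seeing the sample set follows from additive homomorphism. Write I for the current leaf's sample set, B_k for the samples in bin k of a feature, and b_i = 1 if i is in I and 0 otherwise. Assuming a Paillier-type scheme E/D with public modulus n, and derivatives already encoded as integers modulo n (the text names neither the scheme nor the encoding):

```latex
D\Big(\prod_{i \in \mathcal{B}_k} E(b_i)^{g_i} \bmod n^2\Big)
  = \sum_{i \in \mathcal{B}_k} b_i\, g_i \bmod n
  = \sum_{i \in \mathcal{B}_k \cap \mathcal{I}} g_i = G_k ,
\qquad
D\Big(\prod_{i \in \mathcal{B}_k} E(b_i)^{h_i} \bmod n^2\Big) = H_k .
```

The middle equality holds as an integer identity only while the encoded sums stay below n/2 in magnitude; under that condition the decrypted G_k and H_k equal the plaintext histogram sums, so any gain computed from them matches a centralized computation, which is one way to read the "lossless" claim.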

An embodiment of a second aspect of the present application provides a method for generating a gradient boosting tree model, applied to a first data party, including: receiving, from a service party, a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin, where the service party encrypted aggregate value is generated by the service party according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, and m is smaller than a preset number M; decrypting the service party encrypted aggregate value and calculating from it the maximum value of the service party information gain, together with the corresponding service party splitting feature number and service party split point number; and sending the maximum value of the service party information gain, the service party splitting feature number and the service party split point number to the service party, so that the service party determines a target optimal splitting feature and a corresponding target optimal split point from them, acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point, calculates a target weight for each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights.
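Once decrypted, the per-bin sums are plaintext histograms, and the first data party only has to scan split points. The text does not give the gain formula; this sketch assumes the common second-order (XGBoost-style) gain with L2 regularization `lam` and split penalty `gamma`, and assumes that split point number s sends bins 0..s to the left child. `best_split_from_histograms` is a hypothetical name.

```python
from typing import List, Tuple

def best_split_from_histograms(
    G: List[List[float]], H: List[List[float]],
    lam: float = 1.0, gamma: float = 0.0,
) -> Tuple[float, int, int]:
    """Return (max gain, splitting feature number, split point number).

    G[j][k], H[j][k] are the decrypted sums of first/second derivatives of the
    current leaf's samples that fall in bin k of feature j. Split point s sends
    bins 0..s to the left child and the remaining bins to the right child.
    """
    best = (float("-inf"), -1, -1)
    for j, (gj, hj) in enumerate(zip(G, H)):
        g_tot, h_tot = sum(gj), sum(hj)
        gl = hl = 0.0
        for s in range(len(gj) - 1):       # cutting after the last bin is no split
            gl += gj[s]
            hl += hj[s]
            gr, hr = g_tot - gl, h_tot - hl
            gain = 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam)
                          - g_tot ** 2 / (h_tot + lam)) - gamma
            if gain > best[0]:
                best = (gain, j, s)
    return best

# Feature 1 separates negative from positive gradients, so it wins.
print(best_split_from_histograms([[0.5, 0.5], [-1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]))
# → (0.5, 1, 0)
```

Only the returned triple leaves the first data party, matching the three values the text says are sent back to the service party.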

In the method for generating a gradient boosting tree model of this embodiment, the first data party receives, from the service party, the service party encrypted aggregate value of the first-order and second-order derivatives in each data bin, which the service party generates according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, m being smaller than a preset number M. The first data party decrypts this value, calculates the maximum value of the service party information gain together with the corresponding service party splitting feature number and service party split point number, and sends them to the service party. The service party then determines the target optimal splitting feature and the corresponding target optimal split point, acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on that feature at that point, calculates the target weight of each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights.
Because only encrypted codes of the sample set are sent to the service party, the sample set remains secure, and on this premise the feature meanings of the data party can be disclosed to the service party. This avoids the data leakage that would arise if the sample set and the feature meanings were held by the same participant, and improves the interpretability of the gradient boosting tree model generated by federated learning.

An embodiment of a third aspect of the present application provides a method for generating a gradient boosting tree model, applied to a second data party, including: receiving an encrypted first-order derivative and an encrypted second-order derivative sent by a service party, which the service party obtains by homomorphically encrypting the first-order and second-order derivatives; generating a data party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the service party public key, the sample set of the current leaf node, the encrypted first-order derivative and the encrypted second-order derivative; sending the data party encrypted aggregate value to the service party; receiving the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party split point number, which the service party sends after decrypting the data party encrypted aggregate value; determining the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain from these values; and sending the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain to the service party. The service party then determines the target optimal splitting feature and the corresponding target optimal split point according to the data party optimal splitting feature, the data party optimal split point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number and the service party split point number; acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point; calculates a target weight for each leaf node according to the encrypted code of its sample set; and calculates the gradient boosting tree model according to the target weights. The maximum value of the service party information gain, the service party splitting feature number and the service party split point number are obtained by the first data party by decrypting the service party encrypted aggregate values of the first-order and second-order derivatives in each data bin sent by the service party.
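In the third aspect the roles are reversed: the service party encrypts g_i and h_i under its own key, and the second data party, which knows the current leaf's sample set in plaintext, only multiplies ciphertexts bin by bin. A minimal sketch, assuming a toy and insecure Paillier-type scheme, a fixed-point encoding with an assumed scale, and hypothetical function names (Python 3.9+):

```python
import math
import random

# --- Toy Paillier scheme (additively homomorphic). Tiny keys: NOT secure. ---
def _next_prime(k: int) -> int:
    while k < 2 or any(k % d == 0 for d in range(2, math.isqrt(k) + 1)):
        k += 1
    return k

def keygen(start: int = 10**6):
    p = _next_prime(start)
    q = _next_prime(p + 1)
    n, lam = p * q, math.lcm(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))   # generator g = n + 1

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(sk, c: int) -> int:
    n, lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

SCALE = 10**6  # fixed-point scale (assumption)
encode = lambda x, n: round(x * SCALE) % n
decode = lambda v, n: (v if v <= n // 2 else v - n) / SCALE

def data_party_bin_aggregates(n, enc_g, enc_h, node_samples, bin_of_sample, n_bins):
    """Second data party: multiply ciphertexts of current-node samples, per bin."""
    n2 = n * n
    enc_G = [encrypt(n, 0) for _ in range(n_bins)]
    enc_H = [encrypt(n, 0) for _ in range(n_bins)]
    for i in node_samples:                 # plaintext sample set of the current leaf
        k = bin_of_sample[i]               # bin of sample i on this party's feature
        enc_G[k] = enc_G[k] * enc_g[i] % n2    # ciphertext product = plaintext sum
        enc_H[k] = enc_H[k] * enc_h[i] % n2
    return enc_G, enc_H

# --- Example run ---
pk, sk = keygen()                                   # service party's key pair
g = [0.25, -0.5, 0.125, -0.75, 0.5]                 # derivatives (service party)
h = [0.1875, 0.25, 0.109375, 0.1875, 0.25]
enc_g = [encrypt(pk, encode(x, pk)) for x in g]     # sent to the second data party
enc_h = [encrypt(pk, encode(x, pk)) for x in h]
node = [0, 2, 3]                                    # current leaf's samples (data party)
bins = [0, 1, 1, 0, 1]                              # data party's feature, pre-binned
enc_G, enc_H = data_party_bin_aggregates(pk, enc_g, enc_h, node, bins, n_bins=2)
G = [decode(decrypt(sk, c), pk) for c in enc_G]     # decrypted by the service party
H = [decode(decrypt(sk, c), pk) for c in enc_H]
```

The service party learns only per-bin totals, not which samples contributed, while the second data party never sees a plaintext derivative.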

In the method for generating a gradient boosting tree model of this embodiment, the second data party receives the encrypted first-order and second-order derivatives that the service party obtains by homomorphic encryption; generates the data party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the service party public key, the sample set of the current leaf node and the encrypted derivatives; sends it to the service party; receives the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party split point number, which the service party returns after decryption; determines from them the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain; and sends these to the service party. The service party combines them with the maximum value of the service party information gain, the service party splitting feature number and the service party split point number, which the first data party obtains by decrypting the service party encrypted aggregate values of the first-order and second-order derivatives in each data bin, to determine the target optimal splitting feature and the corresponding target optimal split point. It then acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on that feature at that point, calculates the target weight of each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights. The feature meanings of the data party are thus disclosed to the service party while the sample set remains hidden, which avoids the data leakage that would arise if the sample set and the feature meanings were held by the same participant, and improves the interpretability of the gradient boosting tree model generated by federated learning.
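The service party's final choice amounts to comparing the best candidate reported for each party and keeping the one with the largest gain; each leaf's weight then follows from its derivative sums once they are available in decrypted form. A sketch assuming the XGBoost-style weight w* = -G/(H + lam) and a no-split threshold `min_gain`, neither of which the text states; `SplitCandidate` and the owner labels are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class SplitCandidate:
    owner: str        # which party proposed it (hypothetical labels)
    feature_no: int   # splitting feature number within that party
    split_no: int     # split point (bin boundary) number
    gain: float       # maximum information gain reported for this candidate

def target_optimal_split(cands: Iterable[SplitCandidate],
                         min_gain: float = 0.0) -> Optional[SplitCandidate]:
    """Keep the candidate with the largest gain; None means the leaf is not split."""
    best = max(cands, key=lambda c: c.gain, default=None)
    return best if best is not None and best.gain > min_gain else None

def leaf_weight(G: float, H: float, lam: float = 1.0) -> float:
    """Second-order optimal leaf weight w* = -G / (H + lam) (assumed XGBoost form)."""
    return -G / (H + lam)

best = target_optimal_split([
    SplitCandidate("service_party", feature_no=3, split_no=5, gain=0.42),
    SplitCandidate("second_data_party", feature_no=1, split_no=2, gain=0.57),
])
print(best.owner, leaf_weight(G=-1.0, H=1.0))   # → second_data_party 0.5
```

Only gains and index numbers are compared here, which is consistent with the text's point that the feature meanings can be shown to the service party without exposing the underlying sample set.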

An embodiment of a fourth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a service party, including: a first generation module configured to generate a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to a data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M; a first sending module configured to send the service party encrypted aggregate value to a first data party among the data parties; a first determining module configured to determine a target optimal splitting feature and a corresponding target optimal split point according to the maximum value of the service party information gain, together with the corresponding service party splitting feature number and service party split point number, sent by the first data party after decrypting the service party encrypted aggregate value; an obtaining module configured to obtain the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point; a first calculation module configured to calculate a target weight for each leaf node according to the encrypted code of its sample set; and a second calculation module configured to calculate the gradient boosting tree model according to the target weights.
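The last step, turning target weights into a model, can be pictured with the standard additive form of gradient boosting, in which each new tree corrects the score left by the previous m-1 trees. A minimal sketch, assuming a learning rate `eta`, a base score and a sigmoid link for risk probabilities (none of which the text specifies). Each tree is collapsed into a hypothetical local function returning the target weight of the leaf a sample reaches, whereas in vertical federated learning each split would be evaluated by the party that owns the feature.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical representation: a trained tree maps a sample to the target
# weight of the leaf it reaches (evaluated locally here for brevity).
Tree = Callable[[Sequence[float]], float]

def predict_score(trees: List[Tree], x: Sequence[float],
                  base: float = 0.0, eta: float = 0.1) -> float:
    """F_M(x) = base + eta * sum over m of f_m(x)."""
    return base + eta * sum(tree(x) for tree in trees)

def predict_proba(trees: List[Tree], x: Sequence[float],
                  base: float = 0.0, eta: float = 0.1) -> float:
    """Risk probability via the sigmoid link (assumed binary log-loss)."""
    return 1.0 / (1.0 + math.exp(-predict_score(trees, x, base, eta)))

# Two one-split trees whose leaf values play the role of target weights.
trees: List[Tree] = [
    lambda x: 0.5 if x[0] < 3.0 else -0.4,
    lambda x: 0.2 if x[1] < 1.0 else -0.1,
]
print(predict_score(trees, [2.0, 0.5]), predict_proba(trees, [2.0, 0.5]))
```

The same score, evaluated on the training samples after m-1 trees, is what the derivative computation for the m-th tree would start from.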

With the device for generating a gradient boosting tree model of this embodiment, the service party generates the service party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M, and sends it to the first data party; determines the target optimal splitting feature and the corresponding target optimal split point according to the maximum value of the service party information gain, the corresponding service party splitting feature number and the service party split point number returned by the first data party after decryption; acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point; calculates the target weight of each leaf node according to the encrypted code of its sample set; and calculates the gradient boosting tree model according to the target weights. Homomorphic encryption keeps the sample set held by the data party secret from the service party while still allowing the service party to correctly calculate, under encryption, the information gain for every split point of every feature, so that the model can be built.
With this device, the splitting features of the data party and their meanings can be disclosed to the service party without revealing the data party's private data, the gradient boosting tree model is generated losslessly, and its interpretability is improved.

An embodiment of a fifth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a first data party, including: a first receiving module configured to receive, from a service party, a service party encrypted aggregate value of the first-order and second-order derivatives in each data bin, where the service party encrypted aggregate value is generated by the service party according to a data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, and m is smaller than a preset number M; a first decryption module configured to decrypt the service party encrypted aggregate value and calculate from it the maximum value of the service party information gain, together with the corresponding service party splitting feature number and service party split point number; and a second sending module configured to send the maximum value of the service party information gain, the service party splitting feature number and the service party split point number to the service party, so that the service party determines a target optimal splitting feature and a corresponding target optimal split point from them, obtains the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point, calculates a target weight for each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights.

With the device for generating a gradient boosting tree model of this embodiment, the first data party receives, from the service party, the service party encrypted aggregate value of the first-order and second-order derivatives in each data bin, which the service party generates according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, m being smaller than a preset number M. The device decrypts this value, calculates the maximum value of the service party information gain together with the corresponding service party splitting feature number and service party split point number, and sends them to the service party. The service party then determines the target optimal splitting feature and the corresponding target optimal split point, acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on that feature at that point, calculates the target weight of each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights.
Because only encrypted codes of the sample set are sent to the service party, the sample set remains secure, and on this premise the feature meanings of the data party can be disclosed to the service party. This avoids the data leakage that would arise if the sample set and the feature meanings were held by the same participant, and improves the interpretability of the gradient boosting tree model generated by federated learning.

An embodiment of a sixth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a second data party, including: a sixth receiving module configured to receive an encrypted first-order derivative and an encrypted second-order derivative sent by a service party, which the service party obtains by homomorphically encrypting the first-order and second-order derivatives; a fourth generation module configured to generate a data party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the service party public key, the sample set of the current leaf node, the encrypted first-order derivative and the encrypted second-order derivative; a seventh sending module configured to send the data party encrypted aggregate value to the service party; a seventh receiving module configured to receive the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party split point number, which the service party sends after decrypting the data party encrypted aggregate value; a third determining module configured to determine the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain from these values; and an eighth sending module configured to send the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain to the service party, so that the service party determines a target optimal splitting feature and a corresponding target optimal split point according to them together with the maximum value of the service party information gain, the service party splitting feature number and the service party split point number, obtains the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on the target optimal splitting feature at the target optimal split point, calculates a target weight for each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights. The maximum value of the service party information gain, the service party splitting feature number and the service party split point number are obtained by the first data party by decrypting the service party encrypted aggregate values of the first-order and second-order derivatives in each data bin sent by the service party.

With the device for generating a gradient boosting tree model of this embodiment, the second data party receives the encrypted first-order and second-order derivatives that the service party obtains by homomorphic encryption; generates the data party encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the service party public key, the sample set of the current leaf node and the encrypted derivatives; sends it to the service party; receives the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party split point number, which the service party returns after decryption; determines from them the data party optimal splitting feature, the data party optimal split point and the data party maximum information gain; and sends these to the service party. The service party combines them with the maximum value of the service party information gain, the service party splitting feature number and the service party split point number, which the first data party obtains by decrypting the service party encrypted aggregate values of the first-order and second-order derivatives in each data bin, to determine the target optimal splitting feature and the corresponding target optimal split point. It then acquires the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node on that feature at that point, calculates the target weight of each leaf node according to the encrypted code of its sample set, and calculates the gradient boosting tree model according to the target weights. The feature meanings of the data party are thus disclosed to the service party while the sample set remains hidden, which avoids the data leakage that would arise if the sample set and the feature meanings were held by the same participant, and improves the interpretability of the gradient boosting tree model generated by federated learning.

An embodiment of a seventh aspect of the present application provides an electronic device, including at least one processor and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the method for generating a gradient boosting tree model according to the embodiment of the first aspect, the second aspect or the third aspect above.

An embodiment of an eighth aspect of the present application provides a computer-readable storage medium storing computer instructions that cause a computer to perform the method for generating a gradient boosting tree model according to the embodiment of the first aspect, the second aspect or the third aspect above.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 3 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 4 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 5 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 6 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 7 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 8 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 9 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 10 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 11 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application;

Fig. 12 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 14 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to another embodiment of the present application;

Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present application, and should not be construed as limiting it. The method, apparatus, electronic device and storage medium for generating a gradient boosting tree model according to the embodiments of the present application are described below with reference to the drawings.

Fig. 1 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application. The method may be executed by the apparatus for generating a gradient boosting tree model provided in the embodiments of the present application, and the apparatus may be deployed in an electronic device of the business party (Guest). As shown in Fig. 1, the method specifically includes the following steps:

S101, generate business-party encrypted aggregate values of the first-order and second-order derivatives in each data bin according to the data-party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M.

Specifically, the embodiments of the present application take a risk-control scenario as an example, use risk-control data as the sample data, and describe the method for generating a gradient boosting tree model based on federated learning.

Federated learning is an emerging foundational artificial-intelligence technology. Its design goal is to carry out efficient machine learning among multiple parties or computing nodes while guaranteeing information security during big-data exchange, protecting terminal data and personal privacy, and ensuring legal compliance. The aim is to build models across organizations while each organization's personal data remain in its local environment and model parameters are exchanged under cryptographic mechanisms within the federated system. Depending on how the data held by the participants differ, federated learning can be classified into horizontal federated learning, vertical federated learning, federated transfer learning and so on. In vertical federated learning, different data holders have different feature dimensions of the same samples (for example, the same customer), because their businesses differ. Vertical federated learning is widely applied and developed in risk-control scenarios: the labels are held by only one party (the business party, Guest), and each other party holds only part of the features (the data parties, Host). The business party seeks to improve the model through cooperation with the data parties and thereby reduce risk; throughout this process, both the business party and the data parties must keep their own data secure.

In the embodiments of the present application, the business party and the data parties each generate a homomorphic-encryption key pair and send their own public keys to the other side. For example, the business party generates a business-party private key and a business-party public key and sends the business-party public key to each data party, and the data party generates a data-party private key and a data-party public key and sends the data-party public key to the business party. These keys protect the information transmitted between participants and thus the security of each party's data. It should be noted that the data-party public key is generated by a first data party (Host1), which may be any one of the data parties; the data parties comprise the first data party and second data parties, the second data parties being the data parties other than the first data party.
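The key handling above only requires an additively homomorphic scheme. The application does not name one; the following minimal sketch assumes Paillier encryption, uses a hypothetical class name `ToyPaillier`, and hard-codes two small Mersenne primes, so it is insecure and merely illustrates the two operations the method relies on: adding two ciphertexts and multiplying a ciphertext by a plaintext integer (Python 3.8+ is assumed for `pow(x, -1, m)`).

```python
import math
import random


class ToyPaillier:
    """Paillier with generator g = n + 1; toy-sized primes, NOT secure."""

    def __init__(self, p=2**31 - 1, q=2**61 - 1):
        self.n = p * q                      # public key
        self.n2 = self.n * self.n
        # private key: Carmichael function of n and its inverse modulo n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def enc(self, m):
        # a random r in [1, n) is coprime to n except with negligible probability
        r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        # L(x) = (x - 1) / n applied to c^lam mod n^2, then scaled by mu
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):
        return c1 * c2 % self.n2            # Enc(a) * Enc(b) = Enc(a + b)

    def mul(self, c, k):
        return pow(c, k % self.n, self.n2)  # Enc(a)^k = Enc(k * a mod n)


# The business party (Guest) generates its key pair and would publish only n.
guest = ToyPaillier()
c_sum = guest.add(guest.enc(3), guest.enc(4))
print(guest.dec(c_sum))                 # 7
print(guest.dec(guest.mul(c_sum, 5)))   # 35
```

Only the modulus `n` is needed for `enc`, `add` and `mul`, so in the protocol a party would share `n` as its public key and keep `lam` and `mu` private.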

It should be noted that, before the gradient boosting tree model is built, the sample data must be preprocessed by encrypted sample alignment. During training, each participant must match the features that belong to the same record, so, to protect data privacy and security, the sample sets are aligned using private set intersection (PSI), which identifies the users common to the participants without disclosing either participant's data and without exposing the users that do not overlap. For example, after PSI alignment the business party holds data $X^0\in\mathbb{R}^{n\times d^0}$ and labels $y\in\mathbb{R}^n$, and the P data parties hold data $X^1,\dots,X^P$, where the data dimension of the p-th participant is $d^p$. The full sample set is $X=[X^0,X^1,\dots,X^P]\in\mathbb{R}^{n\times d}$, where $d=\sum_{p=0}^{P}d^p$.
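PSI can be realized in several ways, and the application does not specify which one it uses. The sketch below assumes a Diffie-Hellman style construction: each party raises hashed IDs to a private exponent, and because exponentiation commutes, doubly blinded values can be compared without revealing the raw IDs. The function names and the toy modulus are illustrative, both parties are simulated in one process, and the code omits the hardening a production PSI needs (proper hash-to-group, malicious-party protection).

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # a known Mersenne prime, used here as a toy modulus


def hash_to_group(user_id: str) -> int:
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % PRIME


def blind(values, secret):
    # x -> x^secret mod PRIME; exponents commute: (x^a)^b == (x^b)^a
    return [pow(v, secret, PRIME) for v in values]


def psi(guest_ids, host_ids):
    a = secrets.randbelow(PRIME - 3) + 2   # Guest's private exponent
    b = secrets.randbelow(PRIME - 3) + 2   # Host's private exponent
    # Each party blinds its own hashed IDs and sends them to the other.
    guest_once = blind([hash_to_group(u) for u in guest_ids], a)
    host_once = blind([hash_to_group(u) for u in host_ids], b)
    # Host re-blinds Guest's values (order preserved) and returns them;
    # Guest re-blinds Host's values itself.
    guest_twice = blind(guest_once, b)
    host_twice = set(blind(host_once, a))
    # Guest learns which of its own IDs are shared (plus the Host set size).
    return [u for u, v in zip(guest_ids, guest_twice) if v in host_twice]


print(psi(["u1", "u2", "u3", "u4"], ["u3", "u9", "u1"]))  # ['u1', 'u3']
```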

For each feature dimension $k=1,\dots,d^p$ of the n pieces of risk-control data of the p-th participant, binning is performed to obtain L quantile points $S_k=\{s_{k1},\dots,s_{kL}\}$ of the feature, which serve as the candidate splitting thresholds. For $i=1,\dots,n$, the business party computes the first-order derivative $g_i$ and second-order derivative $h_i$ of the loss with respect to the predicted value $\hat{y}_i^{(m-1)}$ of each piece of risk-control data after m-1 iterations, homomorphically encrypts g and h with the business-party public key to obtain the encrypted first-order derivatives $[\![g_i]\!]$ and encrypted second-order derivatives $[\![h_i]\!]$, and sends them to each data party, where $[\![\cdot]\!]$ denotes encryption with the business-party public key. For each leaf node of the current m-th tree, each participant p must determine whether and how the node should be split. Participant p divides all data samples into L-1 bin intervals according to the L thresholds in $S_k$.
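The binning step can be sketched as follows. The sketch assumes that the L quantile points include the minimum and the maximum of the feature (one reading under which L thresholds give L-1 bin intervals) and that a value equal to a threshold falls into the interval on its left, matching the rule that samples less than or equal to the threshold go to the left child. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def quantile_points(column: List[float], num_points: int) -> List[float]:
    """Return up to num_points (L) thresholds, from the minimum to the maximum."""
    values = sorted(column)
    last = len(values) - 1
    # floor position of the j/(L-1) quantile, j = 0..L-1
    points = [values[j * last // (num_points - 1)] for j in range(num_points)]
    return sorted(set(points))  # tied data can produce duplicate thresholds


def assign_bins(column: List[float], points: List[float]) -> List[int]:
    """Bin j covers (points[j], points[j+1]]; bin 0 also holds the minimum.

    Values are assumed to lie within [min, max] of the column used for points.
    """
    return [max(bisect_left(points, x) - 1, 0) for x in column]


feature = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3, 5.8, 9.7, 3.2]
s_k = quantile_points(feature, num_points=5)
print(s_k)                        # [1.0, 2.6, 3.2, 5.3, 9.7]
print(assign_bins(feature, s_k))  # [1, 0, 2, 0, 3, 0, 2, 3, 3, 1]
```

Five thresholds here produce four bins (0 to 3), which is the L-1 count described above.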

In the embodiments of the present application, if the p-th participant is the business party, the business party uses the data-party public key, the encrypted code $\langle\pi\rangle$ of the sample set I of the current leaf node of the (m-1)-th tree sent by the first data party, and the first-order derivative $g_i$ and second-order derivative $h_i$ that it computes itself for the predicted value $\hat{y}_i^{(m-1)}$ (initially 0) of each piece of risk-control data after m-1 iterations, to generate the business-party encrypted aggregate values $\langle G^{bin}\rangle$ and $\langle H^{bin}\rangle$ of the first-order and second-order derivatives in each data bin, where m is smaller than the preset number M and $\langle\cdot\rangle$ denotes encryption with the data-party public key. These aggregate values are calculated as follows:

$$\langle G^{bin}\rangle=\sum_{i=1}^{N}\langle\pi_i^{bin}\rangle\,g_i,\qquad \langle H^{bin}\rangle=\sum_{i=1}^{N}\langle\pi_i^{bin}\rangle\,h_i$$

where $g_i$ and $h_i$ are the first-order and second-order derivatives at the predicted value $\hat{y}_i^{(m-1)}$ of the i-th piece of risk-control data after m-1 iterations, N is the number of pieces of risk-control data of the current participant p, $\pi$ is the code of the sample set I, and $\pi^{bin}$ is the code of the set $I_{bin}$ of samples in I that belong to the current data bin, i.e., $\pi_i^{bin}=1$ indicates that the i-th sample is in sample set I and belongs to the current data bin (and $\pi_i^{bin}=0$ otherwise). The sums and the products with the plaintext $g_i$ and $h_i$ are evaluated homomorphically on the ciphertexts.
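The business-party aggregation can be sketched with a toy Paillier scheme. Assumptions: Paillier stands in for the unspecified homomorphic scheme, derivatives are fixed-point encoded with a hypothetical scale factor `SCALE`, and one (insecure) key object is shared by both sides for brevity, although the business party only calls the public operations `enc`, `add` and `mul`. Each ciphertext of a sample-set code bit is raised to the encoded $g_i$ or $h_i$ and multiplied into the accumulator of the bin that the business party's own feature assigns to sample i (Python 3.8+).

```python
import math
import random

SCALE = 10**6  # hypothetical fixed-point scale for real-valued derivatives


class ToyPaillier:
    """Paillier with g = n + 1 and toy-sized primes; NOT secure."""

    def __init__(self, p=2**31 - 1, q=2**61 - 1):
        self.n, self.n2 = p * q, (p * q) ** 2
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def enc(self, m):
        r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):
        return c1 * c2 % self.n2

    def mul(self, c, k):
        return pow(c, k % self.n, self.n2)


def encode(x):
    return round(x * SCALE)


def decode(v, n):
    v = v - n if v > n // 2 else v  # residues above n/2 represent negatives
    return v / SCALE


def guest_bin_sums(enc_pi, grads, hess, bin_of, num_bins, he):
    """<G_bin> = sum_i <pi_i> * g_i over samples whose Guest feature is in bin.

    Only the public operations enc/add/mul of `he` are used here.
    """
    enc_G = [he.enc(0) for _ in range(num_bins)]
    enc_H = [he.enc(0) for _ in range(num_bins)]
    for c_pi, g, h, b in zip(enc_pi, grads, hess, bin_of):
        enc_G[b] = he.add(enc_G[b], he.mul(c_pi, encode(g)))
        enc_H[b] = he.add(enc_H[b], he.mul(c_pi, encode(h)))
    return enc_G, enc_H


host_key = ToyPaillier()                    # key pair of the first data party
pi = [1, 0, 1, 1, 0]                        # code of sample set I (Host secret)
enc_pi = [host_key.enc(v) for v in pi]      # all that Guest receives
g = [-0.5, 0.3, 0.2, -0.1, 0.4]
h = [0.25, 0.21, 0.16, 0.09, 0.24]
guest_bins = [0, 0, 1, 1, 1]                # bins of one Guest-owned feature
G, H = guest_bin_sums(enc_pi, g, h, guest_bins, 2, host_key)
# Host1 decrypts the aggregates: only samples with pi_i = 1 contribute.
print([decode(host_key.dec(c), host_key.n) for c in G])  # [-0.5, 0.1]
```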

If the p-th participant is a data party, the data party uses the business-party public key, the sample set I of the current leaf node of the (m-1)-th tree, which it knows, and the encrypted first-order derivatives $[\![g_i]\!]$ and encrypted second-order derivatives $[\![h_i]\!]$ sent by the business party to generate the data-party encrypted aggregate values $[\![G^{bin}]\!]$ and $[\![H^{bin}]\!]$ of the first-order and second-order derivatives in each data bin, where $[\![\cdot]\!]$ denotes encryption with the business-party public key, and sends these aggregate values to the business party for decryption. They are calculated as follows:

$$[\![G^{bin}]\!]=\sum_{i=1}^{N}\pi_i^{bin}[\![g_i]\!],\qquad [\![H^{bin}]\!]=\sum_{i=1}^{N}\pi_i^{bin}[\![h_i]\!]$$

where N is the number of pieces of risk-control data of participant p, $\pi$ is the code of the sample set I, and $\pi^{bin}$ is the code of the set $I_{bin}$ of samples in I that belong to the current data bin, i.e., $\pi_i^{bin}=1$ if the i-th sample is in I and belongs to the current data bin, and $\pi_i^{bin}=0$ otherwise.
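The data-party side is simpler, because the data party knows the sample set I in plaintext and needs only ciphertext additions. Again assuming toy Paillier in place of the unspecified scheme, a hypothetical fixed-point `SCALE`, illustrative names, and Python 3.8+:

```python
import math
import random

SCALE = 10**6  # hypothetical fixed-point scale for real-valued derivatives


class ToyPaillier:
    """Paillier with g = n + 1 and toy-sized primes; NOT secure."""

    def __init__(self, p=2**31 - 1, q=2**61 - 1):
        self.n, self.n2 = p * q, (p * q) ** 2
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def enc(self, m):
        r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):
        return c1 * c2 % self.n2


def encode(x):
    return round(x * SCALE)


def decode(v, n):
    v = v - n if v > n // 2 else v  # residues above n/2 represent negatives
    return v / SCALE


def host_bin_sums(enc_g, enc_h, in_node, bin_of, num_bins, he):
    """[[G_bin]] = sum of [[g_i]] over i with pi_i^bin = 1 (ciphertext additions)."""
    enc_G = [he.enc(0) for _ in range(num_bins)]
    enc_H = [he.enc(0) for _ in range(num_bins)]
    for cg, ch, member, b in zip(enc_g, enc_h, in_node, bin_of):
        if member:  # Host knows the node's sample set I in plaintext
            enc_G[b] = he.add(enc_G[b], cg)
            enc_H[b] = he.add(enc_H[b], ch)
    return enc_G, enc_H


guest_key = ToyPaillier()                   # business party's key pair
g = [-0.5, 0.3, 0.2, -0.1, 0.4]
h = [0.25, 0.21, 0.16, 0.09, 0.24]
enc_g = [guest_key.enc(encode(x)) for x in g]  # what Host receives
enc_h = [guest_key.enc(encode(x)) for x in h]
in_node = [1, 1, 0, 1, 1]                   # code of sample set I
host_bins = [1, 0, 0, 1, 0]                 # bins of one Host-owned feature
G, H = host_bin_sums(enc_g, enc_h, in_node, host_bins, 2, guest_key)
# Guest decrypts the per-bin sums received from Host.
print([decode(guest_key.dec(c), guest_key.n) for c in G])  # [0.7, -0.6]
```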

In the embodiments of the present application, the data party encrypts the code of its sample set to generate the encrypted code $\langle\pi\rangle$, where $\langle\cdot\rangle$ denotes encryption with the data-party public key. Thanks to homomorphic encryption, the business party can still build the model without knowing the plaintext of the sample set of any node of the data party, which improves security and avoids the risk of leaking the data party's data.

The data party may encrypt the code of the entire sample set, in which case sending the encrypted sample set to the business party costs n ciphertexts. In practice, depending on the required security level, the communication overhead can be reduced by obfuscating only part of the sample data instead of the whole sample set.

S102, send the business-party encrypted aggregate values to the first data party among the data parties.

Specifically, the business party sends the business-party encrypted aggregate values $\langle G^{bin}\rangle$ and $\langle H^{bin}\rangle$ generated in step S101 (encrypted under the data-party public key) to the first data party among the data parties for decryption.

S103, determine the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum business-party information gain and the corresponding business-party splitting feature number and business-party splitting point number, which the first data party sends after decrypting the business-party encrypted aggregate values.

Specifically, after receiving the business-party encrypted aggregate values sent by the business party in step S102, the first data party decrypts them and computes the information gain of each splitting point (each split) of each business-party feature. It then obtains the maximum of these gains, i.e., the maximum business-party information gain, together with the corresponding business-party splitting feature number and business-party splitting point number, and sends all three to the business party. The business party thus receives the maximum information gain over its own features together with the corresponding splitting feature number and splitting point number.

Meanwhile, the business party decrypts the data-party encrypted aggregate values of the first-order and second-order derivatives in each data bin of the data party and computes the information gain of each split of each data-party feature. It obtains the maximum of these gains, i.e., the maximum data-party information gain, together with the corresponding data-party splitting feature number and data-party splitting point number, and sends them to the corresponding data party. The business party then compares the maximum business-party information gain with the maximum data-party information gain and takes the larger of the two. If this larger value is smaller than a threshold γ, the current node is not split; otherwise, the splitting feature corresponding to the larger value is determined as the target optimal splitting feature and the corresponding splitting point as the target optimal splitting point.
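The gain evaluation and the comparison with γ can be sketched as follows. The application does not give the gain formula here; the sketch assumes the usual XGBoost-style second-order gain with an L2 regularization weight `lam`, applied to already-decrypted per-bin sums G and H. Comparing the maximum gain with γ is equivalent to the common convention of subtracting γ from the gain and splitting only when the result is non-negative. All names are hypothetical.

```python
from typing import List, Optional, Tuple


def best_split(G: List[float], H: List[float], lam: float = 1.0) -> Tuple[float, int]:
    """Best (gain, split point number) for one feature from its per-bin sums."""
    G_tot, H_tot = sum(G), sum(H)
    parent = G_tot ** 2 / (H_tot + lam)
    best_gain, best_j = float("-inf"), -1
    G_L = H_L = 0.0
    for j in range(len(G) - 1):  # split after bin j: bins 0..j go to the left child
        G_L += G[j]
        H_L += H[j]
        G_R, H_R = G_tot - G_L, H_tot - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam) - parent)
        if gain > best_gain:
            best_gain, best_j = gain, j
    return best_gain, best_j


def choose_target(guest_best: tuple, host_best: tuple, gamma: float) -> Optional[tuple]:
    """Candidates are (gain, owner, feature number, split number); None means no split."""
    target = max(guest_best, host_best, key=lambda cand: cand[0])
    return None if target[0] < gamma else target


gain, j = best_split([-2.0, -1.0, 3.0], [1.0, 1.0, 1.0])
print(gain, j)  # 3.75 1
print(choose_target((gain, "guest", 0, j), (1.2, "host1", 4, 7), gamma=0.5))
# (3.75, 'guest', 0, 1)
print(choose_target((0.1, "guest", 0, 0), (0.2, "host1", 4, 7), gamma=0.5))  # None
```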

S104, based on the target optimal splitting feature and the target optimal splitting point, obtain the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node.

Specifically, the target optimal splitting feature and the target optimal splitting point obtained in step S103 may belong either to the business party or to a data party. The party that owns them splits the current leaf node into two new leaf nodes accordingly, obtains the encrypted codes of the sample sets of the two resulting leaf nodes, and sends them to the other party: if the target optimal splitting feature belongs to a data party, the data party sends the encrypted codes to the business party, and if it belongs to the business party, the business party sends them to the data party. This process is repeated for each leaf node of the m-th tree until no leaf node can be split further or the tree reaches the set maximum depth, so that both the business party and the data parties obtain the encrypted code of the sample set of every leaf node of the m-th tree.

S105, calculate the target weight of each leaf node according to the encrypted code of its sample set.

Specifically, the business party calculates the target weight of each leaf node from the encrypted codes of the leaf-node sample sets obtained in step S104 and the data-party public key.
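The application does not spell out the weight formula. Assuming the standard second-order (XGBoost-style) leaf weight $w=-\sum_i\pi_i g_i/(\sum_i\pi_i h_i+\lambda)$, where $\pi$ is the 0/1 code of the leaf's sample set and $\lambda$ is a regularization weight, the plaintext arithmetic is shown below. In the described protocol the sums are taken over the encrypted code of the sample set rather than in the clear; that cryptographic step is omitted here, and the function name is hypothetical.

```python
from typing import List


def leaf_weight(pi_leaf: List[int], grads: List[float], hess: List[float],
                lam: float = 1.0) -> float:
    """w = -sum(pi_i * g_i) / (sum(pi_i * h_i) + lam) over the leaf's samples."""
    G = sum(p * g for p, g in zip(pi_leaf, grads))
    H = sum(p * h for p, h in zip(pi_leaf, hess))
    return -G / (H + lam)


# pi_leaf is the 0/1 code of the leaf's sample set (plaintext in this sketch).
print(leaf_weight([1, 0, 1, 1], [-0.5, 0.3, -0.75, 0.25], [0.25, 0.2, 0.5, 0.25]))  # 0.5
```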

S106, obtain the gradient boosting tree model from the target weights.

Specifically, the gradient boosting tree model is computed from the target weights obtained in step S105.
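Once the target weights of the leaves are known, the model is an additive ensemble of trees. The sketch below assumes a nested-dict tree layout, a learning rate `eta` and a logistic link, none of which the text specifies, and it evaluates all features locally; in the federated setting, a node whose feature belongs to a data party would have to be routed by that party. The initial prediction of 0 follows the text.

```python
import math


def tree_predict(node, x):
    while "weight" not in node:
        # samples with feature value <= threshold go to the left child
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["weight"]


def model_predict(trees, x, eta=0.1):
    score = 0.0  # initial prediction y_hat^(0) = 0
    for tree in trees:
        score += eta * tree_predict(tree, x)  # y_hat^(m) = y_hat^(m-1) + eta * f_m(x)
    return score


def predict_proba(trees, x, eta=0.1):
    return 1.0 / (1.0 + math.exp(-model_predict(trees, x, eta)))


trees = [
    {"feature": 0, "threshold": 2.6, "left": {"weight": -1.0}, "right": {"weight": 2.0}},
    {"feature": 1, "threshold": 0.5, "left": {"weight": 0.5}, "right": {"weight": -0.5}},
]
print(model_predict(trees, [3.0, 0.2], eta=0.5))  # 1.25
```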

In the method for generating a gradient boosting tree model according to this embodiment of the present application, the business party generates business-party encrypted aggregate values of the first-order and second-order derivatives in each data bin according to the data-party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M; sends these values to the first data party among the data parties; determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum business-party information gain and the corresponding business-party splitting feature number and splitting point number, which the first data party sends after decrypting the aggregate values; obtains, based on the target optimal splitting feature and splitting point, the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node; calculates the target weight of each leaf node according to the encrypted code of its sample set; and obtains the gradient boosting tree model from the target weights. Homomorphic encryption keeps the sample sets held by the data party secret from the business party while ensuring that the information gain of each splitting point of each feature can still be computed correctly under encryption, so the model can be built.
The method ensures that disclosing the data party's splitting features and their meanings to the business party does not reveal the data party's private data, generates the gradient boosting tree model losslessly, and enhances the model's interpretability.

Fig. 2 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, the method of this embodiment specifically includes the following steps:

S201, generate business-party encrypted aggregate values of the first-order and second-order derivatives in each data bin according to the data-party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M.

S202, send the business-party encrypted aggregate values to the first data party among the data parties.

Specifically, steps S201 and S202 are similar to steps S101 and S102 in the above embodiment and are not repeated here.

Step S103 in the above embodiment may specifically include the following steps S203 to S207.

S203, determine the business-party optimal splitting feature, the business-party optimal splitting point and the corresponding business-party maximum information gain according to the maximum business-party information gain, the business-party splitting feature number and the business-party splitting point number.

Specifically, the business party takes the maximum business-party information gain sent by the first data party as the business-party maximum information gain, the business-party splitting feature corresponding to the business-party splitting feature number sent by the first data party as the business-party optimal splitting feature, and the business-party splitting point corresponding to the business-party splitting point number sent by the first data party as the business-party optimal splitting point.

S204, obtain the data-party optimal splitting feature, the data-party optimal splitting point and the corresponding data-party maximum information gain.

Specifically, the business party takes the maximum data-party information gain that it has computed as the data-party maximum information gain, the data-party splitting feature corresponding to the obtained data-party splitting feature number as the data-party optimal splitting feature, and the data-party splitting point corresponding to the obtained data-party splitting point number as the data-party optimal splitting point.

S205, determine the larger of the business-party maximum information gain and the data-party maximum information gain as the target maximum information gain.

Specifically, the business party compares the business-party maximum information gain obtained in step S203 with the data-party maximum information gain obtained in step S204 and takes the larger of the two as the target maximum information gain.

S206, determine the optimal splitting feature corresponding to the target maximum information gain as the target optimal splitting feature.

Specifically, the business party takes the optimal splitting feature corresponding to the target maximum information gain determined in step S205 as the target optimal splitting feature.

S207, determine the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.

Specifically, the business party takes the optimal splitting point corresponding to the target maximum information gain determined in step S205 as the target optimal splitting point.

S208, based on the target optimal splitting feature and the target optimal splitting point, obtain the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node.

S209, calculate the target weight of each leaf node according to the encrypted code of its sample set.

S210, obtain the gradient boosting tree model from the target weights.

Specifically, steps S208 to S210 are similar to steps S104 to S106 in the above embodiment, and the details are not repeated here.

In the method for generating a gradient boosting tree model according to this embodiment of the present application, the business party generates business-party encrypted aggregate values of the first-order and second-order derivatives in each data bin according to the data-party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk-control data after m-1 iterations, where m is smaller than a preset number M; sends these values to the first data party among the data parties; determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum business-party information gain and the corresponding business-party splitting feature number and splitting point number, which the first data party sends after decrypting the aggregate values; obtains, based on the target optimal splitting feature and splitting point, the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node; calculates the target weight of each leaf node according to the encrypted code of its sample set; and obtains the gradient boosting tree model from the target weights. Homomorphic encryption keeps the sample sets held by the data party secret from the business party while ensuring that the information gain of each splitting point of each feature can still be computed correctly under encryption, so the model can be built.
The method ensures that disclosing the data party's splitting features and their meanings to the business party does not reveal the data party's private data, generates the gradient boosting tree model losslessly, and enhances the model's interpretability.

As a possible implementation, on the basis of the above embodiment and as shown in Fig. 3, step S204 of "obtaining the data-party optimal splitting feature, the data-party optimal splitting point and the corresponding data-party maximum information gain" may specifically include the following steps:

S301, homomorphically encrypt the first-order and second-order derivatives to obtain the encrypted first-order derivatives and encrypted second-order derivatives.

Specifically, the business party computes the first-order derivative $g_i$ and second-order derivative $h_i$ of the loss with respect to the predicted value $\hat{y}_i^{(m-1)}$ of each piece of risk-control data after m-1 iterations, and homomorphically encrypts them to obtain the encrypted first-order derivative $[\![g_i]\!]$ and encrypted second-order derivative $[\![h_i]\!]$, where $[\![\cdot]\!]$ denotes encryption with the public key of the homomorphic encryption system generated by the business party, i.e., the business-party public key.
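The text leaves the loss function open. Assuming binary logistic loss on 0/1 labels, a common choice in risk control, the derivatives have the closed forms $g_i=p_i-y_i$ and $h_i=p_i(1-p_i)$ with $p_i=\sigma(\hat y_i^{(m-1)})$; these plaintext values are what the business party would then encrypt with its public key. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def grad_hess(y: List[int], y_hat: List[float]) -> Tuple[List[float], List[float]]:
    """First/second derivatives of binary log loss w.r.t. the raw score y_hat."""
    g, h = [], []
    for label, score in zip(y, y_hat):
        p = 1.0 / (1.0 + math.exp(-score))  # predicted probability after m-1 trees
        g.append(p - label)
        h.append(p * (1.0 - p))
    return g, h


# In the first round every prediction is 0, so p = 0.5 for all samples.
g, h = grad_hess([1, 0, 1], [0.0, 0.0, 0.0])
print(g)  # [-0.5, 0.5, -0.5]
print(h)  # [0.25, 0.25, 0.25]
```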

S302, send the encrypted first-order and second-order derivatives to the data parties.

Specifically, the business party sends the encrypted first-order derivatives $[\![g_i]\!]$ and encrypted second-order derivatives $[\![h_i]\!]$ obtained in step S301 (encrypted with the business-party public key) to each data party.

S303, receive the data-party encrypted aggregate values of the first-order and second-order derivatives in each data bin sent by the data party, where the data party generates these values from the business-party public key, the sample set of the current leaf node, and the encrypted first-order and second-order derivatives.

Specifically, the business party receives the data-party encrypted aggregate values $[\![G^{bin}]\!]$ and $[\![H^{bin}]\!]$ of the first-order and second-order derivatives in each data bin sent by the data party. For how the data party computes these values, refer to the description of the embodiment shown in Fig. 1; the details are not repeated here.

S304, decrypt the data-party encrypted aggregate values to obtain the maximum data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number.

Specifically, the business party decrypts the data-party encrypted aggregate values received in step S303 and computes from them the maximum data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number.

S305, send the maximum data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number to the data party.

Specifically, the business party sends the maximum data-party information gain obtained in step S304, together with the corresponding data-party splitting feature number and data-party splitting point number, to the data party.

S306, receive the data-party optimal splitting feature, the data-party optimal splitting point and the data-party maximum information gain sent by the data party, which the data party determines according to the maximum data-party information gain, the data-party splitting feature number and the data-party splitting point number.

Specifically, the data party takes the maximum data-party information gain sent by the business party as the data-party maximum information gain, the data-party splitting feature corresponding to the data-party splitting feature number sent by the business party as the data-party optimal splitting feature, and the data-party splitting point corresponding to the data-party splitting point number sent by the business party as the data-party optimal splitting point, and sends these three items to the business party. The business party receives the data-party optimal splitting feature, the data-party optimal splitting point and the data-party maximum information gain sent by the data party.

As a possible implementation, on the basis of the above embodiment, "obtaining, based on the target optimal splitting feature and the target optimal splitting point, the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node" in step S208 may specifically cover the following two cases, depending on whether the target optimal splitting feature belongs to the business party or to a data party:

In the first case, if the target optimal splitting feature belongs to the business party, the business party computes the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypts these sample sets with the data-party public key to obtain their encrypted codes, and sends the encrypted codes to the data party.

Specifically, if the target optimal splitting feature belongs to the business party, the business party needs to compute the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point. Of these two sample sets, the left one contains the samples whose value of the corresponding feature is less than or equal to the threshold, and the right one contains the samples whose value is greater than the threshold.

The service party then obtains the encrypted codes of the two leaf nodes' sample sets under the data party public key and sends them to the data party. Taking the left leaf node as an example, its sample set is $I_L = I \cap I'_L$, where $I$ is the sample set of the current leaf node and $I'_L$ is the set of all samples whose corresponding feature is less than or equal to the threshold. Since the target optimal splitting characteristic is owned by the service party, the service party can directly obtain the plaintext code $\pi'_L$ of $I'_L$ and compute the encrypted code of the left leaf node's sample set as

$$[\![\pi_L]\!]_D = [\![\pi]\!]_D \odot \pi'_L,$$

where $\pi$ is the code of $I$, $[\![\cdot]\!]_D$ denotes encryption under the data party public key, and the symbol $\odot$ denotes element-wise multiplication. Likewise,

$$[\![\pi_R]\!]_D = [\![\pi]\!]_D \odot \pi'_R,$$

where $\pi'_R$ is the plaintext code of the set of all samples whose corresponding feature is greater than the threshold. The data side receives $[\![\pi_L]\!]_D$ and $[\![\pi_R]\!]_D$ from the service side and decrypts them to obtain the sample sets $I_L$ and $I_R$ of the two leaf nodes.
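The case-1 computation can be illustrated with a toy scheme. This is an assumption-laden sketch, not the application's implementation: it uses textbook Paillier encryption (the document does not name its homomorphic scheme) with tiny, insecure fixed primes, and the names `ToyPaillier` and `split_codes_business_side` as well as all sample values are hypothetical. Raising a ciphertext of π_i to the plaintext bit of π'_L yields an encryption of π_i·π'_L,i, so the business party obtains encrypted codes of I_L and I_R without ever seeing I in plaintext; only the data party can decrypt them.

```python
import math
import random


class ToyPaillier:
    """Textbook Paillier (g = n + 1) with tiny fixed primes: illustration only, NOT secure."""

    def __init__(self, p=1_000_000_007, q=998_244_353):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)  # inverse of lambda mod n (valid for g = n + 1)

    def encrypt(self, m):
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        # (n + 1)^m = 1 + m*n (mod n^2)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def mul_plain(self, c, k):
        # Enc(m) ** k decrypts to k * m
        return pow(c, k % self.n, self.n2)

    def rerandomize(self, c):
        # multiply by a fresh Enc(0) so the ciphertext does not reveal the plaintext factor
        return c * self.encrypt(0) % self.n2


def split_codes_business_side(enc_pi, left_code, he):
    """Case 1: [[pi_L]] = [[pi]] (*) pi'_L and [[pi_R]] = [[pi]] (*) pi'_R, element-wise."""
    enc_left = [he.rerandomize(he.mul_plain(c, b)) for c, b in zip(enc_pi, left_code)]
    enc_right = [he.rerandomize(he.mul_plain(c, 1 - b)) for c, b in zip(enc_pi, left_code)]
    return enc_left, enc_right


# Data party owns the key pair. In this single-process demo one object holds both keys;
# the business party only calls the public operations (encrypt, mul_plain, rerandomize).
dp_key = ToyPaillier()
pi = [1, 1, 0, 1, 0, 1]                     # current node I = {0, 1, 3, 5}
enc_pi = [dp_key.encrypt(b) for b in pi]    # what the business party holds

# Business party: plaintext code of its own feature split at threshold 4.0.
feature = [3.0, 7.5, 1.0, 2.2, 9.9, 5.0]
left_code = [1 if x <= 4.0 else 0 for x in feature]   # pi'_L over all samples
enc_L, enc_R = split_codes_business_side(enc_pi, left_code, dp_key)

# Data party decrypts to recover I_L and I_R.
pi_L = [dp_key.decrypt(c) for c in enc_L]
pi_R = [dp_key.decrypt(c) for c in enc_R]
print(pi_L, pi_R)  # [1, 0, 0, 1, 0, 0] [0, 1, 0, 0, 0, 1]
```

Re-randomising with a fresh encryption of zero is a design choice of this sketch: without it, a zero bit would turn every ciphertext into the fixed value 1.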

In the second case, the target optimal splitting characteristic belongs to the data side, and the service side receives the encrypted codes of the sample sets of the two leaf nodes sent by the data side. The data side obtains these by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point and encrypting the two resulting sample sets with the data side public key.

Specifically, if the target optimal splitting characteristic belongs to a data side, that data side splits the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, calculates the sample sets of the two resulting leaf nodes, encrypts them with the data side public key and sends them to the service side, which receives the encrypted codes of the two sample sets.

For example, the data side computes the sample sets of the two leaf nodes produced by the split: the left set $I_L$ contains the samples in $I$ whose corresponding feature is less than or equal to the threshold, and the right set $I_R$ contains the samples in $I$ whose corresponding feature is greater than the threshold. Encrypting their codes with the data party public key yields $[\![\pi_L]\!]_D$ and $[\![\pi_R]\!]_D$ (where $[\![\cdot]\!]_D$ denotes encryption under the data party public key), which are sent to the service party.
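For case 2 the split itself is a plaintext computation on the data party's own feature. A minimal sketch, assuming the codes are 0/1 vectors indexed over all PSI-aligned samples; `split_node` and the sample values are hypothetical, and the subsequent encryption with the data party public key is omitted.

```python
def split_node(node_idx, feature_col, threshold):
    """Return the 0/1 codes (pi_L, pi_R) of the two children of the current node.

    node_idx    -- indices of the samples in the current node's set I
    feature_col -- the data party's own feature values for all aligned samples
    threshold   -- the target optimal splitting point
    """
    in_node = set(node_idx)
    pi_L = [int(i in in_node and x <= threshold) for i, x in enumerate(feature_col)]
    pi_R = [int(i in in_node and x > threshold) for i, x in enumerate(feature_col)]
    return pi_L, pi_R


feature_col = [3.0, 7.5, 1.0, 2.2, 9.9, 5.0]
pi_L, pi_R = split_node([0, 1, 3, 5], feature_col, threshold=4.0)
print(pi_L)  # [1, 0, 0, 1, 0, 0]
print(pi_R)  # [0, 1, 0, 0, 0, 1]
# In the protocol each element would next be encrypted with the data party
# public key before being sent; that step is not shown here.
```

Every sample of the node lands in exactly one child, so the two codes sum element-wise to the node's own code.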

As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 4, the step S209 of "calculating the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node" may specifically include the following steps:

S401, generating a data side encryption aggregation value for each leaf node according to the encrypted code of the sample set of each leaf node, the data side public key, the first derivative and the second derivative.

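Specifically, based on the encrypted code $[\![\pi(j)]\!]_D$ of the sample set of each leaf node $j$, the data side public key, and the first derivatives $g_i$ and second derivatives $h_i$, $i = 1, \dots, n$, the service side generates the data side encrypted aggregate values $[\![G_j]\!]_D$ and $[\![H_j]\!]_D$ of each leaf node as follows:

$$[\![G_j]\!]_D = \sum_{i=1}^{n} g_i \cdot [\![\pi_i(j)]\!]_D, \qquad [\![H_j]\!]_D = \sum_{i=1}^{n} h_i \cdot [\![\pi_i(j)]\!]_D,$$

where $[\![\cdot]\!]_D$ denotes encryption under the data side public key, the sums and the products with the plaintext $g_i$ or $h_i$ are evaluated homomorphically on ciphertexts, $\pi_i(j)$ is the $i$th element of the vector $\pi(j)$, and $\pi(j)$ is the code of the sample set of the $j$th leaf node.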
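Step S401 needs only two homomorphic operations: multiplying a ciphertext by a plaintext and adding ciphertexts. The sketch below assumes textbook Paillier with tiny insecure primes and a hypothetical fixed-point encoding (`SCALE`) for real-valued derivatives, since Paillier plaintexts are integers modulo n; neither choice is stated in the document, and `aggregate_leaf` and the sample values are hypothetical. The business party aggregates, and the key holder (the first data party) decrypts G_j and H_j.

```python
import math
import random

SCALE = 10**6  # hypothetical fixed-point scale for encoding real-valued derivatives


class ToyPaillier:
    """Textbook Paillier (g = n + 1) with tiny fixed primes: illustration only, NOT secure."""

    def __init__(self, p=1_000_000_007, q=998_244_353):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m):
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):
        return c1 * c2 % self.n2  # decrypts to the sum of the plaintexts

    def mul_plain(self, c, k):
        return pow(c, k % self.n, self.n2)  # decrypts to k * plaintext (mod n)


def encode(x):
    return round(x * SCALE)


def decode(m, n):
    if m > n // 2:  # residues above n/2 represent negative numbers
        m -= n
    return m / SCALE


def aggregate_leaf(enc_code, values, he):
    """[[sum_i v_i * pi_i(j)]] from the ciphertexts [[pi_i(j)]] and plaintext v_i."""
    acc = he.encrypt(0)
    for c, v in zip(enc_code, values):
        acc = he.add(acc, he.mul_plain(c, encode(v)))
    return acc


dp_key = ToyPaillier()                         # data party key pair
enc_pi_j = [dp_key.encrypt(b) for b in [1, 0, 0, 1, 0, 0]]
g = [0.5, -1.2, 0.3, -0.25, 2.0, 0.1]          # MSE-style first derivatives
h = [2.0] * 6                                  # MSE-style second derivatives

enc_G = aggregate_leaf(enc_pi_j, g, dp_key)    # business party
enc_H = aggregate_leaf(enc_pi_j, h, dp_key)
G_j = decode(dp_key.decrypt(enc_G), dp_key.n)  # first data party
H_j = decode(dp_key.decrypt(enc_H), dp_key.n)
print(G_j, H_j)  # 0.25 4.0
```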

S402, the data side encryption aggregation value is sent to the first data side.

Specifically, the service side sends the data side encryption aggregation value obtained in step S401 to the first data side.

S403, receiving the data party decryption aggregation value and the corresponding number sent by the first data party after it decrypts the data party encryption aggregation value.

Specifically, the first data party receives the data party encrypted aggregation value sent by the service party in step S402, decrypts the data party encrypted aggregation value to obtain a data party decrypted aggregation value and a corresponding number, and sends the data party decrypted aggregation value and the corresponding number to the service party, and the service party receives the data party decrypted aggregation value and the corresponding number sent by the first data party.

S404, calculating the target weight according to the data side decryption aggregation value and the corresponding number.

Specifically, according to the data side decryption aggregation values and the corresponding numbers received in step S403, the service side calculates the optimal weight of each leaf node $j$, i.e. the target weight $w_j^*$, as follows:

$$w_j^* = -\frac{G_j}{H_j + \lambda},$$

where $G_j$ and $H_j$ are the decrypted aggregate first and second derivatives of leaf node $j$ and $\lambda \ge 0$ is the regularization coefficient of the leaf weights.
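Since only the leaf's aggregated G_j and H_j enter the target weight, S404 reduces to a one-line computation once they are decrypted. A minimal sketch; `leaf_weight`, `leaf_objective` and the default λ = 1.0 are hypothetical. The checks confirm that w_j* minimizes the per-leaf second-order objective G_j·w + ½(H_j + λ)·w².

```python
def leaf_objective(w, G_j, H_j, lam=1.0):
    """Second-order objective of one leaf: G_j * w + 0.5 * (H_j + lam) * w**2."""
    return G_j * w + 0.5 * (H_j + lam) * w * w


def leaf_weight(G_j, H_j, lam=1.0):
    """Target weight w_j* = -G_j / (H_j + lam), the minimiser of leaf_objective."""
    return -G_j / (H_j + lam)


w = leaf_weight(0.25, 4.0)
print(w)  # -0.05
# Nudging w in either direction can only increase the objective.
assert leaf_objective(w, 0.25, 4.0) <= leaf_objective(w + 1e-3, 0.25, 4.0)
assert leaf_objective(w, 0.25, 4.0) <= leaf_objective(w - 1e-3, 0.25, 4.0)
```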

It should be noted here that, for each leaf node of the mth tree, the participants jointly determine whether and how the leaf node is split, until no leaf node can be split further or the tree reaches the preset maximum depth. The service side then calculates the target weight of each leaf node and constructs the mth tree from these target weights. To protect the security of the service side's labels, the plaintext weights of all leaf nodes are held only by the service side; likewise, the first and second derivatives $g_i$ and $h_i$ can be computed only by the service side, since computing them requires the labels.

As a possible implementation manner, on the basis of the above embodiment, as shown in fig. 5, the step S210 of "calculating a gradient lifting tree model according to the target weight" may specifically include the following steps:

S501, constructing the mth tree according to the target weight.

Specifically, the parameters of the mth tree are updated according to the target weight, and the construction of the mth tree is completed.

S502, updating the predicted value of the mth sub-model to each piece of wind control data, wherein the mth sub-model comprises the 1 st tree to the mth tree.

Specifically, the predicted value of the currently established mth sub-model for each piece of wind control data is updated as

$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + f_m(x_i),$$

where $f_m$ is the mth tree, so that the mth sub-model comprises the 1st tree through the mth tree.

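The mth tree $f_m$ is built to minimize the loss function

$$\mathrm{Loss}^{(m)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(m-1)} + f_m(x_i)\right) + \frac{\lambda}{2}\sum_{j=1}^{T} w_j^2,$$

where $\hat{y}^{(m-1)} = \left(\hat{y}_1^{(m-1)}, \dots, \hat{y}_n^{(m-1)}\right)$ is the output vector of the model formed by the $m-1$ trees constructed previously, $\{x_i \mid i = 1, \dots, n\}$ is the training data set with labels $y_i$, $T$ is the number of leaf nodes of the mth tree, $w_j$ is the weight of leaf node $j$, $\lambda$ is the regularization coefficient, and $l(\cdot,\cdot)$ is a loss function, for example the mean squared error (MSE) loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$.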

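For fast calculation, a second-order approximation of the loss function can be used, i.e.

$$\mathrm{Loss}^{(m)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(m-1)}\right) + g_i f_m(x_i) + \frac{1}{2} h_i f_m^2(x_i)\right] + \frac{\lambda}{2}\sum_{j=1}^{T} w_j^2,$$

where $g_i = \partial l\left(y_i, \hat{y}_i^{(m-1)}\right) / \partial \hat{y}_i^{(m-1)}$ and $h_i = \partial^2 l\left(y_i, \hat{y}_i^{(m-1)}\right) / \partial \left(\hat{y}_i^{(m-1)}\right)^2$ are the first and second derivatives of the loss with respect to the prediction after $m-1$ iterations, $T$ is the number of leaf nodes, $w_j$ are the leaf weights and $\lambda$ is the regularization coefficient.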
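For the MSE example loss, the derivatives are g_i = 2(ŷ_i − y_i) and h_i = 2, and because the loss is quadratic the second-order expansion is exact. The sketch below (hypothetical names and values) checks this numerically; for a non-quadratic loss such as the logistic loss the expansion would only be approximate. Computing g_i requires the label y_i, consistent with the derivatives being computed only by the business party.

```python
def mse_loss(y, y_hat):
    return (y - y_hat) ** 2


def mse_grad_hess(y, y_hat):
    """First and second derivatives of (y - y_hat)^2 with respect to y_hat."""
    return 2.0 * (y_hat - y), 2.0


y, y_prev, f = 1.0, 0.25, 0.5        # label, prediction after m-1 trees, output of tree m
g, h = mse_grad_hess(y, y_prev)
exact = mse_loss(y, y_prev + f)
approx = mse_loss(y, y_prev) + g * f + 0.5 * h * f * f
print(exact, approx)  # 0.0625 0.0625
```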

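Each node split of each tree is determined by the information gain computed for every split point of every feature:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right],$$

where $G_L$, $H_L$ and $G_R$, $H_R$ are the aggregated first and second derivatives of the left and right nodes after splitting, i.e.

$$G_L = \sum_{i \in I_L} g_i,\quad H_L = \sum_{i \in I_L} h_i,\quad G_R = \sum_{i \in I_R} g_i,\quad H_R = \sum_{i \in I_R} h_i,$$

and $\lambda$ is the regularization coefficient of the leaf weights.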
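The split search performed by whichever party decrypts the per-bin aggregates (the first data party for the business side's bins, the business side for a data side's bins) can be sketched as a prefix-sum scan over bins. A minimal sketch assuming λ = 1.0 and the bin layout of step S1105; `split_gain`, `best_split` and the sample aggregates are hypothetical. It returns the maximum gain together with the feature number and split point number, the form in which results are sent back in steps S1108 and S1112.

```python
def split_gain(GL, HL, GR, HR, lam=1.0):
    """Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)]."""
    return 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam))


def best_split(G_bins, H_bins, lam=1.0):
    """Scan every split point of every feature.

    G_bins[k][v], H_bins[k][v]: aggregated derivatives of bin v of feature k.
    Splitting after bin t sends bins 0..t left and the remaining bins right.
    Returns (max_gain, feature_number, split_point_number).
    """
    best = (float("-inf"), None, None)
    for k, (Gk, Hk) in enumerate(zip(G_bins, H_bins)):
        G_tot, H_tot = sum(Gk), sum(Hk)
        GL = HL = 0.0
        for t in range(len(Gk) - 1):      # the last bin cannot be a split point
            GL += Gk[t]
            HL += Hk[t]
            gain = split_gain(GL, HL, G_tot - GL, H_tot - HL, lam)
            if gain > best[0]:
                best = (gain, k, t)
    return best


G_bins = [[-2.0, -1.0, 1.5, 2.5], [0.5, 0.5]]
H_bins = [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0]]
gain, k, t = best_split(G_bins, H_bins)
print(k, t, round(gain, 4))  # 0 1 2.4444
```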

S503, calculating the gradient lifting tree model according to the M sub-models.

Specifically, the gradient lifting tree model is calculated from the updated M sub-models as follows:

$$\hat{y}_i = \hat{y}_i^{(M)} = \sum_{m=1}^{M} f_m(x_i).$$
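The additive structure of S502 and S503 (each sub-model adds one tree to the previous prediction) can be sketched with one-split trees. Assumptions not in the document: the initial prediction ŷ^(0) is 0, no learning rate is applied, and `stump` and `staged_predictions` are hypothetical helpers.

```python
def stump(feature, threshold, left_w, right_w):
    """Hypothetical one-split tree: left_w if x[feature] <= threshold, else right_w."""
    return lambda x: left_w if x[feature] <= threshold else right_w


def staged_predictions(trees, x):
    """Return [y_hat^(1), ..., y_hat^(M)] with y_hat^(m) = y_hat^(m-1) + f_m(x), y_hat^(0) = 0."""
    y_hat, stages = 0.0, []
    for f_m in trees:
        y_hat += f_m(x)
        stages.append(y_hat)
    return stages


trees = [stump(0, 4.0, 0.5, -0.5), stump(1, 1.0, 0.25, -0.25), stump(0, 2.0, -0.125, 0.125)]
x = [3.0, 2.0]
print(staged_predictions(trees, x))  # [0.5, 0.25, 0.375]
```

The last stage equals the sum of all tree outputs, which is the final model's prediction.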

Fig. 6 is a flowchart illustrating a method for generating a gradient lifting tree model according to another embodiment of the present application. The method may be executed by the apparatus for generating a gradient lifting tree model provided in the embodiments of the present application, and the apparatus may be disposed in the electronic device of the first data party. As shown in fig. 6, the method for generating a gradient lifting tree model according to this embodiment may specifically include the following steps:

S601, receiving the service party encrypted aggregate values of the first-order and second-order derivatives in each data sub-box sent by the service party, where these values are generated by the service party according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)th tree, and the first-order and second-order derivatives of the predicted value of each piece of wind control data after m-1 iterations, and m is smaller than a preset number M.

S602, carrying out decryption calculation on the encrypted aggregation value of the service party to obtain the maximum value of the information gain of the service party, and the corresponding service party splitting characteristic number and the corresponding service party splitting point number.

S603, the maximum value of the information gain of the service party, the splitting feature number of the service party and the splitting point number of the service party are sent to the service party, so that the service party can determine the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the information gain of the service party, the splitting feature number of the service party and the splitting point number of the service party, obtain the encryption codes of the sample sets of the two leaf nodes obtained after splitting based on the target optimal splitting feature and the target optimal splitting point of the current leaf node, calculate the target weight of each leaf node according to the encryption codes of the sample set of each leaf node, and calculate and obtain the gradient lifting tree model according to the target weights.

It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.

In the method for generating a gradient lifting tree model according to this embodiment, the first data party receives the service party encrypted aggregate values of the first-order and second-order derivatives in each data sub-box sent by the service party. These values are generated by the service party according to the data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)th tree, and the first-order and second-order derivatives of the predicted value of each piece of wind control data after m-1 iterations, m being smaller than a preset number M. The first data party decrypts the service party encrypted aggregate values, calculates the maximum value of the service party information gain together with the corresponding service party splitting characteristic number and service party splitting point number, and sends them to the service party. The service party then determines the target optimal splitting characteristic and the corresponding target optimal splitting point from these values, obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, calculates the target weight of each leaf node from the encrypted codes of its sample set, and calculates the gradient lifting tree model from the target weights.
Because only encrypted codes of the sample sets are sent to the service party, the sample sets remain confidential while the feature meanings of the data party are disclosed to the service party. This avoids the data leakage that would arise if one participant held both the sample sets and the feature meanings, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.

As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 7, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:

S701, generating a data party public key.

Specifically, the first data party generates a data party public key by using a homomorphic encryption technology.

S702, the public key of the data party is sent to the service party.

Specifically, the first data party sends the data party public key to the service party. The data party public key is used to encrypt intermediate data transmitted during model generation, which protects the intermediate values and prevents data leakage. In the embodiments of the present application, $[\![\cdot]\!]_D$ denotes encryption with the public key of the homomorphic cryptosystem generated by the data party, and likewise $[\![\cdot]\!]_S$ denotes encryption with the public key of the homomorphic cryptosystem generated by the service party. Because homomorphic encryption allows addition and multiplication operations to be performed on ciphertexts, the service party can still correctly calculate the information gain corresponding to each splitting point of each feature even though the sample set under each node, held by the data party, is kept secret from the service party.
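The ciphertext arithmetic described above can be illustrated with textbook Paillier, an additively homomorphic scheme chosen here as an assumption (the document does not name its scheme). Multiplying two ciphertexts decrypts to the sum of the plaintexts, and raising a ciphertext to a plaintext power decrypts to their product. Paillier cannot multiply two ciphertexts together, but the operations described in this section appear to need only ciphertext addition and ciphertext-by-plaintext multiplication. The primes are tiny and fixed, so this is insecure and for illustration only; `ToyPaillier` is a hypothetical name.

```python
import math
import random


class ToyPaillier:
    """Textbook Paillier (g = n + 1) with tiny fixed primes: illustration only, NOT secure."""

    def __init__(self, p=1_000_000_007, q=998_244_353):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)   # inverse of lambda mod n

    def encrypt(self, m):
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):
        # Enc(a) * Enc(b) mod n^2 decrypts to a + b
        return c1 * c2 % self.n2

    def mul_plain(self, c, k):
        # Enc(a) ** k mod n^2 decrypts to k * a
        return pow(c, k % self.n, self.n2)


he = ToyPaillier()
ca, cb = he.encrypt(17), he.encrypt(25)
print(he.decrypt(he.add(ca, cb)))       # 42
print(he.decrypt(he.mul_plain(ca, 3)))  # 51
```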

As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 8, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:

S801, receiving the encrypted first derivative and the encrypted second derivative sent by the service party, where the encrypted first derivative and the encrypted second derivative are obtained by the service party through homomorphic encryption of the first derivative and the second derivative.

S802, generating a data side encryption aggregation value of the first derivative and the second derivative in each data sub-box according to the public key of the service side, the sample set of the current leaf node, the encryption first derivative and the encryption second derivative.

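Specifically, the first data party generates the data side encrypted aggregate values of the first and second derivatives in each data sub-box according to the public key of the service party, the sample set $I$ of the current leaf node, the encrypted first derivatives $[\![g_i]\!]_S$ and the encrypted second derivatives $[\![h_i]\!]_S$, as follows:

$$[\![G_{kv}]\!]_S = \sum_{i \in I,\ s_{k,v} < x_{i,k} \le s_{k,v+1}} [\![g_i]\!]_S, \qquad [\![H_{kv}]\!]_S = \sum_{i \in I,\ s_{k,v} < x_{i,k} \le s_{k,v+1}} [\![h_i]\!]_S,$$

where $x_{i,k}$ is the value of feature $k$ of sample $i$, $s_{k,1} < \dots < s_{k,L}$ are the $L$ threshold values of feature $k$, $v = 1, \dots, L-1$ indexes the data sub-boxes, $[\![\cdot]\!]_S$ denotes encryption under the service party public key, and the sums are evaluated homomorphically on ciphertexts.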
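The per-bin aggregation of S802 can be sketched independently of the cryptography by abstracting the addition. In this sketch (hypothetical names and values) plain floats stand in for ciphertexts; in the protocol `values` would hold the ciphertexts [[g_i]]_S, `add` would be homomorphic addition (multiplication modulo n² under Paillier) and `zero` a fresh encryption of 0. Assigning values at or below the first threshold to the first bin, and values above the last threshold to the last bin, is an assumption of this sketch.

```python
import bisect


def bin_index(x, thresholds):
    """Index v of the bin (s_v, s_{v+1}] containing x; L thresholds give L-1 bins.
    Out-of-range values are clamped to the first or last bin (assumption)."""
    v = bisect.bisect_left(thresholds, x) - 1
    return min(max(v, 0), len(thresholds) - 2)


def aggregate_bins(node_idx, feature_col, thresholds, values, add, zero):
    """Sum values[i] over the samples i of the current node, per bin of one feature.

    add/zero abstract the arithmetic: plain floats here; in the protocol `add`
    would be homomorphic ciphertext addition and `zero` an encryption of 0.
    """
    sums = [zero() for _ in range(len(thresholds) - 1)]
    for i in node_idx:
        v = bin_index(feature_col[i], thresholds)
        sums[v] = add(sums[v], values[i])
    return sums


feature_col = [3.0, 7.5, 1.0, 2.2, 9.9, 5.0]
thresholds = [1.0, 4.0, 7.0, 10.0]          # L = 4 thresholds -> 3 bins
g = [0.5, -1.2, 0.3, -0.25, 2.0, 0.1]
G_bins = aggregate_bins([0, 1, 3, 5], feature_col, thresholds, g,
                        add=lambda s, v: s + v, zero=lambda: 0.0)
print(G_bins)  # [0.25, 0.1, -1.2]
```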

S803, sending the data side encrypted aggregation value to the service side.

S804, receiving the maximum value of the data side information gain sent by the service side after decrypting the data side encrypted aggregation value, and the corresponding data side split characteristic number and the data side split point number.

S805, determining the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting characteristic number and the data side splitting point number.

S806, sending the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain to the service side.

As a possible implementation manner, on the basis of the foregoing embodiment, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps: if the target optimal splitting characteristic belongs to the service party, receiving the encrypted codes of the sample sets of the two leaf nodes sent by the service party, which the service party obtains by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point and encrypting the two resulting sample sets with the data party public key; and if the target optimal splitting characteristic belongs to a data party, calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypting them with the data party public key to obtain their encrypted codes, and sending these encrypted codes to the service party.

As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 9, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:

S901, receiving the data party encrypted aggregate value of each leaf node sent by the service party, where this value is generated by the service party according to the encrypted code of the sample set of each leaf node, the data party public key, the first derivative and the second derivative.

S902, decrypting the data party encrypted aggregation value to obtain the data party decrypted aggregation value and the corresponding number.

S903, sending the data party decrypted aggregation value and the corresponding number to the service party, so that the service party can calculate the target weight from them.

Fig. 10 is a flowchart illustrating a method for generating a gradient lifting tree model according to another embodiment of the present application. The method may be executed by the apparatus for generating a gradient lifting tree model provided in the embodiments of the present application, and the apparatus may be disposed in the electronic device of the second data party. As shown in fig. 10, the method for generating a gradient lifting tree model according to this embodiment may specifically include the following steps:

S1001, receiving the encrypted first derivative and the encrypted second derivative sent by the service party, which the service party obtains by homomorphically encrypting the first derivative and the second derivative.

S1002, generating the data side encrypted aggregation values of the first derivative and the second derivative in each data sub-box according to the service side public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative.

S1003, sending the data side encrypted aggregation value to the service side.

S1004, receiving the maximum value of the data side information gain, together with the corresponding data side splitting characteristic number and data side splitting point number, sent by the service side after it decrypts the data side encrypted aggregation value.

S1005, determining the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side according to the maximum value of the information gain of the data side, the splitting characteristic number of the data side and the splitting point number of the data side.

S1006, sending the data side optimal splitting feature, the data side optimal splitting point and the data side maximum information gain to the service side, so that the service side can determine the target optimal splitting feature and the corresponding target optimal splitting point according to these values together with the maximum value of the service side information gain, the service side splitting feature number and the service side splitting point number; obtain the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point; calculate the target weight of each leaf node from the encrypted codes of its sample set; and calculate the gradient lifting tree model from the target weights. The maximum value of the service side information gain, the service side splitting feature number and the service side splitting point number are obtained by the first data side decrypting the service side encrypted aggregation values of the first and second derivatives in each data sub-box sent to it by the service side.

As a possible implementation manner, on the basis of the foregoing embodiment, the method for generating a gradient lifting tree model according to the embodiment of the present application may further include the following steps: if the target optimal splitting characteristic belongs to the service party, receiving the encrypted codes of the sample sets of the two leaf nodes sent by the service party, which the service party obtains by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point and encrypting the two resulting sample sets with the data party public key; and if the target optimal splitting characteristic belongs to a data party, calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypting them with the data party public key to obtain their encrypted codes, and sending these encrypted codes to the service party.

According to the method for generating a gradient lifting tree model of this embodiment, the second data party receives the encrypted first-order and second-order derivatives sent by the service party, which the service party obtains by homomorphically encrypting the first-order and second-order derivatives; generates the data party encrypted aggregation values of the first-order and second-order derivatives in each data sub-box according to the service party public key, the sample set of the current leaf node and the encrypted derivatives; sends these values to the service party; receives the maximum value of the data party information gain, together with the corresponding data party splitting characteristic number and data party splitting point number, sent by the service party after it decrypts the data party encrypted aggregation values; determines the data party optimal splitting characteristic, the data party optimal splitting point and the data party maximum information gain from them; and sends these to the service party. The service party then determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the data party optimal splitting characteristic, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting characteristic number and the service party splitting point number; obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point; calculates the target weight of each leaf node from the encrypted codes of its sample set; and calculates the gradient lifting tree model from the target weights. Here, the maximum value of the service party information gain, the service party splitting characteristic number and the service party splitting point number are obtained by the first data party decrypting the service party encrypted aggregation values of the first-order and second-order derivatives in each data sub-box sent by the service party. In this way the feature meanings of the data party are disclosed to the service party while the sample sets remain hidden, which avoids the data leakage that would arise if one participant held both the sample sets and the feature meanings, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.

To clarify the method for generating the gradient lifting tree model in the embodiments of the present application, it is described in detail below with reference to fig. 11. As shown in fig. 11, the method for generating a gradient lifting tree model according to the embodiment of the present application may specifically include the following steps:

S1101, aligning the sample sets of the service side and the data side using private set intersection (PSI).

S1102, the service party generates a homomorphic encryption private key and public key and sends the service party public key to the data parties; the first data party generates a data party private key and public key and sends the data party public key to the service party.

S1103, the service side bins each feature dimension k of the wind control data in the sample set, obtaining L quantiles of each feature as the candidate splitting thresholds.
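One simple way to obtain L candidate thresholds per feature is to take values at evenly spaced ranks of the sorted column, including the minimum and maximum so that the L thresholds delimit L-1 bins as in S1105. This rule is an assumption (the document does not specify its quantile method), and `quantile_thresholds` is hypothetical.

```python
def quantile_thresholds(values, L):
    """Pick L (>= 2) candidate thresholds at evenly spaced ranks of the sorted values.
    Duplicates are dropped, so fewer than L thresholds may remain for low-cardinality features."""
    s = sorted(values)
    idx = [round(j * (len(s) - 1) / (L - 1)) for j in range(L)]
    return sorted(set(s[i] for i in idx))


col = [float(v) for v in range(10, 0, -1)]   # 10.0, 9.0, ..., 1.0
print(quantile_thresholds(col, 4))           # [1.0, 4.0, 7.0, 10.0]
```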

S1104, the business side calculates the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, homomorphically encrypts them, and transmits them to each data side.

Each participant executes the following steps to judge the splitting condition of each leaf node of the current tree, and the specific process is as follows:

S1105, dividing all samples into L-1 bin intervals according to the L threshold values $S_k$ of each feature $k$.

If the current participant is the business party, executing the steps S1106-S1109, and if the current participant is the data party, executing the steps S1110-S1114.

S1106, generating the business side encrypted aggregation values of the first derivative and the second derivative in each data sub-box according to the data side public key, the encrypted code of the sample set of the current leaf node of the (m-1)th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M.

S1107, send the service party encrypted aggregated value to the first data party of the data parties.

S1108, the first data side decrypts the encrypted aggregation value of the service side, calculates and sends the maximum value of the information gain of the service side, the corresponding service side split characteristic number and the corresponding service side split point number to the service side.

S1109, the service side determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the service side information gain, the corresponding service side splitting characteristic number and the corresponding service side splitting point number.

S1110, the data side generates the data side encrypted aggregation values of the first-order and second-order derivatives in each data sub-box according to the service side public key, the sample set of the current leaf node, the encrypted first-order derivatives and the encrypted second-order derivatives.

S1111, sending the data side encrypted aggregation value to the service side.

S1112, the service side decrypts the encrypted aggregation value of the data side, calculates and sends the maximum value of the data side information gain, and the corresponding data side split feature number and data side split point number to the data side.

S1113, the data side determines the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting characteristic number and the data side splitting point number.

S1114, sending the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side to the service side.

S1115, comparing the maximum information gain of the service side with the maximum information gain of the data side, and determining that the larger value of the two is the target maximum information gain, and the corresponding characteristic is the target optimal splitting characteristic.
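The comparison in S1115, and the resulting branch to S1116 or S1117, can be sketched as follows. `PartyBest`, `choose_target`, the owner labels and the tie-breaking rule are hypothetical; with several data parties, each contributes one candidate.

```python
from dataclasses import dataclass


@dataclass
class PartyBest:
    owner: str          # hypothetical label, e.g. "business" or "data-1"
    feature: str
    threshold: float
    gain: float


def choose_target(candidates):
    """S1115: the candidate with the largest maximum gain gives the target split.
    max() keeps the first of equal gains, so listing the business side first breaks
    ties in its favour (an assumption; the document does not specify ties)."""
    return max(candidates, key=lambda c: c.gain)


business = PartyBest("business", "age", 30.0, 1.25)
data1 = PartyBest("data-1", "income", 5000.0, 2.5)
target = choose_target([business, data1])
# The owner decides the next step: S1116 if "business", otherwise S1117.
print(target.owner, target.feature)  # data-1 income
```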

S1116, if the target optimal splitting characteristic belongs to the service party, the service party calculates the encrypted codes of the sample sets of the two leaf nodes obtained after splitting and sends them to each data party.

S1117, if the target optimal splitting characteristic belongs to a data side, that data side calculates the sample sets of the two leaf nodes obtained after splitting, encrypts them and sends them to the service side.

S1118, determining whether all leaf nodes have finished splitting or the depth of the tree has reached the set maximum depth. If so, go to step S1119; otherwise, return to step S1105.

S1119, calculating the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node.

S1120, calculating the gradient lifting tree model according to the target weights.

To implement the foregoing embodiments, an apparatus for generating a gradient lifting tree model is further provided in the embodiments of the present application. Fig. 12 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to an embodiment of the present application. As shown in fig. 12, the apparatus 1200 for generating a gradient lifting tree model according to the embodiment of the present application may specifically include: a first generation module 1201, a first sending module 1202, a first determination module 1203, an obtaining module 1204, a first calculation module 1205, and a second calculation module 1206.

The first generation module 1201 is configured to generate the business side encrypted aggregate values of the first-order and second-order derivatives in each data bin according to the data side public key, the encrypted codes of the sample set of the current leaf node of the (m-1)th tree, and the first-order and second-order derivatives of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M.

The first sending module 1202 is configured to send the business side encrypted aggregate value to the first data side among the data sides.

The first determining module 1203 is configured to determine the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the business side information gain sent after the first data side decrypts the business side encrypted aggregation value, and the corresponding business side splitting characteristic number and the business side splitting point number.

An obtaining module 1204 is configured to obtain the encrypted codes of the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point.

The first calculation module 1205 is configured to calculate the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node.

The second calculation module 1206 is configured to calculate the gradient lifting tree model according to the target weights.
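The business side encrypted aggregate value produced by the first generation module 1201 can be illustrated with an additively homomorphic scheme. The sketch below assumes Paillier encryption and assumes that the "encrypted codes" of a leaf's sample set are element-wise encryptions of a 0/1 membership flag under the data side public key; neither choice is fixed by the text above. Because E(s)^k = E(k*s), the service party can weight each encrypted flag by its own plaintext derivative and multiply the results within a bin, obtaining encrypted bin sums without learning which samples are in the leaf. The primes, `SCALE` and all function names are illustrative assumptions, and the key is far too small for real use.

```python
import math
import secrets
from typing import List, Sequence, Tuple

# Toy Paillier parameters (two Mersenne primes): illustration only.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 10**6  # assumed fixed-point scale for real-valued derivatives


class PaillierKey:
    """Paillier key pair with g = n + 1 (the data side key in this sketch)."""

    def __init__(self, p: int, q: int):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int) -> int:
        r = 0
        while r == 0 or math.gcd(r, self.n) != 1:
            r = secrets.randbelow(self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m  # map back to a signed value


def service_bin_aggregates(enc_membership: Sequence[int], grads: Sequence[float],
                           hess: Sequence[float], bins: Sequence[Sequence[int]],
                           n: int) -> List[Tuple[int, int]]:
    """Service party: for each bin of one of its own features, compute
    E(sum s_i * g_i) and E(sum s_i * h_i) over samples i in the bin, where
    E(s_i) encrypts the flag "sample i is in the current leaf".
    Uses only the public modulus n; the flags s_i are never seen in clear."""
    n2 = n * n
    out = []
    for bin_samples in bins:
        G = H = 1  # 1 is an encryption of 0 (randomness r = 1)
        for i in bin_samples:
            # E(s)^k = E(k * s): plaintext-by-ciphertext multiplication
            G = G * pow(enc_membership[i], round(grads[i] * SCALE) % n, n2) % n2
            H = H * pow(enc_membership[i], round(hess[i] * SCALE) % n, n2) % n2
        out.append((G, H))
    return out


key = PaillierKey(P, Q)                     # generated by the first data side
membership = [1, 0, 1, 1]                   # current leaf holds samples 0, 2, 3
enc_codes = [key.encrypt(s) for s in membership]
g = [0.5, -0.2, -0.3, 0.1]                  # first-order derivatives (service side)
h = [0.25, 0.16, 0.21, 0.09]                # second-order derivatives
bins = [[0, 1], [2, 3]]                     # bins of one service-side feature
agg = service_bin_aggregates(enc_codes, g, h, bins, key.n)
# Only the key holder (the first data side) can decrypt the bin aggregates.
print([(key.decrypt(G) / SCALE, key.decrypt(H) / SCALE) for G, H in agg])
```

In this demo sample 1 lies outside the leaf, so its derivatives drop out of the first bin's sums even though the service party never saw its flag.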

In one embodiment of the present application, the public key of the data party is generated by the first data party.

In an embodiment of the present application, the first determining module 1203 may specifically include: a first determining unit configured to determine the service side optimal splitting characteristic, the service side optimal splitting point and the corresponding service side maximum information gain according to the maximum value of the service side information gain, the service side splitting characteristic number and the service side splitting point number; an acquisition unit configured to acquire the data side optimal splitting characteristic, the data side optimal splitting point and the corresponding data side maximum information gain; a second determining unit configured to determine the larger of the service side maximum information gain and the data side maximum information gain as the target maximum information gain; a third determining unit configured to determine the optimal splitting characteristic corresponding to the target maximum information gain as the target optimal splitting characteristic; and a fourth determining unit configured to determine the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.
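The selection performed by the second to fourth determining units reduces to taking the maximum over the candidate gains. A minimal Python sketch, in which the `SplitCandidate` record, the party labels and the feature names are hypothetical:

```python
from typing import List, NamedTuple


class SplitCandidate(NamedTuple):
    owner: str          # "service" or an identifier of a data party
    feature: str        # that party's optimal splitting feature
    split_point: float  # that party's optimal splitting point
    gain: float         # that party's maximum information gain


def choose_target_split(service_best: SplitCandidate,
                        data_best: List[SplitCandidate]) -> SplitCandidate:
    """Target maximum information gain = the larger of the service-side and
    data-side maxima; its feature and split point become the target split.
    On a tie, max() keeps the first candidate, i.e. the service side."""
    return max([service_best, *data_best], key=lambda c: c.gain)


svc = SplitCandidate("service", "overdue_days", 30.0, 0.42)
data = [SplitCandidate("data_party_2", "monthly_income", 8000.0, 0.57)]
target = choose_target_split(svc, data)
print(target.owner, target.feature, target.split_point)
```

Accepting a list of data-side candidates lets the same comparison cover several data parties; the tie-breaking rule is an arbitrary choice of this sketch.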

In an embodiment of the present application, the acquisition unit may specifically include: an encryption subunit configured to perform homomorphic encryption on the first-order derivative and the second-order derivative to obtain an encrypted first-order derivative and an encrypted second-order derivative; a first transmitting subunit configured to transmit the encrypted first-order and second-order derivatives to the data side; a first receiving subunit configured to receive the data side encrypted aggregate value of the first-order and second-order derivatives in each data bin sent by the data side, where the data side generates this value according to the service side public key, the sample set of the current leaf node, the encrypted first-order derivative and the encrypted second-order derivative; a decryption subunit configured to decrypt the data side encrypted aggregate value and compute from it the maximum value of the data side information gain, together with the corresponding data side splitting feature number and data side splitting point number; a second transmitting subunit configured to transmit the maximum value of the data side information gain and the corresponding data side splitting feature number and data side splitting point number to the data side; and a second receiving subunit configured to receive the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain sent by the data side, which the data side determines according to the maximum value of the data side information gain, the data side splitting feature number and the data side splitting point number.
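The exchange handled by the acquisition unit runs the other way: the service side encrypts its derivatives under its own key, and the data side, which knows the current leaf's sample set in clear, adds up the ciphertexts of the samples that fall into each of its bins. A minimal sketch, assuming toy Paillier parameters and a fixed-point `SCALE` that the text does not specify; all names are hypothetical.

```python
import math
import secrets
from typing import List, Sequence, Tuple

# Toy Paillier key of the service side (g = n + 1); illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
SCALE = 10**6  # assumed fixed-point scale


def encrypt(m: int) -> int:
    """Encrypt with the public modulus N (any party may do this)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private (LAM, MU); only the service side can do this."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def data_side_bin_aggregates(enc_g: Sequence[int], enc_h: Sequence[int],
                             leaf_samples: Sequence[int],
                             bins: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Data side: add up encrypted derivatives over (bin AND current leaf).
    Homomorphic addition is ciphertext multiplication modulo N^2."""
    leaf = set(leaf_samples)
    out = []
    for bin_samples in bins:
        G = H = 1  # encryption of 0 with r = 1
        for i in bin_samples:
            if i in leaf:
                G = G * enc_g[i] % N2
                H = H * enc_h[i] % N2
        out.append((G, H))
    return out


# Service side encrypts its derivatives and sends them to the data side.
g = [0.5, -0.2, -0.3, 0.1]
h = [0.25, 0.16, 0.21, 0.09]
enc_g = [encrypt(round(x * SCALE)) for x in g]
enc_h = [encrypt(round(x * SCALE)) for x in h]

# Data side knows the current leaf's sample set and its own feature bins.
agg = data_side_bin_aggregates(enc_g, enc_h, leaf_samples=[0, 2, 3],
                               bins=[[0, 3], [1, 2]])

# Back at the service side: decrypt, then scan the bins for the best split.
print([(decrypt(G) / SCALE, decrypt(H) / SCALE) for G, H in agg])
```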

In an embodiment of the present application, the obtaining module 1204 may specifically include: a first execution unit configured to, if the target optimal splitting characteristic belongs to the service side, calculate the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypt the sample sets of the two leaf nodes with the data side public key to obtain their encrypted codes, and send the encrypted codes to the data side; and a second execution unit configured to, if the target optimal splitting characteristic belongs to the data side, receive the encrypted codes of the sample sets of the two leaf nodes sent by the data side, where the data side obtains the sample sets of the two leaf nodes by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypts them with the data side public key.
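The split computed by the execution units can be sketched as follows, assuming that a split sends samples with `value <= split_point` to the left child and that the sample set of each child is represented, before encryption, as a 0/1 indicator over all aligned samples. Both conventions and the helper names are assumptions; the element-wise encryption with the data side public key is omitted.

```python
from typing import Dict, List, Tuple


def split_leaf(leaf_samples: List[int], feature_values: Dict[int, float],
               split_point: float) -> Tuple[List[int], List[int]]:
    """Partition the current leaf by feature <= split_point (left) versus
    feature > split_point (right). Only the party owning the feature can run this."""
    left = [i for i in leaf_samples if feature_values[i] <= split_point]
    right = [i for i in leaf_samples if feature_values[i] > split_point]
    return left, right


def membership_code(subset: List[int], num_samples: int) -> List[int]:
    """0/1 indicator over all aligned samples; each entry would then be
    encrypted element-wise with the data side public key (not shown)."""
    members = set(subset)
    return [1 if i in members else 0 for i in range(num_samples)]


leaf = [0, 2, 3, 5]
values = {0: 1.2, 2: 3.4, 3: 0.7, 5: 2.0}   # hypothetical feature values
left, right = split_leaf(leaf, values, split_point=1.5)
print(left, right, membership_code(left, 6), membership_code(right, 6))
```

A fixed-length indicator over all aligned samples hides the size of each child set only if every entry is encrypted, which is why the encoding is applied to the full sample range rather than to the member list.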

In an embodiment of the present application, the first calculation module 1205 may specifically include: a generation unit configured to generate a data side encrypted aggregate value for each leaf node according to the encrypted codes of the sample set of that leaf node, the data side public key, the first-order derivative and the second-order derivative; a transmitting unit configured to transmit the data side encrypted aggregate values to the first data side; a receiving unit configured to receive the data side decrypted aggregate values and the corresponding numbers, which the first data side sends after decrypting the data side encrypted aggregate values; and a first calculation unit configured to calculate the target weights according to the data side decrypted aggregate values and the corresponding numbers.
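For the first calculation unit, one common choice (used, for example, by XGBoost) is the second-order leaf weight w = -G / (H + lambda), which minimizes G*w + (H + lambda)*w^2 / 2, where G and H are the sums of first- and second-order derivatives over the leaf. The text does not state which formula the target weight uses, so the sketch below, including the regularization default `reg_lambda = 1.0` and the use of the "corresponding number" as a leaf key, is an assumption.

```python
from typing import Dict, Tuple


def leaf_weights(decrypted: Dict[int, Tuple[float, float]],
                 reg_lambda: float = 1.0) -> Dict[int, float]:
    """Map each leaf number to w = -G / (H + reg_lambda), where (G, H) are the
    decrypted sums of first- and second-order derivatives over that leaf
    (already divided by any fixed-point scale used during encryption)."""
    return {leaf: -G / (H + reg_lambda) for leaf, (G, H) in decrypted.items()}


weights = leaf_weights({0: (0.6, 0.34), 1: (-0.3, 0.21)})
print(weights)
```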

In an embodiment of the present application, the second calculation module 1206 may specifically include: a construction unit configured to construct the m-th tree according to the target weights; an updating unit configured to update, for each piece of risk control data, the predicted value of the m-th sub-model, where the m-th sub-model comprises the 1st through m-th trees; and a second calculation unit configured to calculate the gradient lifting tree model according to the M sub-models.
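The construction and updating units follow the usual additive boosting update, in which the m-th sub-model adds the m-th tree's output to the (m-1)-th sub-model. The sketch below assumes a binary log loss (a common choice for risk control labels, but not stated here) and a shrinkage factor `learning_rate`; under that loss the first- and second-order derivatives fed into the next tree are g = p - y and h = p(1 - p), with p the sigmoid of the current raw score. The function names and the per-sample leaf lookup are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def derivatives(raw_scores: Sequence[float],
                labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the binary log loss with respect to
    the raw score F_{m-1}(x_i): g_i = p_i - y_i, h_i = p_i * (1 - p_i)."""
    g, h = [], []
    for f, y in zip(raw_scores, labels):
        p = sigmoid(f)
        g.append(p - y)
        h.append(p * (1.0 - p))
    return g, h


def update_predictions(raw_scores: Sequence[float], tree_m: Callable[[int], float],
                       learning_rate: float = 0.1) -> List[float]:
    """m-th sub-model: F_m(x_i) = F_{m-1}(x_i) + learning_rate * f_m(x_i), where
    tree_m returns the target weight of the leaf that sample i falls into."""
    return [f + learning_rate * tree_m(i) for i, f in enumerate(raw_scores)]


scores = [0.0, 0.0]            # F_0 for two samples
labels = [1, 0]
g, h = derivatives(scores, labels)
leaf_weight_of = [1.0, -1.0]   # hypothetical output of the m-th tree per sample
scores = update_predictions(scores, lambda i: leaf_weight_of[i])
print(g, h, scores)
```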

It should be noted that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation apparatus of this embodiment, and the specific process is not described herein again.

In the gradient lifting tree model generation apparatus of this embodiment, the service side generates a service side encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the data side public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M. It sends the service side encrypted aggregate value to the first data side among the data sides, and determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the service side information gain, the corresponding service side splitting characteristic number and the service side splitting point number, which the first data side sends back after decrypting the service side encrypted aggregate value. The service side then obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and calculates the gradient lifting tree model according to the target weights. Homomorphic encryption keeps the sample set held by the data side secret from the service side while still ensuring that the information gain of every splitting point of every feature can be calculated correctly in encrypted form, so that the model can be built.
The apparatus thus ensures that disclosing the data side's splitting features and their meanings to the service side does not reveal the data side's private data; the gradient lifting tree model is generated losslessly, and its interpretability is enhanced.

In order to implement the foregoing embodiment, an apparatus for generating a gradient lifting tree model is further provided in the embodiment of the present application. Fig. 13 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to another embodiment of the present application. As shown in fig. 13, the apparatus 1300 for generating a gradient lifting tree model according to an embodiment of the present application may specifically include: a first receiving module 1301, a first decrypting module 1302 and a second sending module 1303.

The first receiving module 1301 is configured to receive the service party encrypted aggregate value of the first-order and second-order derivatives in each data bin sent by the service party, where the service party generates this value according to the data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk control data after m-1 iterations, m being smaller than a preset number M.

The first decryption module 1302 is configured to perform decryption calculation on the service party encrypted aggregation value to obtain a maximum value of the service party information gain, and a corresponding service party splitting feature number and a corresponding service party splitting point number.

The second sending module 1303 is configured to send the maximum value of the information gain of the service party, the splitting feature number of the service party, and the splitting point number of the service party to the service party, so that the service party determines an optimal target splitting feature and a corresponding optimal target splitting point according to the maximum value of the information gain of the service party, the splitting feature number of the service party, and the splitting point number of the service party, and obtains an encrypted code of a sample set of two leaf nodes obtained after splitting based on the optimal target splitting feature and the optimal target splitting point of the current leaf node, calculates a target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculates a gradient lifting tree model according to the target weight.
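After decrypting the service party encrypted aggregate values, the first data party only needs per-bin sums to find the best split of each feature. The sketch below uses the XGBoost-style split gain (G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda))/2 - gamma with a left-to-right scan over bins; the gain formula, the regularization parameters and the convention that split point number k sends bins 0 to k to the left child are assumptions rather than details given in the text. It returns only numbers, never feature names or thresholds.

```python
from typing import Sequence, Tuple


def best_split(bin_stats: Sequence[Sequence[Tuple[float, float]]],
               reg_lambda: float = 1.0,
               gamma: float = 0.0) -> Tuple[float, int, int]:
    """bin_stats[f][b] = (G, H), the decrypted sums of first- and second-order
    derivatives in bin b of feature number f within the current leaf.
    Returns (maximum gain, splitting feature number, splitting point number)."""

    def score(G: float, H: float) -> float:
        return G * G / (H + reg_lambda)

    best = (float("-inf"), -1, -1)
    for f, bins in enumerate(bin_stats):
        G_tot = sum(g for g, _ in bins)
        H_tot = sum(h for _, h in bins)
        G_left = H_left = 0.0
        # Split point k puts bins 0..k on the left; the last bin cannot be one.
        for k, (g, h) in enumerate(bins[:-1]):
            G_left += g
            H_left += h
            gain = 0.5 * (score(G_left, H_left)
                          + score(G_tot - G_left, H_tot - H_left)
                          - score(G_tot, H_tot)) - gamma
            if gain > best[0]:
                best = (gain, f, k)
    return best


stats = [
    [(0.5, 0.25), (-0.2, 0.30)],   # feature number 0, two bins
    [(1.0, 0.30), (-1.0, 0.30)],   # feature number 1, two bins
]
print(best_split(stats))           # only numbers go back to the service party
```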

In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 may further include: a second generation module configured to generate the data party public key; and a third sending module configured to send the data party public key to the service party.

In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 may further include: a second receiving module configured to receive the encrypted first-order derivative and the encrypted second-order derivative sent by the service party, which the service party obtains by homomorphically encrypting the first-order and second-order derivatives; a third generation module configured to generate a data side encrypted aggregate value of the first-order and second-order derivatives in each data bin according to the service side public key, the sample set of the current leaf node, the encrypted first-order derivative and the encrypted second-order derivative; a fourth sending module configured to send the data side encrypted aggregate value to the service side; a third receiving module configured to receive the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party splitting point number, sent by the service party after it decrypts the data party encrypted aggregate value; a second determining module configured to determine the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number and the data side splitting point number; and a fifth sending module configured to send the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain to the service side.

In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 may further include: a fourth receiving module configured to, if the target optimal splitting characteristic belongs to the service party, receive the encrypted codes of the sample sets of the two leaf nodes sent by the service party, where the service party obtains the sample sets of the two leaf nodes by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypts them with the data party public key; and a third calculation module configured to, if the target optimal splitting characteristic belongs to the data side, calculate the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypt them with the data side public key to obtain their encrypted codes, and send the encrypted codes to the service side.

In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 may further include: a fifth receiving module configured to receive the data side encrypted aggregate value of each leaf node sent by the service side, where the service side generates it according to the encrypted codes of the sample set of each leaf node, the data side public key, the first-order derivative and the second-order derivative; a second decryption module configured to decrypt the data side encrypted aggregate value to obtain the data side decrypted aggregate value and the corresponding number; and a sixth sending module configured to send the data side decrypted aggregate value and the corresponding number to the service side, so that the service side calculates the target weight according to them.

In the gradient lifting tree model generation apparatus of this embodiment, the first data party receives the service party encrypted aggregate value of the first-order and second-order derivatives in each data bin sent by the service party, where the service party generates this value according to the data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first-order and second-order derivatives of the predicted value of each piece of risk control data after m-1 iterations, m being smaller than a preset number M. The first data party decrypts the service party encrypted aggregate value to obtain the maximum value of the service party information gain and the corresponding service party splitting characteristic number and service party splitting point number, and sends them to the service party. According to these, the service party determines the target optimal splitting characteristic and the corresponding target optimal splitting point, obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and calculates the gradient lifting tree model according to the target weights.
Because only the encrypted codes of the sample set are sent to the service side, the sample set stays secure; on this premise the feature meanings of the data side can be disclosed to the service side. This avoids the data leakage that would occur if the sample set and the feature meanings were held by the same participant, and enhances the interpretability of a gradient lifting tree model generated by federated learning.

In order to implement the foregoing embodiment, an apparatus for generating a gradient lifting tree model is further provided in the embodiment of the present application. Fig. 14 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to another embodiment of the present application, which is applied to a second data party. As shown in fig. 14, the gradient lifting tree model generation apparatus 1400 according to the embodiment of the present application may specifically include: a sixth receiving module 1401, a fourth generating module 1402, a seventh sending module 1403, a seventh receiving module 1404, a third determination module 1405 and an eighth sending module 1406.

A sixth receiving module 1401, configured to receive the encrypted first derivative and the encrypted second derivative sent by the service party, where the encrypted first derivative and the encrypted second derivative are obtained by the service party performing homomorphic encryption on the first derivative and the second derivative.

A fourth generating module 1402 configured to generate a data-side encrypted aggregate value of the first and second derivatives in each data bin according to the public key of the service side, the sample set of the current leaf node, the encrypted first derivative, and the encrypted second derivative.

A seventh sending module 1403 configured to send the data side encrypted aggregation value to the service side.

A seventh receiving module 1404, configured to receive a maximum value of the data side information gain sent after the service side decrypts the data side encrypted aggregation value, and a corresponding data side split feature number and a data side split point number.

A third determination module 1405 configured to determine the data side optimal splitting feature, the data side optimal splitting point, and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number, and the data side splitting point number.
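The third determination module 1405 receives only a splitting feature number and a splitting point number, so it needs a private lookup to recover the actual feature and threshold. A minimal sketch, assuming split point number k refers to the upper edge of bin k; the table layout and the example feature names are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_split(feature_names: List[str], bin_edges: Dict[str, List[float]],
                  feature_no: int, split_no: int) -> Tuple[str, float]:
    """Map (splitting feature number, splitting point number) back to the data
    party's own feature and threshold. The party that computed the gain saw
    only the numbers; the data party resolves them with its private table."""
    name = feature_names[feature_no]
    return name, bin_edges[name][split_no]


# Hypothetical private table of the second data party.
feature_names = ["monthly_income", "tenure_years"]
bin_edges = {
    "monthly_income": [3000.0, 8000.0, 15000.0],  # upper edges of bins 0, 1, 2
    "tenure_years": [1.0, 3.0, 5.0],
}
print(resolve_split(feature_names, bin_edges, feature_no=1, split_no=0))
```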

An eighth sending module 1406 configured to send the data side optimal splitting characteristic, the data side optimal splitting point and the data side maximum information gain to the service side, so that the service side determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the data side optimal splitting characteristic, the data side optimal splitting point, the data side maximum information gain, the maximum value of the service side information gain, the service side splitting characteristic number and the service side splitting point number; obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point; calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node; and calculates the gradient lifting tree model according to the target weights. The maximum value of the service side information gain, the service side splitting characteristic number and the service side splitting point number are obtained by the first data side by decrypting the service side encrypted aggregate values of the first-order and second-order derivatives in each data bin sent by the service side.

In an embodiment of the present application, the apparatus 1400 for generating a gradient lifting tree model may further include: an eighth receiving module configured to, if the target optimal splitting characteristic belongs to the service party, receive the encrypted codes of the sample sets of the two leaf nodes sent by the service party, where the service party obtains the sample sets of the two leaf nodes by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypts them with the data party public key; and a fourth calculation module configured to, if the target optimal splitting characteristic belongs to the data side, calculate the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypt them with the data side public key to obtain their encrypted codes, and send the encrypted codes to the service side.

In the gradient lifting tree model generation apparatus of this embodiment, the second data party receives the encrypted first-order and second-order derivatives sent by the service party, which the service party obtains by homomorphically encrypting the first-order and second-order derivatives. According to the service party public key, the sample set of the current leaf node, and the encrypted first-order and second-order derivatives, the second data party generates a data party encrypted aggregate value of the first-order and second-order derivatives in each data bin and sends it to the service party. It then receives the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party splitting point number, sent by the service party after decrypting the data party encrypted aggregate value; determines from these the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain; and sends them to the service party. The service party then determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the data side optimal splitting characteristic, the data side optimal splitting point, the data side maximum information gain, the maximum value of the service side information gain, the service side splitting characteristic number and the service side splitting point number; obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point; calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node; and calculates the gradient lifting tree model according to the target weights. Here, the maximum value of the service side information gain, the service side splitting characteristic number and the service side splitting point number are obtained by the first data party by decrypting the service party encrypted aggregate values of the first-order and second-order derivatives in each data bin sent by the service party.
The feature meanings of the data side are disclosed to the service side while the sample set stays hidden, which avoids the data leakage that would occur if the sample set and the feature meanings were held by the same participant, and enhances the interpretability of a gradient lifting tree model generated by federated learning.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 15 is a block diagram of an electronic device for the gradient lifting tree model generation method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as smart voice interaction devices, personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in fig. 15, the electronic device includes one or more processors 1501, a memory 1502, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor 1501 may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). Fig. 15 takes one processor 1501 as an example.

The memory 1502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the gradient lifting tree generation method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the gradient boosting tree generation method provided herein.

The memory 1502 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the gradient boosting tree generation method in the embodiments of the present application (for example, the first generation module 1201, the first sending module 1202, the first determining module 1203, the obtaining module 1204, the first calculation module 1205 and the second calculation module 1206 shown in fig. 12; or the first receiving module 1301, the first decrypting module 1302 and the second sending module 1303 shown in fig. 13; or the sixth receiving module 1401, the fourth generating module 1402, the seventh sending module 1403, the seventh receiving module 1404, the third determination module 1405 and the eighth sending module 1406 shown in fig. 14). By running the non-transitory software programs, instructions, and modules stored in the memory 1502, the processor 1501 executes the various functional applications and data processing of the server, that is, implements the generation method of the gradient lifting tree model in the above method embodiments.

The memory 1502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the gradient boosting tree model generation method, and the like. In addition, the memory 1502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 1502 may optionally include memories located remotely from the processor 1501, which may be connected over a network to the electronic device for the gradient boosting tree model generation method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the gradient lifting tree model generation method may further include an input device 1503 and an output device 1504. The processor 1501, the memory 1502, the input device 1503 and the output device 1504 may be connected by a bus or in other ways; fig. 15 takes a bus connection as an example.

The input device 1503, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the gradient boosting tree model generation method. The output device 1504 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.

In the description of this specification, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two or three, unless otherwise specifically limited.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A method of generating a gradient boosting tree model, applied to a business party, the method comprising:

generating, according to a data party public key, an encrypted encoding of the sample set of a current leaf node of the (m-1)-th tree, and the first-order derivative and second-order derivative of the predicted value of each piece of risk control data after m-1 iterations, a business party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin, where m is smaller than a preset number M;

sending the business party encrypted aggregate value to a first data party among the data parties;

determining a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain, and the corresponding business party splitting feature number and business party splitting point number, sent by the first data party after decrypting the business party encrypted aggregate value;

acquiring encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point;

calculating a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and

obtaining a gradient boosting tree model by calculation according to the target weights.
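Claim 1 does not specify the loss function. Assuming binary log loss on raw (pre-sigmoid) scores, a common choice for risk control but an assumption here, the first-order and second-order derivatives of each sample's predicted value after m-1 iterations can be sketched as follows; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad_hess(
    labels: Sequence[int], raw_scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Per-sample first and second derivatives of binary log loss (assumed loss).

    labels: 0/1 targets; raw_scores: raw predictions after m-1 trees.
    """
    g: List[float] = []
    h: List[float] = []
    for y, s in zip(labels, raw_scores):
        p = sigmoid(s)
        g.append(p - y)          # first derivative w.r.t. the raw score
        h.append(p * (1.0 - p))  # second derivative w.r.t. the raw score
    return g, h


g, h = logistic_grad_hess([1, 0], [0.0, 0.0])
print(g, h)  # [-0.5, 0.5] [0.25, 0.25]
```

In the claimed flow these derivatives stay with the business party in plaintext; only aggregates computed against encrypted sample-set encodings leave it.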

2. The generation method according to claim 1, wherein the data party public key is generated by the first data party.

3. The generation method according to claim 1, wherein determining the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the business party information gain sent by the first data party after decrypting the business party encrypted aggregate value, and the corresponding business party splitting feature number and business party splitting point number, comprises:

determining a business party optimal splitting feature, a business party optimal splitting point, and a corresponding business party maximum information gain according to the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number;

acquiring a data party optimal splitting feature, a data party optimal splitting point, and a corresponding data party maximum information gain;

determining the larger of the business party maximum information gain and the data party maximum information gain as a target maximum information gain;

determining the optimal splitting feature corresponding to the target maximum information gain as the target optimal splitting feature; and

determining the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.
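The selection in claim 3 reduces to keeping the candidate with the larger maximum information gain. A minimal sketch, assuming each party reports its candidate as a small record (the class and field names are hypothetical) and that ties go to the candidate listed first:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SplitCandidate:
    owner: str        # which party holds the feature (hypothetical label)
    feature: int      # that party's optimal splitting feature
    split_point: int  # that party's optimal splitting point
    gain: float       # that party's maximum information gain


def choose_target_split(candidates: Sequence[SplitCandidate]) -> SplitCandidate:
    """Return the candidate with the largest maximum information gain.

    max() keeps the first candidate on ties, so listing the business party
    first resolves ties in its favor.
    """
    return max(candidates, key=lambda c: c.gain)


business = SplitCandidate("business", feature=0, split_point=3, gain=1.2)
data = SplitCandidate("data_party_1", feature=5, split_point=1, gain=2.0)
target = choose_target_split([business, data])
print(target.owner, target.feature, target.split_point)  # data_party_1 5 1
```

Passing a list rather than exactly two arguments lets the same sketch cover several data parties.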

4. The generation method according to claim 3, wherein acquiring the data party optimal splitting feature, the data party optimal splitting point, and the corresponding data party maximum information gain comprises:

performing homomorphic encryption on the first-order derivatives and the second-order derivatives to obtain encrypted first-order derivatives and encrypted second-order derivatives;

sending the encrypted first-order derivatives and the encrypted second-order derivatives to the data party;

receiving a data party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin sent by the data party, wherein the data party encrypted aggregate value is generated by the data party according to a business party public key, the sample set of the current leaf node, the encrypted first-order derivatives, and the encrypted second-order derivatives;

decrypting the data party encrypted aggregate value and performing calculation to obtain a maximum value of the data party information gain, and a corresponding data party splitting feature number and data party splitting point number;

sending the maximum value of the data party information gain, and the corresponding data party splitting feature number and data party splitting point number, to the data party; and

receiving the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain sent by the data party, wherein these are determined by the data party according to the maximum value of the data party information gain, the data party splitting feature number, and the data party splitting point number.
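The claims do not state the information-gain formula. Assuming the XGBoost-style second-order gain, 0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma, with hypothetical regularization parameters lam and gamma, the party holding the decrypted per-bin sums could find the maximum gain and the corresponding splitting feature number and splitting point number like this:

```python
from typing import Sequence, Tuple


def best_split(
    G_bins: Sequence[Sequence[float]],
    H_bins: Sequence[Sequence[float]],
    lam: float = 1.0,
    gamma: float = 0.0,
) -> Tuple[float, int, int]:
    """Scan every feature and bin boundary for the maximum information gain.

    G_bins[f][b], H_bins[f][b]: decrypted sums of first/second derivatives of
    the current leaf's samples that fall in bin b of feature f.
    Split point b sends bins 0..b to the left child and bins b+1.. to the right.
    Returns (max_gain, splitting_feature_number, splitting_point_number).
    """
    best = (float("-inf"), -1, -1)
    for f, (g_row, h_row) in enumerate(zip(G_bins, H_bins)):
        G, H = sum(g_row), sum(h_row)
        GL = HL = 0.0
        for b in range(len(g_row) - 1):  # splitting after the last bin leaves the right child empty
            GL += g_row[b]
            HL += h_row[b]
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                          - G**2 / (H + lam)) - gamma
            if gain > best[0]:
                best = (gain, f, b)
    return best


print(best_split([[-2.0, 2.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]))  # (2.0, 0, 0)
```

Only the three returned values travel back to the feature owner, which maps the numbers to its actual feature and threshold.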

5. The generation method according to claim 1, wherein acquiring the encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point comprises:

if the target optimal splitting feature belongs to the business party, calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypting the sample sets of the two leaf nodes with the data party public key to obtain their encrypted encodings, and sending the encrypted encodings to the data party; and

if the target optimal splitting feature belongs to the data party, receiving the encrypted encodings of the sample sets of the two leaf nodes sent by the data party, wherein the encrypted encodings are obtained by the data party calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the data party public key.
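Assuming the sample set of a leaf node is encoded as a 0/1 indicator vector over all aligned samples (the claims only say the sample set is encrypted, so this encoding is an assumption), the owner of the target optimal splitting feature could derive the two child sample sets as below. The subsequent encryption with the data party public key is omitted, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_leaf(
    indicator: Sequence[int], feature_bins: Sequence[int], split_point: int
) -> Tuple[List[int], List[int]]:
    """Split a leaf's 0/1 sample indicator into left and right child indicators.

    indicator[i]: 1 if sample i is in the current leaf node, else 0.
    feature_bins[i]: bin index of sample i on the target optimal splitting feature.
    Samples with bin <= split_point go to the left child (assumed convention).
    """
    left = [1 if ind == 1 and b <= split_point else 0
            for ind, b in zip(indicator, feature_bins)]
    right = [ind - l for ind, l in zip(indicator, left)]
    return left, right


left, right = split_leaf([1, 1, 0, 1], [0, 2, 0, 1], split_point=1)
print(left, right)  # [1, 0, 0, 1] [0, 1, 0, 0]
```

Because both children are full-length vectors, the other party sees only ciphertexts of equal length and cannot tell which samples moved.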

6. The generation method according to claim 1, wherein calculating the target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node comprises:

generating a data party encrypted aggregate value of each leaf node according to the encrypted encoding of the sample set of that leaf node, the data party public key, the first-order derivatives, and the second-order derivatives;

sending the data party encrypted aggregate value to the first data party;

receiving a data party decrypted aggregate value and a corresponding number sent by the first data party after decrypting the data party encrypted aggregate value; and

calculating the target weight according to the data party decrypted aggregate value and the corresponding number.
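Assuming the XGBoost-style optimal leaf weight w = -G / (H + lam), where G and H are the sums of first-order and second-order derivatives over the leaf's samples and lam is a hypothetical regularization parameter, the target weight calculation can be sketched in plaintext. In the claimed flow the sums are computed against encrypted indicators and only the aggregate is decrypted by the first data party; here the "corresponding number" is assumed to identify the leaf.

```python
from typing import Dict, Sequence, Tuple


def leaf_sums(
    indicator: Sequence[int], g: Sequence[float], h: Sequence[float]
) -> Tuple[float, float]:
    """Plaintext analogue of the aggregate: sums of g and h over the leaf's samples."""
    G = sum(e * gi for e, gi in zip(indicator, g))
    H = sum(e * hi for e, hi in zip(indicator, h))
    return G, H


def leaf_weights(
    decrypted: Dict[int, Tuple[float, float]], lam: float = 1.0
) -> Dict[int, float]:
    """Map each leaf number to its target weight -G / (H + lam)."""
    return {leaf: -G / (H + lam) for leaf, (G, H) in decrypted.items()}


g = [-0.5, 0.5, -1.5]
h = [0.5, 1.0, 0.5]
sums = {0: leaf_sums([1, 0, 1], g, h), 1: leaf_sums([0, 1, 0], g, h)}
print(leaf_weights(sums))  # {0: 1.0, 1: -0.25}
```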

7. The generation method according to claim 1, wherein obtaining the gradient boosting tree model by calculation according to the target weights comprises:

constructing the m-th tree according to the target weights;

updating the predicted value of the m-th sub-model for each piece of risk control data, wherein the m-th sub-model comprises the 1st tree to the m-th tree; and

obtaining the gradient boosting tree model by calculation according to the M sub-models.
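Assuming the usual additive update with a shrinkage (learning-rate) factor, which the claims do not mention, the m-th sub-model's prediction for each sample is the (m-1)-th prediction plus the scaled weight of the leaf the sample falls into in the m-th tree, and the final model is the sum over all M trees. A minimal sketch with hypothetical names and illustrative values:

```python
from typing import Dict, List, Sequence


def update_predictions(
    prev_scores: Sequence[float],
    leaf_of_sample: Sequence[int],
    weights: Dict[int, float],
    learning_rate: float = 0.1,
) -> List[float]:
    """Prediction of the m-th sub-model: y_hat^(m) = y_hat^(m-1) + eta * w_leaf."""
    return [s + learning_rate * weights[leaf]
            for s, leaf in zip(prev_scores, leaf_of_sample)]


scores = [0.0, 0.0]
# Each tree: (leaf index of every sample, target weights of that tree's leaves).
trees = [([0, 1], {0: 1.0, 1: -1.0}), ([1, 1], {0: 0.0, 1: 0.5})]
for leaves, w in trees:  # builds sub-models 1..M in order
    scores = update_predictions(scores, leaves, w, learning_rate=0.5)
print(scores)  # [0.75, -0.25]
```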

8. A method of generating a gradient boosting tree model, applied to a first data party, the method comprising:

receiving a business party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin sent by a business party, wherein the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of a current leaf node of the (m-1)-th tree, and the first-order derivative and second-order derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M;

decrypting the business party encrypted aggregate value and performing calculation to obtain a maximum value of the business party information gain, and a corresponding business party splitting feature number and business party splitting point number; and

sending the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, acquires encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains a gradient boosting tree model by calculation according to the target weights.

9. The generation method according to claim 8, further comprising:

generating the data party public key; and

sending the data party public key to the business party.
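The claims require only "homomorphic encryption". Paillier encryption is one additively homomorphic scheme that fits and is assumed here. The toy sketch shows the data party generating a key pair, the business party aggregating its plaintext first-order derivatives against an encrypted 0/1 indicator of the leaf's samples, and the data party decrypting only the aggregate. The fixed small primes make it insecure, and the fixed-point scale SCALE and all function names are assumptions; a real deployment would use a vetted library and much larger keys.

```python
import math
import random

SCALE = 10**6  # fixed-point scale for real-valued derivatives (assumption)


def keygen(p: int = 104729, q: int = 1299709):
    """Toy Paillier key pair from small fixed primes: insecure, illustration only."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(priv, c: int) -> int:
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n: int, c1: int, c2: int) -> int:
    return c1 * c2 % (n * n)  # Enc(a) * Enc(b) decrypts to a + b


def mul_plain(n: int, c: int, k: int) -> int:
    return pow(c, k % n, n * n)  # Enc(a) ** k decrypts to k * a


def encode(x: float, n: int) -> int:
    return round(x * SCALE) % n  # negative values wrap around modulo n


def decode(v: int, n: int) -> float:
    return (v - n if v > n // 2 else v) / SCALE


# Data party: generates the key pair and encrypts the leaf's 0/1 sample indicator.
pub, priv = keygen()
enc_indicator = [encrypt(pub, e) for e in [1, 0, 1]]

# Business party: has plaintext g but not the indicator; aggregates homomorphically.
g = [0.25, -0.5, -0.75]
acc = encrypt(pub, 0)
for c, gi in zip(enc_indicator, g):
    acc = add(pub, acc, mul_plain(pub, c, encode(gi, pub)))

# Data party: decrypts only the aggregate sum of g over the leaf's samples.
print(decode(decrypt(priv, acc), pub))  # -0.5
```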

10. The generation method according to claim 8, further comprising:

receiving encrypted first-order derivatives and encrypted second-order derivatives sent by the business party, wherein the business party obtains them by performing homomorphic encryption on the first-order derivatives and the second-order derivatives;

generating a data party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first-order derivatives, and the encrypted second-order derivatives;

sending the data party encrypted aggregate value to the business party;

receiving a maximum value of the data party information gain, and a corresponding data party splitting feature number and data party splitting point number, sent by the business party after decrypting the data party encrypted aggregate value;

determining a data party optimal splitting feature, a data party optimal splitting point, and a data party maximum information gain according to the maximum value of the data party information gain, the data party splitting feature number, and the data party splitting point number; and

sending the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain to the business party.
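In claim 10 the data party knows which samples are in the current leaf and which bin each sample falls into on its own features, but sees the derivatives only in encrypted form. The per-bin aggregation can be written generically over the addition operation, so the same sketch runs on plaintext floats or, assuming a Paillier-style scheme, on ciphertexts (multiplication modulo n squared, starting from an encryption of zero). The function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bin_aggregate(
    leaf_samples: Sequence[int],
    sample_bins: Sequence[int],
    values: Sequence[T],
    num_bins: int,
    add: Callable[[T, T], T],
    zero: T,
) -> List[T]:
    """Sum per-sample values (e.g. encrypted first-order derivatives) per bin.

    leaf_samples: indices of the samples in the current leaf node.
    sample_bins[i]: bin index of sample i on one of this party's features.
    add, zero: the addition and its identity; with Paillier ciphertexts these
    would be multiplication modulo n**2 and an encryption of 0.
    """
    sums = [zero] * num_bins
    for i in leaf_samples:
        b = sample_bins[i]
        sums[b] = add(sums[b], values[i])
    return sums


# Plaintext check: samples 0 and 3 are in bin 0, sample 2 in bin 1; sample 1 is not in the leaf.
print(bin_aggregate([0, 2, 3], [0, 1, 1, 0], [0.5, 9.0, -1.0, 2.0], 2, lambda a, b: a + b, 0.0))
# [2.5, -1.0]
```

Running it once per feature for g and once for h yields the per-bin sums that the business party decrypts and scans for the best split.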

11. The generation method according to claim 8, further comprising:

if the target optimal splitting feature belongs to the business party, receiving the encrypted encodings of the sample sets of the two leaf nodes sent by the business party, wherein the encrypted encodings are obtained by the business party calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the data party public key; and

if the target optimal splitting feature belongs to the data party, calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypting the sample sets of the two leaf nodes with the data party public key to obtain their encrypted encodings, and sending the encrypted encodings to the business party.

12. The generation method according to claim 8, further comprising:

receiving a data party encrypted aggregate value of each leaf node sent by the business party, wherein the data party encrypted aggregate value is generated by the business party according to the encrypted encoding of the sample set of that leaf node, the data party public key, the first-order derivatives, and the second-order derivatives;

decrypting the data party encrypted aggregate value to obtain a data party decrypted aggregate value and a corresponding number; and

sending the data party decrypted aggregate value and the corresponding number to the business party, so that the business party calculates the target weight according to them.

13. A method of generating a gradient boosting tree model, applied to a second data party, the method comprising:

receiving encrypted first-order derivatives and encrypted second-order derivatives sent by a business party, wherein the business party obtains them by performing homomorphic encryption on the first-order derivatives and the second-order derivatives;

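generating a data party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first-order derivatives, and the encrypted second-order derivatives;

sending the data party encrypted aggregate value to the business party;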

sending the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number; acquires encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point; calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and obtains a gradient boosting tree model by calculation according to the target weights; wherein the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number are obtained by the first data party by decrypting, and performing calculation on, the business party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin sent by the business party.

14. The generation method according to claim 13, further comprising:

15. A device for generating a gradient boosting tree model, applied to a business party, the device comprising:

a first generation module configured to generate a business party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin according to a data party public key, an encrypted encoding of the sample set of a current leaf node of the (m-1)-th tree, and the first-order derivative and second-order derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M;

a first sending module configured to send the business party encrypted aggregate value to a first data party among the data parties;

a first determining module configured to determine a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain, and the corresponding business party splitting feature number and business party splitting point number, sent by the first data party after decrypting the business party encrypted aggregate value;

an obtaining module configured to obtain encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point;

a first calculation module configured to calculate a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and

a second calculation module configured to obtain a gradient boosting tree model by calculation according to the target weights.

16. The generation apparatus of claim 15, wherein the public key of the data party is generated by the first data party.

17. The apparatus of claim 15, wherein the first determining module comprises:

a first determining unit configured to determine a business party optimal splitting feature, a business party optimal splitting point, and a corresponding business party maximum information gain according to the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number;

an acquisition unit configured to acquire a data party optimal splitting feature, a data party optimal splitting point, and a corresponding data party maximum information gain;

a second determining unit configured to determine the larger of the business party maximum information gain and the data party maximum information gain as a target maximum information gain;

a third determining unit configured to determine the optimal splitting feature corresponding to the target maximum information gain as the target optimal splitting feature; and

a fourth determining unit configured to determine an optimal split point corresponding to the target maximum information gain as the target optimal split point.

18. The generation apparatus according to claim 17, wherein the acquisition unit comprises:

an encryption subunit configured to perform homomorphic encryption on the first-order derivatives and the second-order derivatives to obtain encrypted first-order derivatives and encrypted second-order derivatives;

a first sending subunit configured to send the encrypted first-order derivatives and the encrypted second-order derivatives to the data party;

a first receiving subunit configured to receive a data party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin sent by the data party, wherein the data party encrypted aggregate value is generated by the data party according to a business party public key, the sample set of the current leaf node, the encrypted first-order derivatives, and the encrypted second-order derivatives;

a decryption subunit configured to decrypt the data party encrypted aggregate value and perform calculation to obtain a maximum value of the data party information gain, and a corresponding data party splitting feature number and data party splitting point number;

a second sending subunit configured to send the maximum value of the data party information gain, and the corresponding data party splitting feature number and data party splitting point number, to the data party; and

a second receiving subunit configured to receive the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain sent by the data party, wherein these are determined by the data party according to the maximum value of the data party information gain, the data party splitting feature number, and the data party splitting point number.

19. The generation apparatus according to claim 15, wherein the obtaining module comprises:

a first execution unit configured to, if the target optimal splitting feature belongs to the business party, calculate the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypt the sample sets of the two leaf nodes with the data party public key to obtain their encrypted encodings, and send the encrypted encodings to the data party; and

a second execution unit configured to, if the target optimal splitting feature belongs to the data party, receive the encrypted encodings of the sample sets of the two leaf nodes sent by the data party, wherein the encrypted encodings are obtained by the data party calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the data party public key.

20. The generation apparatus according to claim 15, wherein the first calculation module comprises:

a generating unit configured to generate a data party encrypted aggregate value of each leaf node according to the encrypted encoding of the sample set of that leaf node, the data party public key, the first-order derivatives, and the second-order derivatives;

a sending unit configured to send the data party encrypted aggregate value to the first data party;

a receiving unit configured to receive a data party decrypted aggregate value and a corresponding number sent by the first data party after decrypting the data party encrypted aggregate value; and

a first calculation unit configured to calculate the target weight according to the data party decrypted aggregate value and the corresponding number.

21. The generation apparatus according to claim 15, wherein the second calculation module comprises:

a building unit configured to build the m-th tree according to the target weights;

an updating unit configured to update the predicted value of the m-th sub-model for each piece of risk control data, wherein the m-th sub-model comprises the 1st tree to the m-th tree; and

a second calculation unit configured to obtain the gradient boosting tree model by calculation according to the M sub-models.

22. A device for generating a gradient boosting tree model, applied to a first data party, the device comprising:

a first receiving module configured to receive a business party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin sent by a business party, wherein the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of a current leaf node of the (m-1)-th tree, and the first-order derivative and second-order derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M;

a first decryption module configured to decrypt the business party encrypted aggregate value and perform calculation to obtain a maximum value of the business party information gain, and a corresponding business party splitting feature number and business party splitting point number; and

a second sending module configured to send the maximum value of the business party information gain, the business party splitting feature number, and the business party splitting point number to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, obtains encrypted encodings of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains a gradient boosting tree model by calculation according to the target weights.

23. The generation apparatus according to claim 22, further comprising:

a second generation module configured to generate the data party public key; and

a third sending module configured to send the data party public key to the business party.

24. The generation apparatus according to claim 22, further comprising:

a second receiving module configured to receive encrypted first-order derivatives and encrypted second-order derivatives sent by the business party, wherein the business party obtains them by performing homomorphic encryption on the first-order derivatives and the second-order derivatives;

a third generation module configured to generate a data party encrypted aggregate value of the first-order derivatives and the second-order derivatives in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first-order derivatives, and the encrypted second-order derivatives;

a fourth sending module configured to send the data party encrypted aggregate value to the business party;

a third receiving module configured to receive a maximum value of the data party information gain, and a corresponding data party splitting feature number and data party splitting point number, sent by the business party after decrypting the data party encrypted aggregate value;

a second determining module configured to determine the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain according to the maximum value of the data party information gain, the data party splitting feature number, and the data party splitting point number; and

a fifth sending module configured to send the data party optimal splitting feature, the data party optimal splitting point, and the data party maximum information gain to the business party.

25. The generation apparatus according to claim 22, further comprising:

a fourth receiving module, configured to receive, if the target optimal splitting characteristic belongs to the service party, encryption codes of sample sets of the two leaf nodes sent by the service party, where the encryption codes of the sample sets of the two leaf nodes calculate, for the service party, a sample set of the two leaf nodes obtained after splitting based on the target optimal splitting characteristic and the target optimal splitting point of the current leaf node, and encrypt the sample set of the two leaf nodes by using the data party public key;

and the third calculation module is configured to calculate sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point if the target optimal splitting characteristic belongs to the data party, encrypt the sample sets of the two leaf nodes by using the data party public key to obtain encrypted codes of the sample sets of the two leaf nodes, and send the encrypted codes of the sample sets of the two leaf nodes to the service party.
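The claims leave the form of the "encrypted code" of a sample set open. One plausible reading, assumed here, is a per-sample 0/1 membership vector whose entries are each encrypted under the data party public key. The sketch below shows the split of the current leaf node and the membership encoding; `split_node`, `encode_sample_set`, and the `encrypt_fn` hook are hypothetical names, and the identity default of `encrypt_fn` stands in for real encryption so the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_node(
    node_samples: Sequence[int],
    feature_values: Dict[int, float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Partition the current node's samples on a feature owned by this party."""
    left = [i for i in node_samples if feature_values[i] <= threshold]
    right = [i for i in node_samples if feature_values[i] > threshold]
    return left, right


def encode_sample_set(
    members: Sequence[int],
    n_samples: int,
    encrypt_fn: Callable[[int], int] = lambda b: b,
) -> List[int]:
    """Turn a sample set into a 0/1 membership vector, encrypting each entry.

    encrypt_fn is a placeholder for encryption under the data party public key;
    the identity default keeps the sketch runnable without a crypto library.
    """
    member_set = set(members)
    return [encrypt_fn(1 if i in member_set else 0) for i in range(n_samples)]


# Hypothetical feature values for four samples; samples 0-2 are in the current node.
ages = {0: 23.0, 1: 41.0, 2: 35.0, 3: 58.0}
left, right = split_node([0, 1, 2], ages, threshold=35.0)
print(left, right)                            # [0, 2] [1]
print(encode_sample_set(left, n_samples=4))   # [1, 0, 1, 0]
```

Because every entry is encrypted, the receiving party would see a vector of ciphertexts of equal length for both children rather than the membership itself.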

26. The generation apparatus according to claim 22, further comprising:

a fifth receiving module configured to receive a data side encrypted aggregate value of each leaf node sent by the service side, where the data side encrypted aggregate value is generated by the service side according to an encrypted code of a sample set of each leaf node, the data side public key, the first order derivative, and the second order derivative;

the second decryption module is configured to decrypt the data side encrypted aggregate value to obtain a data side decrypted aggregate value and a corresponding number;

and a sixth sending module configured to send the data side decrypted aggregate value and the corresponding number to the service side, so that the service side can calculate the target weight from the data side decrypted aggregate value and the corresponding number.
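This claim combines the encrypted sample-set codes with the plaintext derivatives held by the service side. Assuming a Paillier cryptosystem (toy key below, held by the data side), 0/1 membership vectors as the encrypted codes, and a hypothetical fixed-point `SCALE`, the service side can raise each encrypted membership bit to the power of that sample's derivative and multiply the results, which yields an encryption of the leaf's derivative sum. The claims give no weight formula; the sketch assumes the XGBoost-style leaf weight w = -G / (H + λ) with a hypothetical `reg_lambda`. All function names are illustrative.

```python
import math
import random

# Toy Paillier keypair held by the data side (Mersenne primes for brevity;
# real keys use random primes of 1024+ bits).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
SCALE = 10**6  # hypothetical fixed-point scale for derivatives


def encrypt(m: int) -> int:
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encrypted_leaf_sum(enc_members, values):
    """Service side: Enc(sum_i b_i * v_i) from Enc(b_i) and plaintext v_i.

    Enc(b)^k is an encryption of k * b, and Enc(a) * Enc(b) (mod N2) is an
    encryption of a + b; negative fixed-point values wrap modulo N.
    """
    acc = 1  # trivial encryption of 0 (r = 1)
    for c, v in zip(enc_members, values):
        acc = acc * pow(c, round(v * SCALE) % N, N2) % N2
    return acc


def leaf_weight(G: float, H: float, reg_lambda: float = 1.0) -> float:
    """Assumed XGBoost-style optimal leaf weight."""
    return -G / (H + reg_lambda)


g = [0.25, -0.5, 0.75, -0.125]   # first derivatives (service side)
h = [0.1875, 0.25, 0.1875, 0.109375]  # second derivatives (service side)
enc_members = [encrypt(b) for b in [1, 0, 1, 0]]  # encrypted code of one leaf's sample set
G = decrypt(encrypted_leaf_sum(enc_members, g)) / SCALE  # data side decrypts
H = decrypt(encrypted_leaf_sum(enc_members, h)) / SCALE
print(G, H)  # 1.0 0.375
w = leaf_weight(G, H)  # target weight, about -0.727 for reg_lambda = 1
```

The "corresponding number" sent back with each decrypted value would let the service side match each sum to its leaf; the sketch handles a single leaf.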

27. A generation apparatus for a gradient boosting tree model, applied to a second data party, comprising:

the sixth receiving module is configured to receive an encrypted first-order derivative and an encrypted second-order derivative sent by a service party, and the encrypted first-order derivative and the encrypted second-order derivative are obtained by the service party after homomorphic encryption is performed on the first-order derivative and the second-order derivative;

a fourth generation module configured to generate a data side encrypted aggregate value of the first order derivative and the second order derivative in each data bin according to a public key of a service side, a sample set of a current leaf node, the encrypted first order derivative and the encrypted second order derivative;

a seventh sending module configured to send the data side encrypted aggregate value to the service side;

a seventh receiving module configured to receive the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party splitting point number, which the service party sends after decrypting the data party encrypted aggregate value;

a third determining module configured to determine the data side optimal splitting feature, the data side optimal splitting point and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number and the data side splitting point number;

an eighth sending module configured to send the data party optimal splitting characteristic, the data party optimal splitting point, and the data party maximum information gain to the service party, so that the service party: determines a target optimal splitting characteristic and a corresponding target optimal splitting point according to the data party optimal splitting characteristic, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number, and the service party splitting point number; obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point; calculates a target weight of each leaf node according to the encrypted code of the sample set of that leaf node; and obtains the gradient boosting tree model by calculation from the target weights; wherein the maximum value of the service party information gain, the service party splitting feature number, and the service party splitting point number are obtained by the first data party by decrypting, and computing on, the service party encrypted aggregate values of the first and second derivatives in each data bin sent by the service party.
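In this claim each party's best split is reduced to a maximum information gain plus a splitting feature number and a splitting point number, and the service party then picks the overall target split. The claims do not define the information gain. The sketch below assumes the XGBoost split gain (with the complexity penalty γ omitted) computed from decrypted per-bin sums G and H, assumes that split point number k sends bins 0..k to the left child, and uses hypothetical names `best_split_from_bins` and `choose_target_split`. The party that owns a feature would then map the returned numbers back to the actual feature and threshold.

```python
from typing import List, Sequence, Tuple

Split = Tuple[float, int, int]  # (max gain, feature number, split point number)


def best_split_from_bins(
    bin_G: Sequence[Sequence[float]],
    bin_H: Sequence[Sequence[float]],
    reg_lambda: float = 1.0,
) -> Split:
    """Scan decrypted per-bin derivative sums and return the best split.

    bin_G[f][k] and bin_H[f][k] are the first/second derivative sums in bin k of
    feature f; split point number k sends bins 0..k left and the rest right.
    """
    best: Split = (float("-inf"), -1, -1)
    for f, (gs, hs) in enumerate(zip(bin_G, bin_H)):
        G, H = sum(gs), sum(hs)
        parent = G * G / (H + reg_lambda)
        GL = HL = 0.0
        for k in range(len(gs) - 1):  # a cut after the last bin is not a split
            GL += gs[k]
            HL += hs[k]
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL * GL / (HL + reg_lambda) + GR * GR / (HR + reg_lambda) - parent)
            if gain > best[0]:
                best = (gain, f, k)
    return best


def choose_target_split(
    candidates: List[Tuple[str, float, int, int]],
) -> Tuple[str, float, int, int]:
    """Service party: keep the (party, gain, feature, split) with the largest gain."""
    return max(candidates, key=lambda c: c[1])


# Hypothetical decrypted bin sums: one service-party feature, one data-party feature.
service_best = best_split_from_bins([[1.0, -0.5]], [[0.375, 0.25]])
data_best = best_split_from_bins([[0.5, 0.5, -0.5]], [[0.2, 0.2, 0.25]])
target = choose_target_split([("service", *service_best), ("second_data_party", *data_best)])
print(target[0], target[2], target[3])  # service 0 0
```

Candidates from further data parties would simply be appended to the list passed to `choose_target_split`.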

28. The generation apparatus of claim 27, further comprising:

an eighth receiving module configured to receive, if the target optimal splitting characteristic belongs to the service party, the encrypted codes of the sample sets of the two leaf nodes sent by the service party, where the service party obtains these encrypted codes by calculating the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting those sample sets with the data party public key;

and a fourth calculation module, configured to calculate a sample set of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point if the target optimal splitting characteristic belongs to the data side, encrypt the sample set of the two leaf nodes by using the data side public key to obtain encrypted codes of the sample set of the two leaf nodes, and send the encrypted codes of the sample set of the two leaf nodes to the service side.

29. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a gradient boosting tree model as defined in any one of claims 1 to 7, any one of claims 8 to 12, or either of claims 13 and 14.

30. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of generating a gradient boosting tree model according to any one of claims 1 to 7, any one of claims 8 to 12, or either of claims 13 and 14.