CN113722739A - Gradient lifting tree model generation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113722739A
Authority
CN
China
Legal status
Granted
Application number
CN202111038483.0A
Other languages
Chinese (zh)
Other versions
CN113722739B (en)
Inventor
杨恺
彭南博
王虎
黄志翔
陈晓霖
Current Assignee
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202111038483.0A
Publication of CN113722739A
Application granted
Publication of CN113722739B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium. The method includes: generating a business party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, and sending the business party encrypted aggregate value to a first data party for decryption; determining a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain returned by the first data party after decryption, together with the corresponding business party splitting feature number and business party splitting point number; obtaining the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node; calculating a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and obtaining the gradient boosting tree model from the target weights.

Description

Gradient boosting tree model generation method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of federated learning, and in particular to a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium.
Background
At present, data-driven artificial intelligence plays an important role in many industries and creates considerable value. As a result, privacy protection and data security receive increasing attention. Federated learning was proposed so that multiple participants can build a model jointly without sharing raw data: they exchange only intermediate results, from which the underlying data cannot be inferred. Models based on vertical federated learning are widely used in risk control scenarios.
In the related art, a model built with federated learning in a risk control scenario often has low interpretability, and it is difficult to build the model losslessly while guaranteeing data security.
Disclosure of Invention
The application provides a method and a device for generating a gradient boosting tree model, an electronic device and a storage medium.
An embodiment of a first aspect of the present application provides a method for generating a gradient boosting tree model, applied to a business party, including: generating a business party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M; sending the business party encrypted aggregate value to a first data party among the data parties; determining a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain sent by the first data party after it decrypts the business party encrypted aggregate value, together with the corresponding business party splitting feature number and business party splitting point number; obtaining the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point; calculating a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and obtaining the gradient boosting tree model from the target weights.
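The derivative and leaf-weight steps of the first aspect can be sketched in plaintext as follows. This is a minimal illustration, not the patented procedure: the log loss, the regularisation term `reg_lambda`, the weight formula -G / (H + lambda) commonly used in second-order boosting, and the function names are all assumptions that do not appear in the text above.

```python
import math

def logloss_derivatives(y_true, raw_scores):
    """First and second derivatives of the log loss with respect to the
    raw score (the prediction after m-1 iterations). Assumed loss."""
    grads, hess = [], []
    for y, s in zip(y_true, raw_scores):
        p = 1.0 / (1.0 + math.exp(-s))   # predicted probability
        grads.append(p - y)              # first derivative
        hess.append(p * (1.0 - p))       # second derivative
    return grads, hess

def leaf_weight(grads, hess, leaf_members, reg_lambda=1.0):
    """Hypothetical leaf weight -G / (H + lambda) over the samples in a leaf."""
    G = sum(grads[i] for i in leaf_members)
    H = sum(hess[i] for i in leaf_members)
    return -G / (H + reg_lambda)

# Four samples, all raw scores still zero (p = 0.5 for every sample).
labels = [1, 0, 1, 0]
scores = [0.0, 0.0, 0.0, 0.0]
g, h = logloss_derivatives(labels, scores)
w = leaf_weight(g, h, leaf_members=[0, 2])   # a leaf holding samples 0 and 2
```

In the protocol described above the leaf membership is only available to the business party as an encrypted encoding, so the two sums would be computed on ciphertexts rather than on plain lists.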
In the method for generating a gradient boosting tree model according to the embodiment of the application, a business party encrypted aggregate value of the first derivative and the second derivative in each data bin is generated according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M; the business party encrypted aggregate value is sent to a first data party among the data parties; a target optimal splitting feature and a corresponding target optimal splitting point are determined according to the maximum value of the business party information gain sent by the first data party after it decrypts the business party encrypted aggregate value, together with the corresponding business party splitting feature number and business party splitting point number; the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point are obtained; a target weight of each leaf node is calculated according to the encrypted encoding of the sample set of that leaf node; and the gradient boosting tree model is obtained from the target weights. By using homomorphic encryption, the sample set held by the data party is kept secret from the business party, while the business party can still correctly calculate, on encrypted values, the information gain corresponding to each splitting point of each feature, and thereby complete the model. The method ensures that disclosing the data party's splitting features and their meanings to the business party does not reveal the data party's private data, allows the gradient boosting tree model to be generated losslessly, and enhances the interpretability of the model.
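The homomorphic aggregation described above can be illustrated with a toy additively homomorphic scheme. The sketch below assumes a Paillier-style cryptosystem, a 0/1 membership vector as the "encoding of the sample set", and fixed-point scaling of the derivatives; none of these choices is stated in the text, and the tiny primes are for illustration only.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (generator g = n + 1). Not secure at this size."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)          # modular inverse, Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m   # map back to a signed integer

def add(n, c1, c2):
    """Ciphertext product corresponds to plaintext sum."""
    return (c1 * c2) % (n * n)

def scale(n, c, k):
    """Ciphertext power corresponds to plaintext times the integer k."""
    return pow(c, k % n, n * n)

# Data party: encrypt the leaf membership of four samples under its own key.
n, priv = keygen()
enc_indicator = [encrypt(n, b) for b in [1, 0, 1, 1]]

# Business party: weight each encrypted indicator by its fixed-point derivative
# and sum, without ever seeing which samples are in the leaf.
SCALE = 1000
first_derivs = [0.25, -0.4, -0.125, 0.3]
enc_sum = encrypt(n, 0)
for c, gval in zip(enc_indicator, first_derivs):
    enc_sum = add(n, enc_sum, scale(n, c, round(gval * SCALE)))

# Data party: decrypt the aggregate (250 - 125 + 300 in fixed point).
aggregate = decrypt(n, priv, enc_sum)
```

A real deployment would use a vetted library with key sizes of 2048 bits or more; the point here is only that the party doing the summation never handles the plaintext membership vector.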
An embodiment of a second aspect of the present application provides a method for generating a gradient boosting tree model, applied to a first data party, including: receiving a business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by a business party, where the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, and m is smaller than a preset number M; decrypting the business party encrypted aggregate value and calculating from it the maximum value of the business party information gain and the corresponding business party splitting feature number and business party splitting point number; and sending the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights.
In the method for generating a gradient boosting tree model according to the embodiment of the application, the first data party receives a business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by a business party, where the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, and m is smaller than a preset number M; decrypts the business party encrypted aggregate value and calculates the maximum value of the business party information gain and the corresponding business party splitting feature number and business party splitting point number; and sends these three values to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights. Because only the encrypted encoding of the sample set is sent to the business party, the data security of the sample set is guaranteed; on that premise, the feature meanings of the data party are disclosed to the business party. This avoids the data leakage that arises when the sample set and the feature meanings are held by the same participant, and enhances the interpretability of the gradient boosting tree model generated by federated learning.
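The decryption-side calculation in the second aspect (from per-bin aggregates to a maximum gain, a feature number and a splitting point number) can be sketched as below. The gain expression is the one commonly used in second-order boosting and is an assumption here, as are `reg_lambda`, the function name and the input layout; the text above does not give a formula.

```python
def best_split(bin_sums, reg_lambda=1.0):
    """bin_sums[feature_no] is a list of decrypted (G_bin, H_bin) pairs in
    bin order. Returns (max_gain, feature_no, split_point_no).
    Split point k sends bins 0..k to the left child and the rest to the right."""
    def score(G, H):
        return G * G / (H + reg_lambda)

    best = (float("-inf"), None, None)
    for feature_no, bins in enumerate(bin_sums):
        G_total = sum(G for G, _ in bins)
        H_total = sum(H for _, H in bins)
        G_left = H_left = 0.0
        for split_no, (G, H) in enumerate(bins[:-1]):
            G_left += G
            H_left += H
            gain = 0.5 * (score(G_left, H_left)
                          + score(G_total - G_left, H_total - H_left)
                          - score(G_total, H_total))
            if gain > best[0]:
                best = (gain, feature_no, split_no)
    return best

# Two features with two bins each; feature 0 separates the derivatives cleanly.
decrypted = [
    [(-1.0, 0.5), (1.0, 0.5)],
    [(0.0, 0.5), (0.0, 0.5)],
]
max_gain, feature_no, split_no = best_split(decrypted)
```

Only the three returned values would travel back to the business party; the first data party sees numbered features and bins, not their meanings.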
An embodiment of a third aspect of the present application provides a method for generating a gradient boosting tree model, applied to a second data party, including: receiving an encrypted first derivative and an encrypted second derivative sent by a business party, where the encrypted first derivative and the encrypted second derivative are obtained by the business party by homomorphically encrypting the first derivative and the second derivative; generating a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative; sending the data party encrypted aggregate value to the business party; receiving the maximum value of the data party information gain sent by the business party after it decrypts the data party encrypted aggregate value, together with the corresponding data party splitting feature number and data party splitting point number; determining a data party optimal splitting feature, a data party optimal splitting point and a data party maximum information gain according to the maximum value of the data party information gain, the data party splitting feature number and the data party splitting point number; and sending the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights; where the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number are obtained by the first data party by decrypting and calculating on the business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the business party.
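The per-bin aggregation performed by the second data party can be sketched as follows. To keep the example short the "ciphertexts" are plain floats and the homomorphic addition is passed in as a function, so this is a plaintext stand-in under stated assumptions: the bin edges, the `add` and `zero` parameters and both function names are hypothetical.

```python
from bisect import bisect_right

def assign_bins(values, edges):
    """Bin index for each feature value; edges are ascending cut points,
    giving the bins (-inf, e0), [e0, e1), ..., [e_last, +inf)."""
    return [bisect_right(edges, v) for v in values]

def aggregate_per_bin(bin_ids, leaf_samples, enc_g, enc_h, n_bins, add, zero):
    """Sum the (encrypted) first and second derivatives of the samples in
    the current leaf, separately for each data bin."""
    G = [zero] * n_bins
    H = [zero] * n_bins
    for i in leaf_samples:
        b = bin_ids[i]
        G[b] = add(G[b], enc_g[i])
        H[b] = add(H[b], enc_h[i])
    return G, H

# One feature held by the data party, cut at 30 and 50 (three bins).
feature_values = [25, 45, 60, 30]
bin_ids = assign_bins(feature_values, edges=[30, 50])

# Derivatives as received from the business party (floats stand in for ciphertexts).
enc_g = [-0.5, 0.5, -0.5, 0.5]
enc_h = [0.25, 0.25, 0.25, 0.25]
G_bins, H_bins = aggregate_per_bin(
    bin_ids, leaf_samples=[0, 1, 3], enc_g=enc_g, enc_h=enc_h,
    n_bins=3, add=lambda a, b: a + b, zero=0.0,
)
```

With a real additively homomorphic scheme, `add` would be the ciphertext addition and `zero` an encryption of zero under the business party public key; the data party would then learn nothing about the derivative values it is summing.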
In the method for generating a gradient boosting tree model according to the embodiment of the application, the second data party receives an encrypted first derivative and an encrypted second derivative sent by a business party, which the business party obtains by homomorphically encrypting the first derivative and the second derivative; generates a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative; sends the data party encrypted aggregate value to the business party; receives the maximum value of the data party information gain sent by the business party after it decrypts the data party encrypted aggregate value, together with the corresponding data party splitting feature number and data party splitting point number; determines a data party optimal splitting feature, a data party optimal splitting point and a data party maximum information gain according to these values; and sends the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights. The maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number are obtained by the first data party by decrypting and calculating on the business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the business party. The feature meanings of the data party are disclosed to the business party while the sample set remains hidden, which avoids the data leakage that arises when the sample set and the feature meanings are held by the same participant and enhances the interpretability of the gradient boosting tree model generated by federated learning.
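The final selection on the business party, which compares its own best candidate with the one reported by the second data party, can be sketched as below. The dictionary layout, the function name and the tie-breaking rule (ties go to the business party) are assumptions made for illustration; the text above only says that the target split is determined from both sets of values.

```python
def choose_target_split(business_best, data_best):
    """Each argument is a dict with keys 'gain', 'feature' and 'split_point'.
    Returns the candidate with the larger information gain, tagged with its owner."""
    if data_best is not None and data_best["gain"] > business_best["gain"]:
        return {"owner": "data_party", **data_best}
    return {"owner": "business_party", **business_best}

# Candidate from the business party's own features (numbers returned by the first data party).
business_candidate = {"gain": 0.4, "feature": 2, "split_point": 1}
# Candidate reported by the second data party, with its feature meaning disclosed.
data_candidate = {"gain": 0.7, "feature": "monthly_income", "split_point": 3}

target = choose_target_split(business_candidate, data_candidate)
```

The feature name `monthly_income` is invented for the example; it stands for the disclosed feature meaning that makes the resulting tree interpretable to the business party.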
An embodiment of a fourth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a business party, including: a first generation module configured to generate a business party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M; a first sending module configured to send the business party encrypted aggregate value to a first data party among the data parties; a first determining module configured to determine a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain sent by the first data party after it decrypts the business party encrypted aggregate value, together with the corresponding business party splitting feature number and business party splitting point number; an obtaining module configured to obtain the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point; a first calculation module configured to calculate a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and a second calculation module configured to obtain the gradient boosting tree model from the target weights.
The device for generating a gradient boosting tree model according to the embodiment of the application generates a business party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, where m is smaller than a preset number M; sends the business party encrypted aggregate value to a first data party among the data parties; determines a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the business party information gain sent by the first data party after it decrypts the business party encrypted aggregate value, together with the corresponding business party splitting feature number and business party splitting point number; obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point; calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node; and obtains the gradient boosting tree model from the target weights. By using homomorphic encryption, the sample set held by the data party is kept secret from the business party, while the business party can still correctly calculate, on encrypted values, the information gain corresponding to each splitting point of each feature, and thereby complete the model. The device ensures that disclosing the data party's splitting features and their meanings to the business party does not reveal the data party's private data, allows the gradient boosting tree model to be generated losslessly, and enhances the interpretability of the model.
An embodiment of a fifth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a first data party, including: a first receiving module configured to receive a business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by a business party, where the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, and m is smaller than a preset number M; a first decryption module configured to decrypt the business party encrypted aggregate value and calculate the maximum value of the business party information gain and the corresponding business party splitting feature number and business party splitting point number; and a second sending module configured to send the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights.
The device for generating a gradient boosting tree model according to the embodiment of the application receives a business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by a business party, where the business party encrypted aggregate value is generated by the business party according to a data party public key, an encrypted encoding of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and second derivative of the predicted value of each piece of risk control data after m-1 iterations, and m is smaller than a preset number M; decrypts the business party encrypted aggregate value and calculates the maximum value of the business party information gain and the corresponding business party splitting feature number and business party splitting point number; and sends these three values to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to them, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights. Because only the encrypted encoding of the sample set is sent to the business party, the data security of the sample set is guaranteed; on that premise, the feature meanings of the data party are disclosed to the business party. This avoids the data leakage that arises when the sample set and the feature meanings are held by the same participant, and enhances the interpretability of the gradient boosting tree model generated by federated learning.
An embodiment of a sixth aspect of the present application provides a device for generating a gradient boosting tree model, applied to a second data party, including: a sixth receiving module configured to receive an encrypted first derivative and an encrypted second derivative sent by a business party, where the encrypted first derivative and the encrypted second derivative are obtained by the business party by homomorphically encrypting the first derivative and the second derivative; a fourth generation module configured to generate a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative; a seventh sending module configured to send the data party encrypted aggregate value to the business party; a seventh receiving module configured to receive the maximum value of the data party information gain sent by the business party after it decrypts the data party encrypted aggregate value, together with the corresponding data party splitting feature number and data party splitting point number; a third determining module configured to determine a data party optimal splitting feature, a data party optimal splitting point and a data party maximum information gain according to the maximum value of the data party information gain, the data party splitting feature number and the data party splitting point number; and an eighth sending module configured to send the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights; where the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number are obtained by the first data party by decrypting and calculating on the business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the business party.
The device for generating a gradient boosting tree model according to the embodiment of the application receives an encrypted first derivative and an encrypted second derivative sent by a business party, which the business party obtains by homomorphically encrypting the first derivative and the second derivative; generates a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a business party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative; sends the data party encrypted aggregate value to the business party; receives the maximum value of the data party information gain sent by the business party after it decrypts the data party encrypted aggregate value, together with the corresponding data party splitting feature number and data party splitting point number; determines a data party optimal splitting feature, a data party optimal splitting point and a data party maximum information gain according to these values; and sends the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain to the business party, so that the business party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number, obtains the encrypted encodings of the sample sets of the two leaf nodes produced by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted encoding of the sample set of that leaf node, and obtains the gradient boosting tree model from the target weights. The maximum value of the business party information gain, the business party splitting feature number and the business party splitting point number are obtained by the first data party by decrypting and calculating on the business party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the business party. The feature meanings of the data party are disclosed to the business party while the sample set remains hidden, which avoids the data leakage that arises when the sample set and the feature meanings are held by the same participant and enhances the interpretability of the gradient boosting tree model generated by federated learning.
An embodiment of a seventh aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for generating a gradient boosting tree model described in the embodiments of the first aspect, the second aspect, or the third aspect above.
An embodiment of an eighth aspect of the present application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the method for generating a gradient boosting tree model described in the embodiments of the first aspect, the second aspect, or the third aspect above.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 4 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 6 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 7 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 8 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 9 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 10 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 11 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 14 is a schematic structural diagram of an apparatus for generating a gradient boosting tree model according to another embodiment of the present application;
Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application. A method, an apparatus, an electronic device, and a storage medium for generating a gradient lifting tree model according to an embodiment of the present application are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a method for generating a gradient boosting tree model according to an embodiment of the present application. The method may be executed by the apparatus for generating a gradient boosting tree model provided in the embodiments of the present application, and the apparatus may be disposed in an electronic device of the service party (Guest). As shown in Fig. 1, the method may specifically include the following steps:
S101, generating a service-party encrypted aggregate value of the first and second derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first and second derivatives of the predicted value of each piece of risk control data after m-1 iterations, where m is less than a preset number M.
Specifically, this embodiment takes a risk control scenario as an example, uses risk control data as the sample data, and describes the method for generating the gradient boosting tree model based on federated learning.
Federated learning is an emerging foundational artificial intelligence technology. Its design goal is to carry out efficient machine learning among multiple parties or multiple computing nodes while guaranteeing information security during big data exchange, protecting terminal data and personal data privacy, and ensuring legal compliance. The aim is to build a model across organizations while each organization's data remains in its local environment and model parameters are exchanged under an encryption mechanism within the federated system. According to how the data held by the participants differ, federated learning can be classified into horizontal federated learning, vertical federated learning, federated transfer learning, and so on. Vertical federated learning is characterized by different data holders owning different feature dimensions of the same sample (for example, a given customer), which results from the companies running different businesses. Vertical federated learning is widely applied in risk control scenarios, where the label is held by only one party (the service party, Guest) and the other parties hold only part of the features of the data (the data parties, Host). The service party hopes to improve the model through cooperation with the data parties and thereby reduce risk. Throughout this process, both the service party and the data parties need to keep their own data secure.
In this embodiment, the service party and the data party each generate a homomorphic encryption public/private key pair and send their own public key to the other party. For example, the service party generates a homomorphic encryption service party private key and service party public key and sends the service party public key to each data party, and the data party generates a homomorphic encryption data party private key and data party public key and sends the data party public key to the service party. The keys are used for information transmission among the participants, which keeps each party's data secure. It should be noted that the data party public key is generated by a first data party (Host1). The first data party may be any one of the data parties; the data parties comprise the first data party and second data parties, a second data party being any data party other than the first data party.
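The key exchange described above can be sketched with a toy Paillier cryptosystem. The text only requires a homomorphic encryption scheme; Paillier, the fixed Mersenne-prime parameters, and every function name below are assumptions made for illustration, and the parameters are far too small for real use.

```python
import random

# Toy Paillier cryptosystem. Assumption: the source only says "homomorphic
# encryption"; Paillier and these small fixed primes are illustrative only.

def generate_keypair(p=2**61 - 1, q=2**89 - 1):
    """Return (public_key, private_key); the public key is the modulus n."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)              # modular inverse (Python 3.8+)
    return n, (phi, mu, n)

def encrypt(public_key, m):
    """Encrypt the integer m (reduced modulo n) under the public key."""
    n = public_key
    r = random.randrange(1, n)        # coprime to n except with negligible probability
    return (pow(n + 1, m % n, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(private_key, c):
    phi, mu, n = private_key
    return ((pow(c, phi, n * n) - 1) // n) * mu % n

def add(public_key, c1, c2):
    """Homomorphic addition: the result decrypts to m1 + m2 (mod n)."""
    return (c1 * c2) % (public_key * public_key)

# Each party generates its own key pair and shares only the public key.
guest_pub, guest_priv = generate_keypair()                          # service party
host_pub, host_priv = generate_keypair(p=2**89 - 1, q=2**107 - 1)   # first data party (Host1)

# Anyone holding guest_pub can add ciphertexts; only the service party can decrypt.
c = add(guest_pub, encrypt(guest_pub, 20), encrypt(guest_pub, 22))
total = decrypt(guest_priv, c)
```

The additive property shown by `add` is what later allows one party to aggregate values it cannot read.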
It should be noted that, before the gradient boosting tree model is built, the sample data must be processed by encrypted sample alignment, because during training each participant must match up the features that belong to the same piece of data. To protect data privacy and security, the data in the sample sets are aligned using Private Set Intersection (PSI): the users common to the participants are confirmed without any participant disclosing its own data, and users that do not overlap are not exposed. For example, after PSI alignment, the service party holds data $X_0 \in \mathbb{R}^{n \times d_0}$ and labels $y \in \mathbb{R}^{n}$, and the $P$ data parties hold data $X_1, \ldots, X_P$, where the data dimension of the $p$-th participant is $d_p$. The whole sample set is $X = [X_0, X_1, \ldots, X_P] \in \mathbb{R}^{n \times d}$, with $d = \sum_{p=0}^{P} d_p$.
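To make the alignment step concrete, here is a minimal sketch of one well-known PSI construction based on commutative exponentiation (Diffie-Hellman style). The text does not say which PSI protocol is used, so the protocol choice, the toy modulus, and all names below are assumptions; both parties are simulated in one process.

```python
import hashlib
import secrets

# Toy Diffie-Hellman-style PSI. Assumption: the source names PSI but not a
# concrete protocol; the modulus and all names here are illustrative only.
P = 2**127 - 1  # a Mersenne prime used as a toy group modulus

def hash_to_group(user_id):
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest, "big") % P

def blind(values, secret):
    """Raise every group element to a party's secret exponent."""
    return [pow(v, secret, P) for v in values]

guest_ids = ["u1", "u2", "u3", "u5"]   # service party's user IDs
host_ids = ["u2", "u3", "u4"]          # data party's user IDs

guest_secret = secrets.randbelow(P - 2) + 1
host_secret = secrets.randbelow(P - 2) + 1

# Round 1: each party blinds its own hashed IDs and sends them to the other.
guest_once = blind([hash_to_group(u) for u in guest_ids], guest_secret)
host_once = blind([hash_to_group(u) for u in host_ids], host_secret)

# Round 2: each party blinds the values it received with its own secret.
guest_twice = blind(guest_once, host_secret)        # done by the host, order kept
host_twice = set(blind(host_once, guest_secret))    # done by the guest

# Exponentiation commutes, so equal IDs give equal double-blinded values.
common = [u for u, v in zip(guest_ids, guest_twice) if v in host_twice]
```

Only the double-blinded values are compared, so IDs outside the intersection are never revealed in the clear.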
For each feature dimension $k = 1, \ldots, d_p$ of the $n$ pieces of risk control data of the $p$-th participant, binning is performed to obtain $L$ quantile points $S_k = \{s_{k1}, \ldots, s_{kL}\}$ of the feature, which serve as the candidate split thresholds. The service party calculates, for each piece of risk control data, the first derivative $g_i = \partial l(y_i, \hat{y}_i^{(m-1)}) / \partial \hat{y}_i^{(m-1)}$ and the second derivative $h_i = \partial^2 l(y_i, \hat{y}_i^{(m-1)}) / \partial (\hat{y}_i^{(m-1)})^2$ of the loss function $l$ with respect to the predicted value $\hat{y}_i^{(m-1)}$ obtained after $m-1$ iterations, $i = 1, \ldots, n$. It homomorphically encrypts $g_i$ and $h_i$ with the service party public key to obtain the encrypted first derivative $\langle g_i \rangle$ and the encrypted second derivative $\langle h_i \rangle$ and sends them to each data party, where $\langle \cdot \rangle$ denotes encryption with the service party public key. For each leaf node of the current $m$-th tree, each participant $p$ must determine whether the node should be split and how. Participant $p$ divides all data samples into $L-1$ bin intervals according to the $L$ thresholds $S_k$.
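The binning and derivative computation can be illustrated in plain Python. The loss function and the quantile rule are not specified in this passage, so the logistic loss, the evenly spaced quantiles, and all function names below are assumptions.

```python
import math

# Assumptions for illustration: logistic loss and evenly spaced quantiles;
# the source fixes neither the loss function nor the quantile rule.

def quantile_thresholds(values, num_thresholds):
    """Pick L candidate thresholds s_k1..s_kL from the sorted feature values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(j * last / (num_thresholds - 1))]
            for j in range(num_thresholds)]

def bin_index(value, thresholds):
    """Map a value to one of the L-1 intervals between consecutive thresholds."""
    for b in range(len(thresholds) - 2):
        if value < thresholds[b + 1]:
            return b
    return len(thresholds) - 2

def gradients(labels, raw_scores):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    g, h = [], []
    for y, score in zip(labels, raw_scores):
        p = 1.0 / (1.0 + math.exp(-score))
        g.append(p - y)
        h.append(p * (1.0 - p))
    return g, h

feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.6, 0.2, 0.7]
labels = [0, 0, 1, 1, 1, 0, 0, 1]

thresholds = quantile_thresholds(feature, num_thresholds=4)   # L = 4
bins = [bin_index(v, thresholds) for v in feature]            # L - 1 = 3 bins
g, h = gradients(labels, [0.0] * len(labels))                 # predictions start at 0
```

With four thresholds the samples fall into three bins, matching the L-1 intervals mentioned in the text.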
In this embodiment, if the $p$-th participant is the service party, the service party generates the service-party encrypted aggregate values $[[G_{bin}]]$ and $[[H_{bin}]]$ of the first and second derivatives within each data bin, where $[[\cdot]]$ denotes encryption with the data party public key. It does so according to the data party public key sent by the data party (specifically, the first data party), the encrypted code $[[\pi]]$ of the sample set $I$ of the current leaf node of the $(m-1)$-th tree known to it, and the first derivative $g_i$ and second derivative $h_i$ of the predicted value $\hat{y}_i^{(m-1)}$ (initially 0) of each piece of risk control data after $m-1$ iterations, which it has calculated itself, where $m$ is less than the preset number $M$. The service-party encrypted aggregate values $[[G_{bin}]]$ and $[[H_{bin}]]$ are calculated as follows:
$$[[G_{bin}]] = \sum_{i=1}^{n} g_i \cdot [[\pi_{bin,i}]]$$
$$[[H_{bin}]] = \sum_{i=1}^{n} h_i \cdot [[\pi_{bin,i}]]$$
where $g_i$ is the first derivative of the predicted value $\hat{y}_i^{(m-1)}$ of each piece of risk control data after $m-1$ iterations, $h_i$ is the second derivative of that predicted value, $n$ is the number of pieces of risk control data of the current $p$-th participant, $\pi$ is the code of the sample set $I$, and $\pi_{bin}$ is the code of the sample set $I_{bin}$ belonging to the current bin; that is, $\pi_{bin,i} = 1$ indicates that the $i$-th sample is in the sample set $I$ and belongs to the current data bin.
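The service-party aggregation can be sketched as follows: the service party multiplies each encrypted membership code by its plaintext derivative and adds the results for the samples that fall in a bin. This is a sketch under stated assumptions, not the patented implementation: the toy Paillier scheme, the fixed-point scale `SCALE`, the way the bin indicator is applied in plaintext, and all names are hypothetical.

```python
import random

# Toy Paillier with fixed-point encoding. All names and parameters are
# assumptions for illustration; this is not a production implementation.
SCALE = 10**6

def generate_keypair(p=2**61 - 1, q=2**89 - 1):
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n), n)

def encrypt(n, m):
    r = random.randrange(1, n)
    return (pow(n + 1, m % n, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    phi, mu, n = priv
    m = ((pow(c, phi, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m          # map back to signed integers

def scalar_mul(n, c, k):
    """The result decrypts to k * m (mod n)."""
    return pow(c, k % n, n * n)

def add(n, c1, c2):
    return (c1 * c2) % (n * n)

host_pub, host_priv = generate_keypair()       # first data party's key pair

# First data party: encrypt the 0/1 code of the current leaf's sample set I.
pi = [1, 1, 0, 1, 0]
enc_pi = [encrypt(host_pub, b) for b in pi]

# Service party: plaintext derivatives and the bin index of its own feature.
g = [0.5, -0.25, 0.75, 0.25, 0.125]
bins = [0, 1, 0, 0, 1]

def encrypted_bin_sum(values, enc_codes, bins, target_bin, n):
    acc = encrypt(n, 0)
    for v, c, b in zip(values, enc_codes, bins):
        if b == target_bin:                    # plaintext bin indicator
            acc = add(n, acc, scalar_mul(n, c, round(v * SCALE)))
    return acc

enc_G = [encrypted_bin_sum(g, enc_pi, bins, b, host_pub) for b in (0, 1)]

# First data party: decrypt the per-bin aggregates only.
G = [decrypt(host_priv, c) / SCALE for c in enc_G]
```

The same routine applied to the second derivatives would give the per-bin sums of $h_i$.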
If the $p$-th participant is a data party, the data party generates the data-party encrypted aggregate values $\langle G_{bin} \rangle$ and $\langle H_{bin} \rangle$ of the first and second derivatives within each data bin according to the service party public key, the sample set $I$ of the current leaf node of the $(m-1)$-th tree known to it, and the encrypted first derivative $\langle g_i \rangle$ and encrypted second derivative $\langle h_i \rangle$ sent by the service party, where $\langle \cdot \rangle$ denotes encryption with the service party public key. The data party then sends $\langle G_{bin} \rangle$ and $\langle H_{bin} \rangle$ to the service party for decryption. The data-party encrypted aggregate values $\langle G_{bin} \rangle$ and $\langle H_{bin} \rangle$ are calculated as follows:
$$\langle G_{bin} \rangle = \sum_{i=1}^{n} \pi_{bin,i} \cdot \langle g_i \rangle$$
$$\langle H_{bin} \rangle = \sum_{i=1}^{n} \pi_{bin,i} \cdot \langle h_i \rangle$$
where $\pi$ is the code of the sample set $I$ and $\pi_{bin}$ is the code of the sample set $I_{bin}$ of the current data bin; that is, $\pi_{bin,i} = 1$ indicates that the $i$-th sample is in the sample set $I$ and belongs to the current data bin.
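On the data-party side the membership is known in plaintext and the derivatives are encrypted, so aggregation reduces to homomorphically adding the ciphertexts of the samples in each bin. The sketch below assumes a toy Paillier scheme and a fixed-point scale `SCALE`; these, and all names, are illustrative assumptions rather than details from the source.

```python
import random

# Toy Paillier with fixed-point encoding (illustrative assumptions only).
SCALE = 10**6

def generate_keypair(p=2**61 - 1, q=2**89 - 1):
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n), n)

def encrypt(n, m):
    r = random.randrange(1, n)
    return (pow(n + 1, m % n, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    phi, mu, n = priv
    m = ((pow(c, phi, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m          # map back to signed integers

def add(n, c1, c2):
    """Homomorphic addition of two ciphertexts."""
    return (c1 * c2) % (n * n)

guest_pub, guest_priv = generate_keypair()     # service party's key pair

# Service party: encrypt the first derivatives and send them to the data party.
g = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
enc_g = [encrypt(guest_pub, round(x * SCALE)) for x in g]

# Data party: plaintext leaf membership and bin index of its own feature.
in_leaf = [1, 1, 1, 0, 1, 1]
bins = [0, 0, 1, 1, 1, 2]
num_bins = 3

enc_G = []
for b in range(num_bins):
    acc = encrypt(guest_pub, 0)
    for c, member, sample_bin in zip(enc_g, in_leaf, bins):
        if member and sample_bin == b:         # corresponds to pi_bin,i == 1
            acc = add(guest_pub, acc, c)
    enc_G.append(acc)

# Service party: decrypt the per-bin aggregates returned by the data party.
G = [decrypt(guest_priv, c) / SCALE for c in enc_G]
```

The data party handles only ciphertexts of the derivatives, and the service party sees only per-bin sums.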
In this embodiment, the data party encrypts the sample data to generate the encrypted code $[[\pi]]$ of the sample set, where $[[\cdot]]$ denotes encryption with the data party public key. With homomorphic encryption, the service party can still complete the construction of the model without knowing the plaintext sample set of each node held by the data party, which gives higher security and avoids the risk of leaking the data party's data.
The data party may generate the encrypted code of the sample set by encrypting the entire sample data, in which case the overhead of sending the encrypted sample set to the service party is n ciphertexts. In practice, depending on the required security level, communication overhead can be reduced by obfuscating only part of the sample data rather than the whole sample set.
S102, sending the service-party encrypted aggregate value to a first data party among the data parties.
Specifically, the service party sends the service-party encrypted aggregate values of the first and second derivatives generated in step S101 to the first data party among the data parties for decryption.
S103, determining the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the service-party information gain, the corresponding service-party splitting feature number, and the service-party splitting point number, which the first data party sends after decrypting the service-party encrypted aggregate value.
Specifically, after receiving the service-party encrypted aggregate value sent in step S102, the first data party decrypts it and calculates the information gain corresponding to each splitting point (each split) of each feature of the service party. It thereby obtains the maximum of these information gains, namely the maximum value of the service-party information gain, together with the corresponding service-party splitting feature number and service-party splitting point number, and sends all three to the service party. The service party thus receives from the first data party the maximum information gain over its own features and the corresponding splitting feature number and splitting point number.
Meanwhile, the service party calculates the information gain of each split of each feature of the data party from the data-party encrypted aggregate values of the first and second derivatives in each data bin of that data party, obtains the maximum of these information gains, namely the maximum value of the data-party information gain, together with the corresponding data-party splitting feature number and data-party splitting point number, and sends all three to the corresponding data party. The service party then compares the maximum value of the service-party information gain with the maximum value of the data-party information gain and takes the larger of the two. If the larger value is smaller than a threshold γ, the current node is not split; otherwise, the splitting feature corresponding to the larger value is determined as the target optimal splitting feature, and the corresponding splitting point as the target optimal splitting point.
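The gain scan and the comparison against the threshold γ can be sketched on decrypted per-bin sums. The passage does not give the gain formula, so the XGBoost-style gain with an L2 term `lam`, and every function name here, are assumptions; the same routine would be run by whichever party holds the decrypted aggregates.

```python
# Assumption: XGBoost-style gain with L2 regularisation lam; the passage only
# speaks of "information gain" and a threshold gamma.

def split_gain(G_left, H_left, G_right, H_right, lam=1.0):
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right)
                  - score(G_left + G_right, H_left + H_right))

def best_split(bin_sums, lam=1.0):
    """bin_sums[k] lists (G_bin, H_bin) for every bin of feature k.
    Returns (max gain, splitting feature number, splitting point number)."""
    best = (float("-inf"), None, None)
    for k, sums in enumerate(bin_sums):
        G_total = sum(G for G, _ in sums)
        H_total = sum(H for _, H in sums)
        G_left = H_left = 0.0
        for s in range(len(sums) - 1):             # split after bin s
            G_left += sums[s][0]
            H_left += sums[s][1]
            gain = split_gain(G_left, H_left,
                              G_total - G_left, H_total - H_left, lam)
            if gain > best[0]:
                best = (gain, k, s)
    return best

def choose_target(guest_best, host_best, gamma):
    """Take the larger gain; return None when it is below the threshold."""
    owner, best = max((("guest", guest_best), ("host", host_best)),
                      key=lambda item: item[1][0])
    return None if best[0] < gamma else (owner, best[1], best[2])

guest_best = best_split([[(1.0, 0.5), (-1.0, 0.5)]])
host_best = best_split([[(2.0, 1.0), (-2.0, 1.0)],
                        [(0.5, 1.0), (-0.5, 1.0)]])
target = choose_target(guest_best, host_best, gamma=0.1)
```

Returning only the feature number and splitting point number, rather than the threshold value, mirrors the numbers exchanged between the parties in the text.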
S104, obtaining, based on the target optimal splitting feature and the target optimal splitting point, the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node.
Specifically, the target optimal splitting feature and target optimal splitting point obtained in step S103 may belong either to the service party or to a data party. The party that owns them splits the current leaf node into two new leaf nodes according to the target optimal splitting feature and target optimal splitting point, obtains the encrypted codes of the sample sets of the two resulting leaf nodes, and sends these encrypted codes to the other party. If the target optimal splitting feature belongs to a data party, the data party sends the encrypted codes of the sample sets of the two resulting leaf nodes to the service party; if it belongs to the service party, the service party sends them to the data party. This process is repeated for each leaf node of the m-th tree until no leaf node can be split further or the depth of the tree reaches the preset maximum depth, so that the service party and the data party obtain the encrypted code of the sample set of every leaf node of the m-th tree.
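The effect of a split on the sample-set codes can be shown on plaintext 0/1 codes. This sketch assumes that samples whose feature value is at or below the threshold go to the left child, and it omits the encryption of the resulting codes; the function name and the data are hypothetical.

```python
# Assumption: samples with feature value <= threshold go to the left child.
# Encryption of the resulting codes is omitted in this plaintext sketch.

def split_codes(pi, feature_values, threshold):
    """Split the 0/1 code of a leaf's sample set into left and right codes."""
    left = [c if x <= threshold else 0 for c, x in zip(pi, feature_values)]
    right = [c - l for c, l in zip(pi, left)]
    return left, right

pi = [1, 1, 0, 1, 1]                       # code of the current leaf's sample set
feature_values = [0.2, 0.9, 0.1, 0.5, 0.7]
pi_left, pi_right = split_codes(pi, feature_values, threshold=0.5)
```

Samples outside the current leaf (code 0) stay at 0 in both children, so the two codes always sum to the parent code.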
S105, calculating the target weight of each leaf node according to the encrypted code of the sample set of each leaf node.
Specifically, the service party calculates the target weight of each leaf node according to the encrypted code of the sample set of that leaf node obtained in step S104 and the data party public key.
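The passage does not state the weight formula. A common choice in second-order boosting, assumed here purely for illustration, is the leaf weight $w = -G / (H + \lambda)$, where $G$ and $H$ are the sums of first and second derivatives over the leaf's samples. The sketch works on plaintext codes and leaves out the encryption and decryption steps.

```python
# Assumption: XGBoost-style leaf weight w = -G / (H + lam); the source does not
# give the formula. Codes are plaintext here; encryption is omitted.

def leaf_weight(pi_leaf, g, h, lam=1.0):
    G = sum(c * gi for c, gi in zip(pi_leaf, g))
    H = sum(c * hi for c, hi in zip(pi_leaf, h))
    return -G / (H + lam)

g = [0.5, -0.5, 0.5, -0.5]
h = [0.25, 0.25, 0.25, 0.25]
leaf_codes = ([1, 0, 1, 0], [0, 1, 0, 1])  # 0/1 codes of two leaf nodes
weights = [leaf_weight(code, g, h) for code in leaf_codes]
```

In the encrypted setting the two sums would be formed homomorphically from the encrypted codes before being decrypted by the key holder.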
S106, obtaining the gradient boosting tree model by calculation from the target weights.
Specifically, the gradient boosting tree model is obtained by calculation from the target weights obtained in step S105.
In the method for generating a gradient boosting tree model according to this embodiment, the service party generates a service-party encrypted aggregate value of the first and second derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first and second derivatives of the predicted value of each piece of risk control data after m-1 iterations, where m is less than a preset number M. The service party sends the service-party encrypted aggregate value to a first data party among the data parties, and determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the service-party information gain, the corresponding service-party splitting feature number, and the service-party splitting point number that the first data party sends after decrypting the service-party encrypted aggregate value. Based on the target optimal splitting feature and target optimal splitting point, it obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node, calculates the target weight of each leaf node from the encrypted code of its sample set, and obtains the gradient boosting tree model from the target weights. By using homomorphic encryption, the sample set held by the data party is kept secret from the service party, while the service party can still correctly calculate, under encryption, the information gain corresponding to each splitting point of each feature and thus complete the construction of the model. This model generation method ensures that disclosing the data party's splitting features and their meanings to the service party still does not reveal the data party's private data, allows the gradient boosting tree model to be generated losslessly, and enhances the interpretability of the gradient boosting tree model.
Fig. 2 is a schematic flowchart of a method for generating a gradient boosting tree model according to another embodiment of the present application. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, the method specifically includes the following steps:
S201, generating a service-party encrypted aggregate value of the first and second derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first and second derivatives of the predicted value of each piece of risk control data after m-1 iterations, where m is less than a preset number M.
S202, sending the service-party encrypted aggregate value to a first data party among the data parties.
Specifically, steps S201 to S202 are similar to steps S101 to S102 in the above embodiment and are not described again here.
Step S103 in the above embodiment may specifically include the following steps S203 to S207.
S203, determining the service-party optimal splitting feature, the service-party optimal splitting point, and the corresponding service-party maximum information gain according to the maximum value of the service-party information gain, the service-party splitting feature number, and the service-party splitting point number.
Specifically, the service party takes the maximum value of the service-party information gain sent by the first data party as the service-party maximum information gain, takes the splitting feature corresponding to the service-party splitting feature number sent by the first data party as the service-party optimal splitting feature, and takes the splitting point corresponding to the service-party splitting point number sent by the first data party as the service-party optimal splitting point.
S204, obtaining the data-party optimal splitting feature, the data-party optimal splitting point, and the corresponding data-party maximum information gain.
Specifically, the service party takes the maximum value of the data-party information gain that it has calculated as the data-party maximum information gain, takes the data-party splitting feature corresponding to the obtained data-party splitting feature number as the data-party optimal splitting feature, and takes the data-party splitting point corresponding to the obtained data-party splitting point number as the data-party optimal splitting point.
S205, determining the larger of the service-party maximum information gain and the data-party maximum information gain as the target maximum information gain.
Specifically, the service party compares the service-party maximum information gain obtained in step S203 with the data-party maximum information gain obtained in step S204 and determines the larger of the two as the target maximum information gain.
S206, determining the optimal splitting feature corresponding to the target maximum information gain as the target optimal splitting feature.
Specifically, the service party determines the optimal splitting feature corresponding to the target maximum information gain determined in step S205 as the target optimal splitting feature.
S207, determining the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.
Specifically, the service party determines the optimal splitting point corresponding to the target maximum information gain determined in step S205 as the target optimal splitting point.
S208, obtaining, based on the target optimal splitting feature and the target optimal splitting point, the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node.
S209, calculating the target weight of each leaf node according to the encrypted code of the sample set of each leaf node.
S210, obtaining the gradient boosting tree model by calculation from the target weights.
Specifically, steps S208 to S210 are similar to steps S104 to S106 in the above embodiment, and the detailed process is not described again here.
In the method for generating a gradient boosting tree model according to this embodiment, the service party generates a service-party encrypted aggregate value of the first and second derivatives in each data bin according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first and second derivatives of the predicted value of each piece of risk control data after m-1 iterations, where m is less than a preset number M. The service party sends the service-party encrypted aggregate value to a first data party among the data parties, and determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the service-party information gain, the corresponding service-party splitting feature number, and the service-party splitting point number that the first data party sends after decrypting the service-party encrypted aggregate value. Based on the target optimal splitting feature and target optimal splitting point, it obtains the encrypted codes of the sample sets of the two leaf nodes produced by splitting the current leaf node, calculates the target weight of each leaf node from the encrypted code of its sample set, and obtains the gradient boosting tree model from the target weights. By using homomorphic encryption, the sample set held by the data party is kept secret from the service party, while the service party can still correctly calculate, under encryption, the information gain corresponding to each splitting point of each feature and thus complete the construction of the model. This model generation method ensures that disclosing the data party's splitting features and their meanings to the service party still does not reveal the data party's private data, allows the gradient boosting tree model to be generated losslessly, and enhances the interpretability of the gradient boosting tree model.
As a possible implementation, on the basis of the foregoing embodiment and as shown in Fig. 3, step S204 of "obtaining the data-party optimal splitting feature, the data-party optimal splitting point, and the corresponding data-party maximum information gain" may specifically include the following steps:
S301, homomorphically encrypting the first derivative and the second derivative to obtain an encrypted first derivative and an encrypted second derivative.
Specifically, the service party calculates, for each piece of risk control data, the first derivative $g_i$ and the second derivative $h_i$ of the predicted value $\hat{y}_i^{(m-1)}$ after $m-1$ iterations, and homomorphically encrypts $g_i$ and $h_i$ to obtain the encrypted first derivative $\langle g_i \rangle$ and the encrypted second derivative $\langle h_i \rangle$, where $\langle \cdot \rangle$ denotes encryption with the public key of the homomorphic encryption system generated by the service party, that is, the service party public key.
S302, sending the encrypted first derivative and the encrypted second derivative to the data party.
Specifically, the service party sends the encrypted first derivative and the encrypted second derivative obtained in step S301 to each data party.
S303, receiving the data-party encrypted aggregate value of the first and second derivatives in each data bin sent by the data party, where the data-party encrypted aggregate value is generated by the data party according to the service party public key, the sample set of the current leaf node, the encrypted first derivative, and the encrypted second derivative.
Specifically, the service party receives the data-party encrypted aggregate values of the first and second derivatives in each data bin sent by the data party. For how the data party calculates these data-party encrypted aggregate values, refer to the description of the embodiment shown in Fig. 1, which is not repeated here.
S304, decrypting the data-party encrypted aggregate value and calculating the maximum value of the data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number.
Specifically, the service party decrypts the data-party encrypted aggregate value received in step S303 and calculates the maximum value of the data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number.
S305, sending the maximum value of the data-party information gain and the corresponding data-party splitting feature number and data-party splitting point number to the data party.
Specifically, the service party sends the maximum value of the data-party information gain obtained in step S304, together with the corresponding data-party splitting feature number and data-party splitting point number, to the data party.
S306, receiving the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party sent by the data party, wherein these are determined by the data party according to the maximum value of the data party information gain, the data party splitting characteristic number and the data party splitting point number.
Specifically, the data party determines the maximum value of the data party information gain sent by the service party as the maximum information gain of the data party, determines the splitting characteristic corresponding to the data party splitting characteristic number sent by the service party as the optimal splitting characteristic of the data party, and determines the splitting point corresponding to the data party splitting point number sent by the service party as the optimal splitting point of the data party. The data party then sends the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party to the service party, and the service party receives them.
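The lookup in step S306 can be pictured with a minimal Python sketch. It assumes the data party keeps a private table from (splitting characteristic number, splitting point number) to the real feature and threshold. The table layout, the feature names and the function name `resolve_best_split` are hypothetical and are not taken from the application.

```python
# Hypothetical table kept private by the data party: the service party only
# ever sees the (characteristic number, split point number) keys.
split_table = {
    (0, 0): ("feature_a", 10.0),
    (0, 1): ("feature_a", 20.0),
    (1, 0): ("feature_b", 0.5),
}


def resolve_best_split(table, feature_no, point_no, max_gain):
    """Map the numbers returned by the service party back to a real split."""
    feature, threshold = table[(feature_no, point_no)]
    return {"feature": feature, "threshold": threshold, "gain": max_gain}


# The service party reported that candidate (1, 0) has the largest gain.
best = resolve_best_split(split_table, feature_no=1, point_no=0, max_gain=2.25)
```

Under this assumption the service party learns which numbered candidate is best, while the mapping from numbers to features stays with the data party until it chooses to send the result back.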
As a feasible implementation, on the basis of the foregoing embodiment, the step S208 of "obtaining the encrypted codes of the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point" may specifically include the following two cases, depending on whether the target optimal splitting characteristic belongs to the service party or to a data party:
In the first case, if the target optimal splitting characteristic belongs to the service party, the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point are calculated, the sample sets of the two leaf nodes are encrypted with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and the encrypted codes are sent to the data party.
Specifically, if the target optimal splitting characteristic belongs to the service party, the service party needs to calculate the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point. Of the two sample sets, the left one contains the samples whose corresponding feature is less than or equal to the threshold, and the right one contains the samples whose corresponding feature is greater than the threshold.
The service party encrypts the sample sets of the two leaf nodes with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and sends the encrypted codes to the data party. Taking the left leaf node as an example, its sample set is calculated as $I_L = I \cap I'_L$, where $I'_L$ denotes the set of samples, among all samples, whose corresponding feature is less than or equal to the threshold. Since the target optimal splitting characteristic is owned by the service party, the service party can directly obtain the plaintext $\pi'_L$ of the code of $I'_L$, and uses

$$[\![\pi_L]\!]_{\mathrm{d}} = \pi'_L \odot [\![\pi]\!]_{\mathrm{d}}$$

to calculate the encrypted value of the code $\pi_L$ of the sample set of the left leaf node, where $[\![\pi]\!]_{\mathrm{d}}$ is the code of the sample set $I$ of the current leaf node encrypted with the data party public key and the symbol $\odot$ denotes element-wise multiplication. The encrypted code $[\![\pi_R]\!]_{\mathrm{d}}$ of the right leaf node is calculated in the same way. The data party receives $[\![\pi_L]\!]_{\mathrm{d}}$ and $[\![\pi_R]\!]_{\mathrm{d}}$ sent by the service party and obtains the sample sets $I_L$ and $I_R$ of the two corresponding leaf nodes by decryption.
In the second case, if the target optimal splitting characteristic belongs to a data party, the encrypted codes of the sample sets of the two leaf nodes sent by the data party are received, wherein the encrypted codes are obtained by the data party by calculating the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the data party public key.
Specifically, if the target optimal splitting characteristic belongs to a data party, the data party calculates the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, encrypts them with the data party public key and sends them to the service party, and the service party receives the encrypted codes of the sample sets of the two leaf nodes.
For example, the data party calculates the sample sets of the two leaf nodes obtained after splitting: the left set $I_L$ consists of the samples in $I$ whose corresponding feature is less than or equal to the threshold, and the right set $I_R$ consists of the samples in $I$ whose corresponding feature is greater than the threshold. After encryption with the data party public key, the encrypted codes $[\![\pi_L]\!]_{\mathrm{d}}$ and $[\![\pi_R]\!]_{\mathrm{d}}$ are obtained and sent to the service party.
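For the first case above, the relation between the set intersection $I \cap I'_L$ and the element-wise product of the two codes can be checked with a short Python sketch. It works on plaintext 0/1 vectors only; in the protocol the code of $I$ would be a ciphertext under the data party public key. The feature values, the threshold and the helper names `encode` and `elementwise` are illustrative assumptions.

```python
def encode(sample_ids, n):
    """0/1 indicator vector of length n for a set of sample indices."""
    return [1 if i in sample_ids else 0 for i in range(n)]


def elementwise(a, b):
    """Element-wise product of two equally long vectors."""
    return [x * y for x, y in zip(a, b)]


n = 8
I = {0, 2, 3, 5, 7}                                   # samples in the current leaf node
feature = [4.0, 9.5, 1.2, 7.7, 3.3, 2.0, 8.1, 6.4]    # hypothetical service party feature
threshold = 5.0

# I'_L: all samples (not only those in I) with feature <= threshold.
I_prime_L = {i for i in range(n) if feature[i] <= threshold}

pi = encode(I, n)
pi_prime_L = encode(I_prime_L, n)
pi_L = elementwise(pi_prime_L, pi)                    # code of I ∩ I'_L
pi_R = elementwise([1 - b for b in pi_prime_L], pi)   # code of I \ I'_L
```

Because every entry is 0 or 1, multiplying the codes keeps exactly the samples that are in both sets, which is why a scheme that supports multiplying a ciphertext by a plaintext factor is enough for this step.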
As a possible implementation, on the basis of the foregoing embodiment, as shown in fig. 4, the step S209 of "calculating the target weight of each leaf node according to the encrypted code of the sample set of each leaf node" may specifically include the following steps:
S401, generating a data party encrypted aggregate value of each leaf node according to the encrypted code of the sample set of each leaf node, the data party public key, the first derivative and the second derivative.
Specifically, the service party generates the data party encrypted aggregate values $[\![G_j]\!]_{\mathrm{d}}$ and $[\![H_j]\!]_{\mathrm{d}}$ of each leaf node $j$ according to the encrypted code $[\![\pi(j)]\!]_{\mathrm{d}}$ of the sample set of the leaf node, the data party public key, and the first derivatives $g_i$ and second derivatives $h_i$, $i = 1, \ldots, n$. The calculation is as follows:

$$[\![G_j]\!]_{\mathrm{d}} = \sum_{i=1}^{n} g_i \, [\![\pi_i(j)]\!]_{\mathrm{d}}$$

$$[\![H_j]\!]_{\mathrm{d}} = \sum_{i=1}^{n} h_i \, [\![\pi_i(j)]\!]_{\mathrm{d}}$$

where $\pi_i(j)$ is the $i$-th element of the vector $\pi(j)$, $\pi(j)$ is the code of the sample set of the $j$-th leaf node, and $[\![\cdot]\!]_{\mathrm{d}}$ denotes encryption with the data party public key.
S402, sending the data party encrypted aggregate value to the first data party.
Specifically, the service party sends the data party encrypted aggregate value obtained in step S401 to the first data party.
S403, receiving the data party decrypted aggregate value and the corresponding number sent by the first data party after it decrypts the data party encrypted aggregate value.
Specifically, the first data party receives the data party encrypted aggregate value sent by the service party in step S402, decrypts it to obtain the data party decrypted aggregate value and the corresponding number, and sends the data party decrypted aggregate value and the corresponding number to the service party. The service party receives the data party decrypted aggregate value and the corresponding number sent by the first data party.
S404, calculating the target weight according to the data party decrypted aggregate value and the corresponding number.
Specifically, the service party calculates the optimal weight of the leaf node, i.e. the target weight [Figure BDA0003248256560000166], according to the data party decrypted aggregate value and the corresponding number received in step S403. The calculation is as follows:
Figure BDA0003248256560000167
It should be noted here that, for each leaf node of the m-th tree, each participant needs to determine whether the leaf node needs to be split and how to split it, until no leaf node can be split any further or the tree is no longer split. The service party then calculates the target weight of each leaf node and constructs the m-th tree according to the target weights. To protect the security of the service party's labels, the plaintext weights of the leaf nodes are all held by the service party, so the first and second derivatives $g_i$ and $h_i$ can only be computed by the service party.
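Steps S401 to S404 can be sketched end to end in Python under several loudly stated assumptions. The application does not name a homomorphic scheme, so a toy Paillier cryptosystem with two small Mersenne primes stands in for it (insecure, for illustration only). Real-valued derivatives are mapped to integers with a hypothetical fixed-point factor `SCALE`. The leaf weight uses the common second-order form $-G/(H+\lambda)$ with an assumed regularization term `lam`, because the application's own weight formula is not reproduced in this text.

```python
import math
import random

# Toy Paillier parameters (assumption: stand-in scheme, far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)              # valid because the generator is N + 1
SCALE = 10**6                     # hypothetical fixed-point scale


def encrypt(m):
    """Encrypt a (possibly negative) integer under the data party public key."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the data party private key and map back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encrypted_aggregate(values, enc_code):
    """sum_i values[i] * Enc(pi_i): scalar-multiply each ciphertext, then add."""
    acc = encrypt(0)
    for v, c in zip(values, enc_code):
        k = round(v * SCALE) % N
        acc = acc * pow(c, k, N2) % N2
    return acc


# Encrypted code of the sample set of leaf node j (produced with the data party key).
pi_j = [1, 0, 1, 1, 0]
enc_pi_j = [encrypt(b) for b in pi_j]

# S401, service party: plaintext derivatives combined with the encrypted code.
g = [0.5, -1.25, -0.75, 2.0, 3.0]
h = [1.0, 1.0, 1.0, 1.0, 1.0]
enc_G = encrypted_aggregate(g, enc_pi_j)
enc_H = encrypted_aggregate(h, enc_pi_j)

# S402-S403, first data party: decrypt and return the aggregates.
G_j, H_j = decrypt(enc_G) / SCALE, decrypt(enc_H) / SCALE

# S404, service party: leaf weight under the assumed regularized form.
lam = 1.0
w_j = -G_j / (H_j + lam)
```

In this sketch only samples 0, 2 and 3 belong to the leaf, so the decrypted aggregates are the sums of their derivatives. The numbering of leaf nodes mentioned in step S403 is omitted for brevity.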
As a possible implementation manner, on the basis of the above embodiment, as shown in fig. 5, the step S210 of "calculating a gradient lifting tree model according to the target weight" may specifically include the following steps:
S501, constructing the m-th tree according to the target weight.
Specifically, the parameters of the mth tree are updated according to the target weight, and the construction of the mth tree is completed.
S502, updating the predicted value of the m-th sub-model for each piece of wind control data, wherein the m-th sub-model comprises the 1st tree to the m-th tree.
Specifically, the predicted value of the currently established m-th sub-model for each piece of wind control data is updated:
Figure BDA0003248256560000171
Figure BDA0003248256560000172
where the m-th sub-model $f_m$ includes the 1st tree through the m-th tree.
The goal in establishing the m-th sub-model $f_m$ is to minimize the loss function Loss:
Figure BDA0003248256560000173
where [Figure BDA0003248256560000174] is the output of the previously constructed model consisting of m-1 random forests, [Figure BDA0003248256560000175] is an element of the vector [Figure BDA0003248256560000176], $\{x_i \mid i = 1, \ldots, n\}$ is the training data set, and [Figure BDA0003248256560000177] is the loss function, for example the mean square error (MSE) loss function [Figure BDA0003248256560000178]. For fast calculation, a second-order approximation of the loss function can be used, i.e.
Figure BDA0003248256560000179
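As an illustration of the second-order approximation, the following Python sketch assumes a squared-error loss of the form $\tfrac{1}{2}(y - \hat{y})^2$ and the usual Taylor expansion $l(y, \hat{y} + f) \approx l(y, \hat{y}) + g f + \tfrac{1}{2} h f^2$. The exact loss and expansion used in the application are not reproduced in this text, so the scaling factor and the function names are assumptions.

```python
def mse_loss(y, y_hat):
    """Squared-error loss with an assumed factor of 1/2."""
    return 0.5 * (y - y_hat) ** 2


def mse_grad_hess(y, y_hat):
    """First and second derivative of the loss with respect to the prediction."""
    return y_hat - y, 1.0


def second_order_approx(y, y_hat, f):
    """Taylor expansion of the loss around y_hat, evaluated at y_hat + f."""
    g, h = mse_grad_hess(y, y_hat)
    return mse_loss(y, y_hat) + g * f + 0.5 * h * f * f


y, y_hat, f = 1.0, 0.3, 0.25          # label, current prediction, new tree output
exact = mse_loss(y, y_hat + f)
approx = second_order_approx(y, y_hat, f)
```

For a quadratic loss the expansion is exact, so `exact` and `approx` agree up to floating-point rounding. For other losses the two values would differ by the higher-order terms.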
Each node split of each tree is determined by the information gain (Gain) calculated over all split points of all features:
Figure BDA00032482565600001710
where $G_L$ and $G_R$ are the gradient aggregate values of the left and right nodes after splitting, i.e.

$$G_L = \sum_{i \in I_L} g_i$$

$$G_R = \sum_{i \in I_R} g_i$$
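The search for the best split over decrypted per-bin aggregates can be sketched as follows. The gain expression used here is the common regularized second-order form, which is an assumption because the application's own Gain formula is not reproduced in this text. The parameter `lam`, the input layout `bin_sums` and the function names are hypothetical.

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0):
    """Assumed gain: improvement of the children scores over the parent score."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R))


def best_split(bin_sums, lam=1.0):
    """bin_sums maps a feature number to [(G_bin, H_bin), ...] in ascending bin order.

    Returns (max gain, feature number, split point number), where split point k
    sends bins 0..k to the left child.
    """
    best = (float("-inf"), None, None)
    for feature_no, bins in bin_sums.items():
        G_total = sum(g for g, _ in bins)
        H_total = sum(h for _, h in bins)
        G_L = H_L = 0.0
        for point_no, (g, h) in enumerate(bins[:-1]):
            G_L += g
            H_L += h
            gain = split_gain(G_L, H_L, G_total - G_L, H_total - H_L, lam)
            if gain > best[0]:
                best = (gain, feature_no, point_no)
    return best


bin_sums = {
    0: [(-2.0, 2.0), (-1.0, 1.0), (3.0, 3.0)],
    1: [(0.5, 2.0), (-0.5, 2.0), (0.0, 2.0)],
}
max_gain, feature_no, point_no = best_split(bin_sums)
```

The party that runs this scan sees only numbered features and numbered split points, which matches the way the maximum gain is reported together with a characteristic number and a split point number.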
S503, calculating the gradient lifting tree model according to the M sub-models.
Specifically, the gradient lifting tree model is calculated from the updated M sub-models as follows:
Figure BDA00032482565600001713
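Since the m-th sub-model comprises the 1st through the m-th tree, a prediction can be pictured as a sum of tree outputs. The Python sketch below assumes an unweighted sum with no learning rate and represents each tree as a single-split function. Both choices, and the names `make_stump` and `predict`, are illustrative assumptions rather than the application's formula.

```python
def make_stump(feature_no, threshold, w_left, w_right):
    """A one-split tree: left leaf if the feature is <= threshold, else right leaf."""
    def tree(x):
        return w_left if x[feature_no] <= threshold else w_right
    return tree


def predict(trees, x, m=None):
    """Output of the m-th sub-model: the sum of trees 1..m (all trees if m is None)."""
    return sum(t(x) for t in trees[:m])


trees = [
    make_stump(0, 5.0, -0.4, 0.6),
    make_stump(1, 2.0, 0.1, -0.2),
]
x = [3.0, 4.0]
first_submodel = predict(trees, x, m=1)   # only the 1st tree
full_model = predict(trees, x)            # all trees
```

With this representation the full model is simply the last sub-model, so adding a tree never changes the contribution of the trees built before it.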
In the method for generating a gradient lifting tree model according to the embodiment of the application, the service party generates the service party encrypted aggregate values of the first derivative and the second derivative in each data sub-box according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M. The service party sends the service party encrypted aggregate values to the first data party among the data parties, and determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the service party information gain, the corresponding service party splitting characteristic number and the service party splitting point number that the first data party sends after decrypting the service party encrypted aggregate values. The service party then obtains the encrypted codes of the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculates the gradient lifting tree model according to the target weights. By using homomorphic encryption, the sample sets held by the data parties are kept secret from the service party, while the service party can still correctly calculate, under encryption, the information gain corresponding to each split point of each feature, thereby completing the establishment of the model.
The model generation method ensures that disclosing the splitting characteristics of the data party and their meaning to the service party still does not reveal the private data of the data party. The gradient lifting tree model can be generated losslessly, and its interpretability is enhanced.
Fig. 6 is a flowchart illustrating a method for generating a gradient lifting tree model according to another embodiment of the present application. The method for generating a gradient lifting tree model according to the embodiment of the present application may be executed by the apparatus for generating a gradient lifting tree model provided in the embodiment of the present application, and the apparatus for generating a gradient lifting tree model may be disposed in the electronic device of the first data party. As shown in fig. 6, the method for generating a gradient lifting tree model according to the embodiment of the present application may specifically include the following steps:
S601, receiving the service party encrypted aggregate values of the first derivative and the second derivative in each data sub-box sent by the service party, wherein the service party encrypted aggregate values are generated by the service party according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M.
S602, carrying out decryption calculation on the encrypted aggregation value of the service party to obtain the maximum value of the information gain of the service party, and the corresponding service party splitting characteristic number and the corresponding service party splitting point number.
S603, the maximum value of the information gain of the service party, the splitting feature number of the service party and the splitting point number of the service party are sent to the service party, so that the service party can determine the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the information gain of the service party, the splitting feature number of the service party and the splitting point number of the service party, obtain the encryption codes of the sample sets of the two leaf nodes obtained after splitting based on the target optimal splitting feature and the target optimal splitting point of the current leaf node, calculate the target weight of each leaf node according to the encryption codes of the sample set of each leaf node, and calculate and obtain the gradient lifting tree model according to the target weights.
It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.
In the method for generating a gradient lifting tree model according to the embodiment of the application, the first data party receives the service party encrypted aggregate values of the first derivative and the second derivative in each data sub-box sent by the service party, where the service party encrypted aggregate values are generated by the service party according to the data party public key, the encrypted code of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, m being smaller than a preset number M. The first data party decrypts the service party encrypted aggregate values, calculates the maximum value of the service party information gain and the corresponding service party splitting characteristic number and service party splitting point number, and sends them to the service party. The service party can then determine the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the service party information gain, the service party splitting characteristic number and the service party splitting point number, obtain the encrypted codes of the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, calculate the target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculate the gradient lifting tree model according to the target weights.
Because only the encrypted codes of the sample set are sent to the service party, the data security of the sample set is guaranteed. On this premise, the characteristic meaning of the data party can be disclosed to the service party, which solves the problem of data leakage caused by the sample set and the characteristic meaning being held by the same participant, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.
As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 7, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:
S701, generating a data party public key.
Specifically, the first data party generates a data party public key by using a homomorphic encryption technology.
S702, the public key of the data party is sent to the service party.
Specifically, the first data party sends the data party public key to the service party. The data party public key can be used for encrypted transmission of intermediate data during model generation, which protects the intermediate values and prevents data leakage. In the embodiment of the present application, $[\![\cdot]\!]_{\mathrm{d}}$ denotes encryption with the public key of the homomorphic encryption system generated by the data party, and similarly $[\![\cdot]\!]_{\mathrm{s}}$ denotes encryption with the public key of the homomorphic encryption system generated by the service party. Because homomorphic encryption supports addition and multiplication on ciphertexts, the service party can still correctly calculate the information gain corresponding to each split point of each feature even though the sample set under a node held by the data party is kept secret from the service party.
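The two ciphertext operations can be demonstrated with a toy Python sketch. The application does not name a scheme, so Paillier is assumed here as a stand-in. It supports adding two ciphertexts and multiplying a ciphertext by a plaintext integer, which appears to be sufficient for the aggregation steps because one factor is always known in plaintext to the party doing the computation. The small Mersenne primes and all function names are illustrative and offer no real security.

```python
import math
import random

# Toy Paillier key material (assumption: stand-in scheme, insecure key size).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key part
MU = pow(LAM, -1, N)                                # valid for generator N + 1


def encrypt(m):
    """Public-key operation: encrypt a signed integer."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Private-key operation: recover the signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def add(c1, c2):
    """Ciphertext of the sum of the two underlying plaintexts."""
    return c1 * c2 % N2


def mul_plain(c, k):
    """Ciphertext of k times the underlying plaintext, for a plaintext integer k."""
    return pow(c, k % N, N2)


a, b = encrypt(12), encrypt(-5)
sum_ab = decrypt(add(a, b))
scaled = decrypt(mul_plain(a, -3))
```

Neither `add` nor `mul_plain` uses the private key, so a party that holds only the public key and the ciphertexts can aggregate values it cannot read.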
As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 8, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:
S801, receiving the encrypted first derivative and the encrypted second derivative sent by the service party, wherein the encrypted first derivative and the encrypted second derivative are obtained by the service party through homomorphic encryption of the first derivative and the second derivative.
S802, generating a data side encryption aggregation value of the first derivative and the second derivative in each data sub-box according to the public key of the service side, the sample set of the current leaf node, the encryption first derivative and the encryption second derivative.
Specifically, the first data party generates the data party encrypted aggregate values of the first derivative and the second derivative in each data sub-box according to the service party public key, the sample set $I$ of the current leaf node, the encrypted first derivative $[\![g_i]\!]_{\mathrm{s}}$ and the encrypted second derivative $[\![h_i]\!]_{\mathrm{s}}$. The calculation is as follows:

$$[\![G_{bin}]\!]_{\mathrm{s}} = \sum_{i=1}^{n} \pi_{bin,i} \, [\![g_i]\!]_{\mathrm{s}}$$

$$[\![H_{bin}]\!]_{\mathrm{s}} = \sum_{i=1}^{n} \pi_{bin,i} \, [\![h_i]\!]_{\mathrm{s}}$$

where $\pi$ is the code of the sample set $I$, $\pi_{bin}$ is the code of the sample set $I_{bin}$ of the current data bin, i.e. $\pi_{bin,i} = 1$ indicates that the $i$-th sample is in the sample set $I$ and belongs to the current data bin, and $[\![\cdot]\!]_{\mathrm{s}}$ denotes encryption with the service party public key.
S803, sending the data party encrypted aggregate value to the service party.
S804, receiving the maximum value of the data party information gain sent by the service party after it decrypts the data party encrypted aggregate value, and the corresponding data party splitting characteristic number and data party splitting point number.
S805, determining the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party according to the maximum value of the data party information gain, the data party splitting characteristic number and the data party splitting point number.
S806, sending the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party to the service party.
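For step S802 above, the construction of the per-bin codes and the per-bin aggregation can be sketched in Python. The sketch uses plaintext numbers as stand-ins for the encrypted derivatives; in the protocol each sum would be a sum of ciphertexts under the service party public key. The bin edges, the feature values and the function names are illustrative assumptions.

```python
import bisect


def bin_index(value, edges):
    """Index of the bin holding value; bin k covers (edges[k-1], edges[k]]."""
    return bisect.bisect_left(edges, value)


def bin_encodings(feature, node_samples, edges):
    """One 0/1 code per bin: entry i is 1 if sample i is in the node and in the bin."""
    n = len(feature)
    codes = [[0] * n for _ in range(len(edges) + 1)]
    for i in node_samples:
        codes[bin_index(feature[i], edges)][i] = 1
    return codes


def aggregate(code, values):
    """sum_i code[i] * values[i]; with ciphertexts this is a homomorphic sum."""
    return sum(c * v for c, v in zip(code, values))


feature = [2.0, 7.5, 3.0, 5.1, 9.0, 4.4]   # hypothetical data party feature
I = {0, 2, 3, 4}                           # sample set of the current leaf node
edges = [3.0, 6.0]                         # three bins: <=3, (3, 6], >6
g = [0.5, -1.0, 0.25, -0.75, 1.5, 2.0]     # plaintext stand-ins for encrypted g_i

codes = bin_encodings(feature, I, edges)
G_bins = [aggregate(code, g) for code in codes]
```

Samples 1 and 5 are outside the current leaf node, so they contribute to no bin even though their feature values fall inside the bin ranges.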
It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.
As a possible implementation, on the basis of the foregoing embodiment, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps: if the target optimal splitting characteristic belongs to the service party, receiving the encrypted codes of the sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes are obtained by the service party by calculating the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the data party public key; and if the target optimal splitting characteristic belongs to a data party, calculating the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, encrypting the sample sets of the two leaf nodes with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and sending the encrypted codes to the service party.
It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.
As a possible implementation manner, on the basis of the foregoing embodiment, as shown in fig. 9, the method for generating a gradient lifting tree model provided in this embodiment of the present application may further include the following steps:
S901, receiving the data party encrypted aggregate value of each leaf node sent by the service party, wherein the data party encrypted aggregate value is generated by the service party according to the encrypted code of the sample set of each leaf node, the data party public key, the first derivative and the second derivative.
S902, decrypting the data party encrypted aggregate value to obtain the data party decrypted aggregate value and the corresponding number.
S903, sending the data party decrypted aggregate value and the corresponding number to the service party, so that the service party can calculate the target weight according to the data party decrypted aggregate value and the corresponding number.
It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.
Fig. 10 is a flowchart illustrating a method for generating a gradient lifting tree model according to another embodiment of the present application. The method for generating a gradient lifting tree model according to the embodiment of the present application may be executed by the apparatus for generating a gradient lifting tree model provided in the embodiment of the present application, and the apparatus for generating a gradient lifting tree model may be disposed in the electronic device of the second data party. As shown in fig. 10, the method for generating a gradient lifting tree model according to the embodiment of the present application may specifically include the following steps:
S1001, receiving the encrypted first derivative and the encrypted second derivative sent by the service party, wherein the encrypted first derivative and the encrypted second derivative are obtained by the service party through homomorphic encryption of the first derivative and the second derivative.
S1002, generating the data party encrypted aggregate values of the first derivative and the second derivative in each data sub-box according to the service party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative.
S1003, sending the data party encrypted aggregate value to the service party.
S1004, receiving the maximum value of the data party information gain sent by the service party after it decrypts the data party encrypted aggregate value, and the corresponding data party splitting characteristic number and data party splitting point number.
S1005, determining the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party according to the maximum value of the data party information gain, the data party splitting characteristic number and the data party splitting point number.
S1006, sending the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party to the service party, so that the service party can determine the target optimal splitting characteristic and the corresponding target optimal splitting point according to the optimal splitting characteristic of the data party, the optimal splitting point of the data party, the maximum information gain of the data party, the maximum value of the service party information gain, the service party splitting characteristic number and the service party splitting point number, obtain the encrypted codes of the sample sets of the two leaf nodes obtained after the current leaf node is split based on the target optimal splitting characteristic and the target optimal splitting point, calculate the target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculate the gradient lifting tree model according to the target weights. The maximum value of the service party information gain, the service party splitting characteristic number and the service party splitting point number are sent to the service party by the first data party, which obtains them by decrypting the service party encrypted aggregate values of the first derivative and the second derivative in each data sub-box.
It should be noted here that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation method in the embodiment of the present application, and the specific process is not described here again.
As a possible implementation, on the basis of the foregoing embodiment, the method for generating a gradient lifting tree model according to the embodiment of the present application may further include the following steps: if the target optimal splitting feature belongs to the service party, receiving the encrypted codes of the sample sets of the two leaf nodes sent by the service party, where the encrypted codes are obtained by the service party by calculating the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting those sample sets with the data party public key; and if the target optimal splitting feature belongs to a data party, calculating the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypting the sample sets of the two leaf nodes with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and sending the encrypted codes to the service party.
According to the generation method of the gradient lifting tree model in the embodiment of the present application, a second data party receives an encrypted first derivative and an encrypted second derivative sent by the service party, the encrypted first derivative and the encrypted second derivative being obtained by the service party through homomorphic encryption of the first derivative and the second derivative. The second data party generates a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to the service party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative, and sends the data party encrypted aggregate value to the service party. It then receives the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party splitting point number, sent by the service party after decrypting the data party encrypted aggregate value; determines the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain from these values; and sends them to the service party. The service party determines the target optimal splitting feature and the corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtains the encrypted codes of the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and obtains the gradient lifting tree model according to the target weight. The maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number are obtained by the first data party by decrypting and calculating the service party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the service party. In this way, the feature meaning of the data party is disclosed to the service party while the sample set remains hidden, which avoids the data leakage that arises when the sample set and the feature meaning are held by the same participant, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.
To explain the generation method of the gradient lifting tree model in the embodiment of the present application more clearly, a detailed description is given below with reference to fig. 11. As shown in fig. 11, the method for generating a gradient lifting tree model according to the embodiment of the present application may specifically include the following steps:
S1101, aligning the sample sets of the service party and the data parties by using the PSI (private set intersection) technique.
S1102, the service party generates a homomorphic-encryption service party private key and public key and sends the service party public key to the data parties; the first data party generates a data party private key and public key and sends the data party public key to the service party.
S1103, the service party performs binning on each feature dimension k of the wind control data in the sample set to obtain L quantiles of the feature, and the L quantiles serve as candidate splitting thresholds.
S1104, the service party calculates the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, homomorphically encrypts them, and transmits the encrypted derivatives to each data party.
Each participant executes the following steps to judge the splitting condition of each leaf node of the current tree, and the specific process is as follows:
S1105, dividing all sample sets into L-1 bin intervals according to the L thresholds S_k.
If the current participant is the business party, executing the steps S1106-S1109, and if the current participant is the data party, executing the steps S1110-S1114.
S1106, generating a service party encrypted aggregate value of the first derivative and the second derivative in each data bin according to the data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M.
S1107, send the service party encrypted aggregated value to the first data party of the data parties.
S1108, the first data side decrypts the encrypted aggregation value of the service side, calculates and sends the maximum value of the information gain of the service side, the corresponding service side split characteristic number and the corresponding service side split point number to the service side.
S1109, the service side determines the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the service side information gain, the corresponding service side splitting characteristic number and the corresponding service side splitting point number.
S1110, the data party generates a data party encrypted aggregate value of the first derivative and the second derivative in each data bin according to the service party public key, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative.
S1111, the data party sends the data party encrypted aggregate value to the service party.
S1112, the service party decrypts the data party encrypted aggregate value, calculates the maximum value of the data party information gain and the corresponding data party splitting feature number and data party splitting point number, and sends them to the data party.
S1113, the data party determines the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain according to the maximum value of the data party information gain, the data party splitting feature number and the data party splitting point number.
S1114, sending the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side to the service side.
S1115, comparing the maximum information gain of the service side with the maximum information gain of the data side, and determining that the larger value of the two is the target maximum information gain, and the corresponding characteristic is the target optimal splitting characteristic.
S1116, if the target optimal splitting feature belongs to the service party, the service party calculates the encrypted codes of the sample sets of the two leaf nodes obtained after splitting and sends them to each data party.
S1117, if the target optimal splitting feature belongs to a data party, that data party calculates the sample sets of the two leaf nodes obtained after splitting, encrypts them, and sends the encrypted codes to the service party.
S1118, judging whether all leaf nodes have been split or whether the depth of the tree has reached the set maximum depth. If so, go to step S1119; otherwise, return to step S1105.
S1119, calculating the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node.
S1120, obtaining the gradient lifting tree model according to the target weight.
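The binning and information-gain steps above (S1103, S1105 and the gain calculation performed after decryption) can be illustrated with a plaintext sketch. It assumes the standard second-order gain used by XGBoost-style trees, which this section does not spell out, so the formula, the regularization parameter `lam` and the function names `bin_index`, `bin_sums` and `best_split` are illustrative assumptions rather than the claimed method. Encryption is omitted here; only the arithmetic on the per-bin sums is shown.

```python
from bisect import bisect_right

def bin_index(x, thresholds):
    """Map a feature value to one of the L-1 bins defined by L sorted thresholds."""
    j = bisect_right(thresholds, x) - 1
    return min(max(j, 0), len(thresholds) - 2)  # clamp values on or outside the end points

def bin_sums(values, g, h, thresholds):
    """Aggregate first derivatives (g) and second derivatives (h) per bin."""
    n_bins = len(thresholds) - 1
    G, H = [0.0] * n_bins, [0.0] * n_bins
    for x, gi, hi in zip(values, g, h):
        b = bin_index(x, thresholds)
        G[b] += gi
        H[b] += hi
    return G, H

def best_split(G, H, lam=1.0):
    """Return (max gain, split point number) for one feature.

    Split point number j means: bins 0..j-1 go left, the rest go right.
    The gain formula is the usual second-order one (an assumption here).
    """
    def score(gs, hs):
        return gs * gs / (hs + lam)

    G_tot, H_tot = sum(G), sum(H)
    best_gain, best_no = float("-inf"), None
    G_left = H_left = 0.0
    for split_no in range(1, len(G)):
        G_left += G[split_no - 1]
        H_left += H[split_no - 1]
        gain = 0.5 * (score(G_left, H_left)
                      + score(G_tot - G_left, H_tot - H_left)
                      - score(G_tot, H_tot))
        if gain > best_gain:
            best_gain, best_no = gain, split_no
    return best_gain, best_no
```

In the protocol described above, the per-bin sums would be computed under encryption by one participant and decrypted by another; only the maximum gain and the corresponding feature number and split point number are returned.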
In order to implement the foregoing embodiment, an apparatus for generating a gradient lifting tree model is further provided in the embodiment of the present application. Fig. 12 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to an embodiment of the present application. As shown in fig. 12, the apparatus 1200 for generating a gradient lifting tree model according to the embodiment of the present application may specifically include: a first generation module 1201, a first sending module 1202, a first determination module 1203, an obtaining module 1204, a first calculation module 1205, and a second calculation module 1206.
The first generation module 1201 is configured to generate a service party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M.
A first sending module 1202 configured to send the traffic side encrypted aggregated value to a first one of the data sides.
The first determining module 1203 is configured to determine the target optimal splitting characteristic and the corresponding target optimal splitting point according to the maximum value of the business side information gain sent after the first data side decrypts the business side encrypted aggregation value, and the corresponding business side splitting characteristic number and the business side splitting point number.
An obtaining module 1204 is configured to obtain the encrypted codes of the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point.
A first calculating module 1205 configured to calculate the target weight of each leaf node according to the cryptographic coding of the sample set of each leaf node.
A second calculation module 1206 configured to calculate a gradient lifting tree model according to the target weight.
In one embodiment of the present application, the public key of the data party is generated by the first data party.
In an embodiment of the present application, the first determining module 1203 may specifically include: the first determining unit is configured to determine the optimal splitting characteristic of the service party, the optimal splitting point of the service party and the corresponding maximum information gain of the service party according to the maximum value of the information gain of the service party, the splitting characteristic number of the service party and the splitting point number of the service party; an acquisition unit configured to acquire a data side optimal splitting characteristic, a data side optimal splitting point, and a corresponding data side maximum information gain; a second determination unit configured to determine a larger value of the traffic side maximum information gain and the data side maximum information gain as a target maximum information gain; the third determining unit is configured to determine the optimal splitting characteristic corresponding to the target maximum information gain as a target optimal splitting characteristic; and the fourth determining unit is configured to determine the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.
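The comparison performed by the second, third and fourth determining units can be sketched as follows. The `Candidate` record, its field names and the tie-breaking rule (the service party wins on equal gain) are hypothetical choices made for illustration; the text above only states that the larger of the two maximum information gains is taken as the target.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    owner: str          # "service" or "data": which participant holds the feature
    feature: str        # optimal splitting feature of that participant
    split_point: float  # optimal splitting point of that participant
    gain: float         # maximum information gain of that participant

def pick_target(service_best: Candidate, data_best: Candidate) -> Candidate:
    """Return the candidate with the larger maximum information gain."""
    # Tie-breaking in favour of the service party is an assumption of this sketch.
    return service_best if service_best.gain >= data_best.gain else data_best
```

The `owner` field of the returned candidate then decides which participant computes and encrypts the sample sets of the two new leaf nodes.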
In an embodiment of the present application, the obtaining unit may specifically include: the encryption subunit is configured to perform homomorphic encryption on the first derivative and the second derivative to obtain an encrypted first derivative and an encrypted second derivative; a first transmitting subunit configured to transmit the encrypted first-order derivative and the encrypted second-order derivative to a data side; the first receiving subunit is configured to receive a data party encrypted aggregate value of a first-order derivative and a second-order derivative in each data sub-box sent by a data party, wherein the data party encrypted aggregate value is generated by the data party according to a service party public key, a sample set of a current leaf node, an encrypted first-order derivative and an encrypted second-order derivative; the decryption subunit is configured to perform decryption calculation on the data side encrypted aggregation value to obtain a maximum value of data side information gain, and a corresponding data side split feature number and a corresponding data side split point number; a second transmitting subunit configured to transmit the maximum value of the data side information gain, and the corresponding data side split feature number and data side split point number to the data side; and the second receiving subunit is configured to receive the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side, which are sent by the data side, wherein the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side are determined by the data side according to the maximum value of the information gain of the data side, the splitting characteristic number of the data side and the splitting point number of the data side.
In an embodiment of the present application, the obtaining module 1204 may specifically include: a first execution unit configured to, if the target optimal splitting feature belongs to the service party, calculate the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypt the sample sets of the two leaf nodes with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and send the encrypted codes to the data party; and a second execution unit configured to, if the target optimal splitting feature belongs to the data party, receive the encrypted codes of the sample sets of the two leaf nodes sent by the data party, where the encrypted codes are obtained by the data party by calculating the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting those sample sets with the data party public key.
In an embodiment of the present application, the first calculation module 1205 may specifically include a generation unit configured to generate a data-side encrypted aggregate value of each leaf node according to the encrypted code, the data-side public key, the first derivative, and the second derivative of the sample set of each leaf node; a transmitting unit configured to transmit the data side encryption aggregation value to the first data side; the receiving unit is configured to receive the data side decryption aggregation value and the corresponding number which are sent after the first data side decrypts the data side encryption aggregation value; and the first calculation unit is configured to calculate the target weight according to the decryption aggregation value of the data side and the corresponding number.
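One way to read the generation unit and the first calculation unit is sketched below. It assumes that the encrypted code of a leaf's sample set is a 0/1 membership vector encrypted element by element under the data party public key, that the scheme is additively homomorphic (a toy Paillier with two small Mersenne primes is used here), and that the target weight follows the usual formula w = -G / (H + lam). None of these details is stated in this section, so the scheme, the fixed-point `SCALE`, `lam` and all function names are assumptions. The key size is far too small for real use.

```python
import math
import random

# Toy Paillier key pair of the first data party (illustrative sizes only).
P, Q = 2**31 - 1, 2**61 - 1            # both are Mersenne primes
N = P * Q                              # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key part
MU = pow(LAM, -1, N)                   # private key part (needs Python 3.8+)
SCALE = 10**6                          # fixed-point scale for derivatives

def encrypt(m):
    """Encrypt an integer with generator g = N + 1."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    """Decrypt to a signed integer (values above N/2 are read as negative)."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def encrypted_leaf_sum(enc_mask, values):
    """Service party side: compute Enc(sum_i mask_i * value_i) without the private key."""
    acc = 1  # trivial encryption of 0
    for c, v in zip(enc_mask, values):
        k = round(v * SCALE) % N       # plaintext derivative as a fixed-point scalar
        acc = acc * pow(c, k, N2) % N2 # scalar multiplication, then homomorphic addition
    return acc

def leaf_weight(G, H, lam=1.0):
    """Assumed second-order leaf weight."""
    return -G / (H + lam)
```

In this reading, the service party calls `encrypted_leaf_sum` once for the first derivatives and once for the second derivatives, the first data party decrypts both results, and the service party divides by `SCALE` before applying `leaf_weight`.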
In an embodiment of the present application, the second calculating module 1206 may specifically include a constructing unit configured to construct an mth tree according to the target weight; the updating unit is configured to update the predicted value of the mth sub-model for each piece of wind control data, wherein the mth sub-model comprises 1 st tree to mth tree; and the second calculation unit is configured to calculate a gradient lifting tree model according to the M submodels.
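The construction, updating and second calculation units describe an additive model: the m-th sub-model is the sum of trees 1 to m. A minimal single-process sketch is given below. The nested-dictionary tree layout, the `learning_rate` and `base_score` parameters and the function names are hypothetical; in the federated setting each participant would only be able to evaluate the nodes whose features it holds.

```python
def predict_tree(tree, x):
    """Walk one tree; internal nodes hold 'feature', 'threshold', 'left', 'right'."""
    node = tree
    while "weight" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["weight"]  # target weight of the leaf that x falls into

def predict_submodel(trees, x, m, learning_rate=1.0, base_score=0.0):
    """Predicted value of the m-th sub-model, i.e. the sum over trees 1..m."""
    return base_score + learning_rate * sum(predict_tree(t, x) for t in trees[:m])

def predict_model(trees, x, learning_rate=1.0, base_score=0.0):
    """Predicted value of the full model built from all M trees."""
    return predict_submodel(trees, x, len(trees), learning_rate, base_score)
```

The predicted value of the (m-1)-th sub-model is what the first and second derivatives are evaluated at before the m-th tree is grown.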
It should be noted that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation apparatus of this embodiment, and the specific process is not described herein again.
In the generation apparatus of the gradient lifting tree model in the embodiment of the present application, the service party generates a service party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, where m is smaller than a preset number M. The service party sends the service party encrypted aggregate value to a first data party among the data parties, determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the service party information gain and the corresponding service party splitting feature number and service party splitting point number sent by the first data party after decrypting the service party encrypted aggregate value, obtains the encrypted codes of the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and obtains the gradient lifting tree model according to the target weight. By using homomorphic encryption, the sample set held by the data party is kept secret from the service party, while the service party can still correctly calculate, under encryption, the information gain corresponding to each splitting point of each feature, and thereby complete the establishment of the model.
The model generation apparatus ensures that disclosing the splitting features of the data party and their meaning to the service party does not reveal the data privacy of the data party, so that the gradient lifting tree model is generated in a lossless manner while its interpretability is enhanced.
In order to implement the foregoing embodiment, an apparatus for generating a gradient lifting tree model is further provided in the embodiment of the present application. Fig. 13 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to another embodiment of the present application. As shown in fig. 13, the apparatus 1300 for generating a gradient lifting tree model according to an embodiment of the present application may specifically include: a first receiving module 1301, a first decrypting module 1302 and a second sending module 1303.
The first receiving module 1301 is configured to receive a service party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the service party, where the service party encrypted aggregate value is generated by the service party according to a data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, and m is smaller than a preset number M.
The first decryption module 1302 is configured to perform decryption calculation on the service party encrypted aggregation value to obtain a maximum value of the service party information gain, and a corresponding service party splitting feature number and a corresponding service party splitting point number.
The second sending module 1303 is configured to send the maximum value of the information gain of the service party, the splitting feature number of the service party, and the splitting point number of the service party to the service party, so that the service party determines an optimal target splitting feature and a corresponding optimal target splitting point according to the maximum value of the information gain of the service party, the splitting feature number of the service party, and the splitting point number of the service party, and obtains an encrypted code of a sample set of two leaf nodes obtained after splitting based on the optimal target splitting feature and the optimal target splitting point of the current leaf node, calculates a target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculates a gradient lifting tree model according to the target weight.
In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 of the embodiment of the present application may further include: a second generation module configured to generate a public key of a data party; and the third sending module is configured to send the public key of the data party to the service party.
In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 of the embodiment of the present application may further include: the second receiving module is configured to receive the encrypted first-order derivative and the encrypted second-order derivative sent by the service party, and the encrypted first-order derivative and the encrypted second-order derivative are obtained by the service party after homomorphic encryption is performed on the first-order derivative and the second-order derivative; a third generation module configured to generate a data side encrypted aggregation value of the first derivative and the second derivative in each data bin according to the public key of the service side, the sample set of the current leaf node, the encrypted first derivative and the encrypted second derivative; a fourth sending module configured to send the data side encrypted aggregate value to the service side; the third receiving module is configured to receive the maximum value of the data party information gain sent by the service party after decrypting the data party encrypted aggregation value, and the corresponding data party splitting feature number and the data party splitting point number; the second determining module is configured to determine the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side according to the maximum value of the information gain of the data side, the splitting characteristic number of the data side and the splitting point number of the data side; and the fifth sending module is configured to send the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side to the service side.
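The third generation module adds up encrypted derivatives per bin without being able to read them. The sketch below shows that step with a toy Paillier scheme, where multiplying ciphertexts corresponds to adding plaintexts. The choice of Paillier, the small Mersenne-prime key, the fixed-point `SCALE` and the function names are assumptions for illustration; this section only says that homomorphic encryption is used. Here the key pair belongs to the service party, and the data party uses only the public parts `N` and `N2`.

```python
import math
import random

# Toy Paillier key pair of the service party (illustrative sizes only).
P, Q = 2**31 - 1, 2**61 - 1            # both are Mersenne primes
N = P * Q                              # public key, shared with the data parties
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private, service party only
MU = pow(LAM, -1, N)                   # private, service party only
SCALE = 10**6                          # fixed-point scale for derivatives

def encrypt_derivative(value):
    """Service party: encrypt one derivative as a fixed-point integer."""
    m = round(value * SCALE) % N
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def aggregate_by_bin(enc_values, bin_ids, n_bins):
    """Data party: homomorphically sum the encrypted derivatives falling into each bin."""
    sums = [1] * n_bins                # 1 is a trivial encryption of 0
    for c, b in zip(enc_values, bin_ids):
        sums[b] = sums[b] * c % N2     # ciphertext product = plaintext sum
    return sums

def decrypt_sum(c):
    """Service party: decrypt one per-bin aggregate back to a float."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:                     # upper half of the range encodes negatives
        m -= N
    return m / SCALE
```

A production system would also re-randomize the aggregates, since an untouched `1` reveals that a bin is empty; that refinement is left out of this sketch.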
In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 of the embodiment of the present application may further include: a fourth receiving module configured to, if the target optimal splitting feature belongs to the service party, receive the encrypted codes of the sample sets of the two leaf nodes sent by the service party, where the encrypted codes are obtained by the service party by calculating the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting those sample sets with the data party public key; and a third calculation module configured to, if the target optimal splitting feature belongs to the data party, calculate the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypt the sample sets of the two leaf nodes with the data party public key to obtain the encrypted codes of the sample sets of the two leaf nodes, and send the encrypted codes to the service party.
In an embodiment of the present application, the gradient lifting tree generating apparatus 1300 of the embodiment of the present application may further include: a fifth receiving module, configured to receive a data side encryption aggregation value of each leaf node sent by a service side, where the data side encryption aggregation value is generated by the service side according to an encryption code of a sample set of each leaf node, a data side public key, a first order derivative, and a second order derivative; the second decryption module is configured to decrypt the encrypted aggregation value of the data side to obtain a decrypted aggregation value of the data side and a corresponding number; and the sixth sending module is configured to send the data side decryption aggregation value and the corresponding number to the service side so that the service side can calculate the target weight according to the data side decryption aggregation value and the corresponding number.
It should be noted that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation apparatus of this embodiment, and the specific process is not described herein again.
In the apparatus for generating a gradient lifting tree model in an embodiment of the present application, a first data party receives a service party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the service party, where the service party encrypted aggregate value is generated by the service party according to a data party public key, the encrypted codes of the sample set of the current leaf node of the (m-1)-th tree, and the first derivative and the second derivative of the predicted value of each piece of wind control data after m-1 iterations, and m is smaller than a preset number M. The first data party decrypts the service party encrypted aggregate value and calculates the maximum value of the service party information gain and the corresponding service party splitting feature number and service party splitting point number, and sends them to the service party. The service party determines the target optimal splitting feature and the corresponding target optimal splitting point according to the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtains the encrypted codes of the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and obtains the gradient lifting tree model according to the target weight.
Because only the encrypted codes of the sample set are sent to the service party, the data security of the sample set is guaranteed. On this premise, the feature meaning of the data party is disclosed to the service party, which avoids the data leakage that arises when the sample set and the feature meaning are held by the same participant, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.
In order to implement the foregoing embodiment, an apparatus for generating a gradient lifting tree model is further provided in the embodiment of the present application. Fig. 14 is a schematic structural diagram of a gradient lifting tree model generation apparatus according to another embodiment of the present application, which is applied to a second data party. As shown in fig. 14, the gradient lifting tree model generation apparatus 1400 according to the embodiment of the present application may specifically include: a sixth receiving module 1401, a fourth generating module 1402, a seventh sending module 1403, a seventh receiving module 1404, a third determining module 1405 and an eighth sending module 1406.
A sixth receiving module 1401, configured to receive the encrypted first derivative and the encrypted second derivative sent by the service party, where the encrypted first derivative and the encrypted second derivative are obtained by the service party performing homomorphic encryption on the first derivative and the second derivative.
A fourth generating module 1402 configured to generate a data-side encrypted aggregate value of the first and second derivatives in each data bin according to the public key of the service side, the sample set of the current leaf node, the encrypted first derivative, and the encrypted second derivative.
A seventh sending module 1403 configured to send the data side encrypted aggregation value to the service side.
A seventh receiving module 1404, configured to receive a maximum value of the data side information gain sent after the service side decrypts the data side encrypted aggregation value, and a corresponding data side split feature number and a data side split point number.
A third determination module 1405 configured to determine the data side optimal splitting feature, the data side optimal splitting point, and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number, and the data side splitting point number.
An eighth sending module 1406 configured to send the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain to the service party, so that the service party determines the target optimal splitting feature and the corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtains the encrypted codes of the sample sets of the two leaf nodes obtained after splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and obtains the gradient lifting tree model according to the target weight. The maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number are obtained by the first data party by decrypting and calculating the service party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the service party.
In an embodiment of the present application, the apparatus 1400 for generating a gradient lifting tree model according to an embodiment of the present application may further include: an eighth receiving module, configured to receive, if the target optimal splitting feature belongs to the service party, the encrypted codes of the sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes are obtained by the service party calculating the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes with the public key of the data party; and a fourth calculation module, configured to, if the target optimal splitting feature belongs to the data party, calculate the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, encrypt the sample sets of the two leaf nodes with the public key of the data party to obtain the encrypted codes of the sample sets of the two leaf nodes, and send the encrypted codes of the sample sets of the two leaf nodes to the service party.
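As a rough illustration of the splitting step described above, the sketch below splits a leaf's sample set on a threshold and encodes each child as a 0/1 membership vector before encryption. The indicator-vector encoding, the function names, and the `encrypt` callback are assumptions made for illustration; the text only states that the sample sets are encrypted with the data party public key.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_leaf(sample_ids: Sequence[int],
               feature_values: Dict[int, float],
               threshold: float) -> Tuple[List[int], List[int]]:
    """Split the current leaf's samples into left (<= threshold) and right."""
    left = [i for i in sample_ids if feature_values[i] <= threshold]
    right = [i for i in sample_ids if feature_values[i] > threshold]
    return left, right


def encode_sample_set(members: Sequence[int], n_samples: int,
                      encrypt: Callable[[int], int] = lambda bit: bit) -> List[int]:
    """Encode a sample set as one (optionally encrypted) bit per sample.

    `encrypt` stands in for encryption under the data party public key;
    the default identity function keeps the sketch runnable and is NOT secure.
    """
    member_set = set(members)
    return [encrypt(1 if i in member_set else 0) for i in range(n_samples)]


if __name__ == "__main__":
    values = {0: 1.0, 2: 4.0, 3: 2.5, 5: 7.0}  # made-up feature values
    left, right = split_leaf([0, 2, 3, 5], values, threshold=2.5)
    print(encode_sample_set(left, 6), encode_sample_set(right, 6))
```

Whichever party owns the target optimal splitting feature would run `split_leaf` locally, so the other party only ever receives the encrypted vectors.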
It should be noted that the above explanation of the embodiment of the gradient lifting tree model generation method is also applicable to the gradient lifting tree model generation apparatus of this embodiment, and the specific process is not described herein again.
In the apparatus for generating a gradient lifting tree model according to the embodiment of the present application, a second data party receives an encrypted first-order derivative and an encrypted second-order derivative sent by a service party, where the encrypted first-order derivative and the encrypted second-order derivative are obtained by the service party by performing homomorphic encryption on the first-order derivative and the second-order derivative. According to a public key of the service party, a sample set of a current leaf node, the encrypted first-order derivative and the encrypted second-order derivative, the second data party generates a data party encrypted aggregation value of the first-order derivative and the second-order derivative in each data bin and sends the data party encrypted aggregation value to the service party. The second data party then receives the maximum value of the data party information gain, together with the corresponding data party splitting feature number and data party splitting point number, sent by the service party after decrypting the data party encrypted aggregation value, and determines the data party optimal splitting feature, the data party optimal splitting point and the data party maximum information gain accordingly. These are sent to the service party, so that the service party can determine a target optimal splitting feature and a corresponding target optimal splitting point according to the data party optimal splitting feature, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtain the encrypted codes of the sample sets of the two leaf nodes obtained by splitting the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculate a target weight of each leaf node according to the encrypted code of the sample set of that leaf node, and obtain a gradient lifting tree model according to the target weights. The maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number are obtained by the first data party by performing decryption calculation on the service party encrypted aggregation value of the first-order derivative and the second-order derivative in each data bin sent by the service party. In this way, the meaning of the data party's features is disclosed to the service party while the sample sets remain hidden, which avoids the data leakage that arises when the sample set and the feature meaning are held by the same participant, and enhances the interpretability of the gradient lifting tree model generated based on federated learning.
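The per-bin encrypted aggregation described above can be sketched with an additively homomorphic scheme. The example below assumes Paillier encryption with toy-sized primes and a fixed-point scale `SCALE`; the source only says "homomorphic encryption", so the scheme, the key size, and all function names here are illustrative assumptions rather than the method of the application.

```python
import math
import random

SCALE = 1000  # fixed-point scale for derivatives (illustrative assumption)


def keygen(p: int = 1789, q: int = 1861):
    """Toy Paillier key pair; real use needs primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, value: float) -> int:
    """Encrypt a derivative under the public key n (service party side)."""
    m = round(value * SCALE) % n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n * n) % (n * n)


def decrypt(n: int, priv, c: int) -> float:
    """Decrypt with the private key and undo the fixed-point encoding."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    if m > n // 2:  # map the upper half of the range back to negatives
        m -= n
    return m / SCALE


def aggregate_bins(n: int, enc_values, leaf_samples, bin_of_sample, n_bins: int):
    """Data party side: sum ciphertexts per bin without decrypting them."""
    n2 = n * n
    sums = [1] * n_bins  # 1 is a valid encryption of 0
    for i in leaf_samples:
        b = bin_of_sample[i]
        sums[b] = sums[b] * enc_values[i] % n2  # ciphertext product = plaintext sum
    return sums


if __name__ == "__main__":
    n, priv = keygen()
    grads = {0: 0.5, 1: -0.25, 2: 0.125, 3: -1.0}  # made-up first derivatives
    enc = {i: encrypt(n, g) for i, g in grads.items()}
    sums = aggregate_bins(n, enc, [0, 1, 3], {0: 0, 1: 1, 2: 0, 3: 1}, 2)
    print([decrypt(n, priv, c) for c in sums])
```

The same `aggregate_bins` call would be repeated for the encrypted second-order derivatives; only the holder of the private key can read the resulting bin sums.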
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 15 is a block diagram of an electronic device for implementing the gradient lifting tree model generation method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as smart voice interaction devices, personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 15, the electronic device includes: one or more processors 1501, memory 1502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor 1501 may process instructions executed within the electronic device, including instructions stored in or on a memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 15 takes one processor 1501 as an example.
The memory 1502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the gradient lifting tree generation method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the gradient boosting tree generation method provided herein.
The memory 1502 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the gradient boosting tree generation method in the embodiment of the present application (for example, the first generation module 1201, the first sending module 1202, the first determination module 1203, the obtaining module 1204, the first calculation module 1205 and the second calculation module 1206 shown in fig. 12, or the first receiving module 1301, the first decryption module 1302 and the second sending module 1303 shown in fig. 13, or the sixth receiving module 1401, the fourth generation module 1402, the seventh sending module 1403, the seventh receiving module 1404, the third determination module 1405 and the eighth sending module 1406 shown in fig. 14). The processor 1501 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 1502, that is, implements the generation method of the gradient lifting tree model in the above method embodiment.
The memory 1502 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device for the generation method of the gradient boosting tree model, and the like. Further, the memory 1502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1502 may optionally include memory remotely located from the processor 1501, and such remote memory may be connected over a network to the electronic device for the method of generating a gradient boosting tree model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the gradient lifting tree model generation method may further include: an input device 1503 and an output device 1504. The processor 1501, the memory 1502, the input device 1503, and the output device 1504 may be connected by a bus or other means, such as the bus connection shown in fig. 15.
The input device 1503, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of generating the gradient boosting tree model. The output devices 1504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
In the description of the present specification, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (30)

1. A generation method of a gradient lifting tree model is applied to a business side, and comprises the following steps:
generating a business side encryption aggregation value of the first-order derivative and the second-order derivative in each data sub-box according to a data side public key, an encryption code of a sample set of a current leaf node of an (m-1)-th tree, and a first-order derivative and a second-order derivative of a predicted value after each piece of wind control data iterates for m-1 times, wherein m is smaller than a preset number M;
sending the business side encrypted aggregation value to a first data side in the data sides;
determining a target optimal splitting characteristic and a corresponding target optimal splitting point according to the maximum value of the business side information gain sent after the first data side decrypts the encrypted aggregation value of the business side, the corresponding business side splitting characteristic number and the business side splitting point number;
acquiring encrypted codes of sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point;
calculating the target weight of each leaf node according to the encryption codes of the sample set of each leaf node;
and calculating according to the target weight to obtain a gradient lifting tree model.
2. The generation method according to claim 1, wherein the data party public key is generated by the first data party.
3. The method according to claim 1, wherein the determining a target optimal splitting characteristic and a corresponding target optimal splitting point according to a maximum value of a service party information gain sent after the first data party decrypts the encrypted aggregation value of the service party, and a corresponding service party splitting characteristic number and a corresponding service party splitting point number comprises:
determining the optimal splitting characteristic of the service party, the optimal splitting point of the service party and the corresponding maximum information gain of the service party according to the maximum value of the information gain of the service party, the splitting characteristic number of the service party and the splitting point number of the service party;
acquiring optimal splitting characteristics of a data side, optimal splitting points of the data side and corresponding maximum information gain of the data side;
determining the larger value of the maximum information gain of the service party and the maximum information gain of the data party as a target maximum information gain;
determining the optimal splitting characteristic corresponding to the target maximum information gain as the target optimal splitting characteristic;
and determining the optimal splitting point corresponding to the target maximum information gain as the target optimal splitting point.
4. The method according to claim 1, wherein the obtaining the optimal splitting characteristic of the data side, the optimal splitting point of the data side, and the corresponding maximum information gain of the data side comprises:
after homomorphic encryption is carried out on the first derivative and the second derivative, an encrypted first derivative and an encrypted second derivative are obtained;
sending the encrypted first derivative and the encrypted second derivative to the data side;
receiving a data party encryption aggregation value of the first-order derivative and the second-order derivative in each data sub-box sent by the data party, wherein the data party encryption aggregation value is generated by the data party according to a service party public key, a sample set of a current leaf node, the encryption first-order derivative and the encryption second-order derivative;
carrying out decryption calculation on the encrypted aggregation value of the data side to obtain the maximum value of the information gain of the data side, and the corresponding data side splitting characteristic number and the data side splitting point number;
sending the maximum value of the data side information gain, the corresponding data side split characteristic number and the corresponding data side split point number to the data side;
and receiving the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party, which are sent by the data party, wherein the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party are determined by the data party according to the maximum value of the information gain of the data party, the splitting characteristic number of the data party and the splitting point number of the data party.
5. The generation method according to claim 1, wherein the obtaining of the encrypted codes of the sample sets of the two leaf nodes obtained after splitting based on the target optimal splitting characteristic and the target optimal splitting point comprises:
if the target optimal splitting characteristic belongs to the service party, calculating sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypting the sample sets of the two leaf nodes by adopting the data party public key to obtain encrypted codes of the sample sets of the two leaf nodes, and sending the encrypted codes of the sample sets of the two leaf nodes to the data party;
and if the target optimal splitting characteristic belongs to the data side, receiving encrypted codes of the sample sets of the two leaf nodes sent by the data side, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the data side calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by adopting the data side public key.
6. The method of generating as claimed in claim 1, wherein said calculating the target weight for each leaf node from the cryptographic encoding of the sample set for each leaf node comprises:
generating a data party encrypted aggregation value of each leaf node according to the encrypted codes of the sample set of each leaf node, the data party public key, the first-order derivative and the second-order derivative;
sending the data side encrypted aggregate value to the first data side;
receiving a data party decryption aggregation value and a corresponding number which are sent after the first data party decrypts the data party encryption aggregation value;
and calculating to obtain the target weight according to the decryption aggregation value of the data side and the corresponding number.
7. The method of generating as claimed in claim 1, wherein said calculating a gradient-boosted tree model from said target weights comprises:
constructing the m-th tree according to the target weight;
updating the predicted value of the m-th sub-model for each piece of wind control data, wherein the m-th sub-model comprises the 1st tree to the m-th tree;
and calculating to obtain the gradient lifting tree model according to the M sub-models.
8. A generation method of a gradient lifting tree model is applied to a first data side, and comprises the following steps:
receiving a business party encrypted aggregate value of a first-order derivative and a second-order derivative in each data sub-box sent by a business party, wherein the business party encrypted aggregate value is generated by the business party according to a public key of the data party, an encrypted code of a sample set of a current leaf node of an (m-1)-th tree and the first-order derivative and the second-order derivative of a predicted value after each piece of wind control data iterates for m-1 times, and m is smaller than a preset number M;
carrying out decryption calculation on the encrypted aggregation value of the service party to obtain the maximum value of the information gain of the service party, and the corresponding service party splitting characteristic number and the corresponding service party splitting point number;
and sending the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number to the service party so that the service party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, acquires an encrypted code of a sample set of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted code of the sample set of each leaf node, and calculates a gradient lifting tree model according to the target weight.
9. The generation method according to claim 8, further comprising:
generating the public key of the data side;
and sending the public key of the data party to the service party.
10. The generation method according to claim 8, further comprising:
receiving an encrypted first derivative and an encrypted second derivative sent by the service party, wherein the encrypted first derivative and the encrypted second derivative are obtained by the service party after homomorphic encryption is carried out on the first derivative and the second derivative;
generating a data side encryption aggregation value of the first derivative and the second derivative in each data sub-box according to a service side public key, a sample set of a current leaf node, the encryption first derivative and the encryption second derivative;
sending the data side encryption aggregation value to the service side;
receiving the maximum value of the data party information gain sent by the service party after decrypting the data party encrypted aggregation value, and the corresponding data party splitting characteristic number and the data party splitting point number;
determining the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party according to the maximum value of the information gain of the data party, the number of the splitting characteristic of the data party and the number of the splitting point of the data party;
and sending the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party to the service party.
11. The generation method according to claim 8, further comprising:
if the target optimal splitting characteristic belongs to the service party, receiving encrypted codes of sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the service party calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by adopting the data party public key;
and if the target optimal splitting characteristic belongs to the data side, calculating sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypting the sample sets of the two leaf nodes by adopting the data side public key to obtain encrypted codes of the sample sets of the two leaf nodes, and sending the encrypted codes of the sample sets of the two leaf nodes to the service side.
12. The generation method according to claim 8, further comprising:
receiving a data side encryption aggregation value of each leaf node sent by the service side, wherein the data side encryption aggregation value is generated by the service side according to an encryption code of a sample set of each leaf node, the data side public key, the first-order derivative and the second-order derivative;
carrying out decryption calculation on the encrypted aggregation value of the data side to obtain a decrypted aggregation value of the data side and a corresponding number;
and sending the data side decryption aggregation value and the corresponding number to the service side so that the service side can calculate the target weight according to the data side decryption aggregation value and the corresponding number.
13. A generation method of a gradient lifting tree model is applied to a second data party, and comprises the following steps:
receiving an encrypted first derivative and an encrypted second derivative sent by a service party, wherein the encrypted first derivative and the encrypted second derivative are obtained by the service party after homomorphic encryption is carried out on the first derivative and the second derivative;
generating a data side encryption aggregation value of the first derivative and the second derivative in each data sub-box according to a service side public key, a sample set of a current leaf node, the encryption first derivative and the encryption second derivative;
sending the data side encryption aggregation value to the service side;
receiving the maximum value of the data party information gain sent by the service party after decrypting the data party encrypted aggregation value, and the corresponding data party splitting characteristic number and the data party splitting point number;
determining the optimal splitting characteristic of the data party, the optimal splitting point of the data party and the maximum information gain of the data party according to the maximum value of the information gain of the data party, the number of the splitting characteristic of the data party and the number of the splitting point of the data party;
sending the optimal splitting characteristic of the data side, the optimal splitting point of the data side and the maximum information gain of the data side to the service side, so that the service side can determine the optimal splitting characteristic of a target and the corresponding optimal splitting point of the target according to the optimal splitting characteristic of the data side, the optimal splitting point of the data side, the maximum information gain of the data side, the maximum value of the information gain of the service side, the splitting characteristic number of the service side and the splitting point number of the service side, obtain the encrypted codes of the sample sets of two leaf nodes obtained after splitting of the current leaf node based on the optimal splitting characteristic of the target and the optimal splitting point of the target, calculate the target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and calculate a gradient lifting tree model according to the target weight, wherein the maximum value of the information gain of the service side, the service party splitting characteristic number and the service party splitting point number are obtained by the first data party through decryption calculation of service party encryption aggregation values of a first derivative and a second derivative in each data sub-box sent by the service party.
14. The generation method according to claim 13, further comprising:
if the target optimal splitting characteristic belongs to the service party, receiving encrypted codes of sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the service party calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by adopting the data party public key;
and if the target optimal splitting characteristic belongs to the data side, calculating sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, encrypting the sample sets of the two leaf nodes by adopting the data side public key to obtain encrypted codes of the sample sets of the two leaf nodes, and sending the encrypted codes of the sample sets of the two leaf nodes to the service side.
15. A generation device of a gradient lifting tree model, applied to a service party, comprising:
a first generation module configured to generate a service party encrypted aggregate value of the first derivative and the second derivative in each data bin according to a data party public key, encrypted codes of a sample set of a current leaf node of an (m-1)th tree, and the first derivative and the second derivative of a predicted value of each piece of risk control data after m-1 iterations, wherein m is smaller than a preset number M;
a first sending module configured to send the service party encrypted aggregate value to a first data party;
a first determining module, configured to determine a target optimal splitting characteristic and a corresponding target optimal splitting point according to a maximum value of a service party information gain sent after the first data party decrypts the encrypted aggregation value of the service party, and a corresponding service party splitting characteristic number and a corresponding service party splitting point number;
the obtaining module is configured to obtain encrypted codes of sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point;
a first calculation module configured to calculate a target weight for each leaf node from the cryptographic encoding of the sample set for each leaf node;
and the second calculation module is configured to calculate a gradient lifting tree model according to the target weight.
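The aggregation performed by the first generation module can be illustrated with an additively homomorphic scheme. The following is a minimal sketch, not the patented implementation: it assumes a Paillier-style scheme (the claims only require that aggregation can be done on encrypted values), a 0/1 membership encoding of the sample set, a single feature, and a fixed-point scale for the derivatives. The names `keygen`, `encrypt`, `decrypt`, `encrypted_bin_sums` and `SCALE` are hypothetical, and the key size is far too small for real use.

```python
import math
import random

SCALE = 1000  # assumed fixed-point scale for the derivatives


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known primes (illustration only, not secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse, Python 3.8+
    return n, (phi, mu)


def encrypt(n, m):
    """Encrypt an integer m (possibly negative) with generator g = n + 1."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    phi, mu = priv
    m = (pow(c, phi, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def encrypted_bin_sums(n, enc_mask, grads, hess, bin_ids, n_bins):
    """Service-party side: encrypted per-bin sums of g_i * s_i and h_i * s_i.

    enc_mask[i] encrypts s_i (1 if sample i is in the current leaf, else 0)
    under the data party public key; grads and hess are plaintext here.
    """
    n2 = n * n
    enc_g = [1] * n_bins  # 1 is a trivial ciphertext of 0
    enc_h = [1] * n_bins
    for c, g, h, b in zip(enc_mask, grads, hess, bin_ids):
        # Enc(s)^k encrypts k * s; multiplying ciphertexts adds plaintexts.
        enc_g[b] = enc_g[b] * pow(c, round(g * SCALE) % n, n2) % n2
        enc_h[b] = enc_h[b] * pow(c, round(h * SCALE) % n, n2) % n2
    return enc_g, enc_h
```

In this sketch only the holder of the private key (the first data party) can recover the per-bin sums, and the service party never sees which samples belong to the leaf.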
16. The generation apparatus of claim 15, wherein the public key of the data party is generated by the first data party.
17. The apparatus of claim 15, wherein the first determining module comprises:
a first determining unit, configured to determine an optimal splitting feature of a service party, an optimal splitting point of the service party, and a corresponding maximum information gain of the service party according to the maximum value of the information gain of the service party, the splitting feature number of the service party, and the splitting point number of the service party;
an acquisition unit configured to acquire a data side optimal splitting characteristic, a data side optimal splitting point, and a corresponding data side maximum information gain;
a second determining unit configured to determine the larger of the service party maximum information gain and the data party maximum information gain as a target maximum information gain;
a third determining unit configured to determine an optimal splitting characteristic corresponding to the target maximum information gain as the target optimal splitting characteristic;
a fourth determining unit configured to determine an optimal split point corresponding to the target maximum information gain as the target optimal split point.
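The selection logic of claim 17 can be sketched as follows. This is an illustrative sketch only: `SplitCandidate`, `resolve_service_candidate` and `choose_target_split` are hypothetical names, the feature names and split values in the usage are made up, and resolving a tie in favour of the service party is an assumption that the claim does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCandidate:
    owner: str          # "service" or "data"
    feature: str        # optimal splitting characteristic
    split_point: float  # optimal splitting point
    gain: float         # maximum information gain


def resolve_service_candidate(gain, feature_no, point_no, feature_names, split_points):
    """Map the returned feature number and split point number to the service party's own feature and threshold."""
    return SplitCandidate(
        "service", feature_names[feature_no], split_points[feature_no][point_no], gain
    )


def choose_target_split(service_best, data_best):
    """Pick the candidate with the larger gain; on a tie max() keeps the first argument."""
    return max(service_best, data_best, key=lambda c: c.gain)
```

The `owner` field of the result then decides which party computes and encrypts the sample sets of the two new leaf nodes.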
18. The generation apparatus according to claim 15, wherein the acquisition unit includes:
the encryption subunit is configured to perform homomorphic encryption on the first order derivative and the second order derivative to obtain an encrypted first order derivative and an encrypted second order derivative;
a first transmitting subunit configured to transmit the encrypted first-order derivative and the encrypted second-order derivative to the data side;
a first receiving subunit, configured to receive a data party encrypted aggregate value of the first order derivative and the second order derivative in each data sub-box sent by the data party, where the data party encrypted aggregate value is generated by the data party according to a service party public key, a sample set of a current leaf node, the encrypted first order derivative, and the encrypted second order derivative;
the decryption subunit is configured to perform decryption calculation on the data side encrypted aggregation value to obtain a maximum value of data side information gain, and a corresponding data side split feature number and a corresponding data side split point number;
a second transmitting subunit configured to transmit the maximum value of the data side information gain, and the corresponding data side split feature number and data side split point number to the data side;
a second receiving subunit, configured to receive the optimal splitting characteristic of the data party, the optimal splitting point of the data party, and the maximum information gain of the data party, which are sent by the data party, where the optimal splitting characteristic of the data party, the optimal splitting point of the data party, and the maximum information gain of the data party are determined by the data party according to the maximum value of the information gain of the data party, the splitting characteristic number of the data party, and the splitting point number of the data party.
19. The generation apparatus according to claim 15, wherein the obtaining module comprises:
a first execution unit, configured to calculate sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point if the target optimal splitting characteristic belongs to the service party, encrypt the sample sets of the two leaf nodes by using the data party public key to obtain encryption codes of the sample sets of the two leaf nodes, and send the encryption codes of the sample sets of the two leaf nodes to the data party;
and a second execution unit configured to receive, if the target optimal splitting characteristic belongs to the data party, encrypted codes of sample sets of the two leaf nodes sent by the data party, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the data party calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by using the data party public key.
20. The generation apparatus according to claim 15, wherein the first calculation module comprises:
a generating unit configured to generate a data-side encrypted aggregate value for each leaf node from the encrypted encoding of the sample set for each leaf node, the data-side public key, the first derivative, and the second derivative;
a transmitting unit configured to transmit the data side encrypted aggregate value to the first data side;
a receiving unit configured to receive a data side decryption aggregation value and a corresponding number, which are sent after the first data side decrypts the data side encryption aggregation value;
and the first calculation unit is configured to calculate the target weight according to the data side decryption aggregation value and the corresponding number.
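The weight calculation of claim 20 might look like the sketch below once the aggregate values have been decrypted. Two things are assumptions here, not statements of the claim: the XGBoost-style formula w = -G / (H + lambda), and reading the "corresponding number" as the leaf node number. `leaf_weight`, `leaf_weights` and `reg_lambda` are hypothetical names.

```python
def leaf_weight(sum_g, sum_h, reg_lambda=1.0):
    """Weight of one leaf from the summed first and second derivatives (assumed formula)."""
    return -sum_g / (sum_h + reg_lambda)


def leaf_weights(decrypted, reg_lambda=1.0):
    """decrypted maps leaf number -> (sum of first derivatives, sum of second derivatives)."""
    return {leaf: leaf_weight(g, h, reg_lambda) for leaf, (g, h) in decrypted.items()}
```

For example, a leaf with summed derivatives (-2.0, 3.0) and `reg_lambda=1.0` gets the weight 0.5.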
21. The generation apparatus according to claim 15, wherein the second calculation module comprises:
a building unit configured to build an mth tree according to the target weight;
an updating unit configured to update the predicted value of the mth sub-model for each piece of risk control data, wherein the mth sub-model comprises the 1st tree to the mth tree;
a second calculating unit configured to calculate the gradient lifting tree model according to the M sub-models.
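The update step of claim 21 can be sketched as below, assuming a logistic loss and a learning rate; neither is specified in the claim. `update_predictions`, `logistic_derivatives` and `learning_rate` are hypothetical names. The derivatives returned for the updated scores are the ones that would feed the construction of the next tree.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_predictions(prev_scores, leaf_of_sample, weights, learning_rate=0.1):
    """Score of the mth sub-model = score of the (m-1)th sub-model + scaled weight of the sample's leaf in tree m."""
    return [f + learning_rate * weights[leaf] for f, leaf in zip(prev_scores, leaf_of_sample)]


def logistic_derivatives(scores, labels):
    """First and second derivatives of the logistic loss with respect to the raw scores."""
    probs = [sigmoid(f) for f in scores]
    grads = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grads, hess
```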
22. A generation device of a gradient lifting tree model, applied to a first data party, comprising:
a first receiving module configured to receive a service party encrypted aggregate value of a first derivative and a second derivative in each data bin sent by a service party, wherein the service party encrypted aggregate value is generated by the service party according to a data party public key, encrypted codes of a sample set of a current leaf node of an (m-1)th tree, and the first derivative and the second derivative of a predicted value of each piece of risk control data after m-1 iterations, and m is smaller than a preset number M;
the first decryption module is configured to decrypt and calculate the encrypted aggregation value of the service party to obtain the maximum value of the information gain of the service party, and the corresponding service party splitting feature number and the corresponding service party splitting point number;
the second sending module is configured to send the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number to the service party, so that the service party determines a target optimal splitting feature and a corresponding target optimal splitting point according to the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtains encrypted codes of sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting feature and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and calculates a gradient lifting tree model according to the target weight.
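After decryption, the first data party only sees per-bin sums indexed by feature number and bin number, which is enough to find the maximum gain. The sketch below assumes the XGBoost-style gain formula with hypothetical regularisation parameters `reg_lambda` and `gamma`; the claim itself does not state how the information gain is computed, and `best_split` is a hypothetical name.

```python
def best_split(bin_g, bin_h, reg_lambda=1.0, gamma=0.0):
    """Scan decrypted per-bin sums and return (max_gain, feature_no, split_point_no).

    bin_g[f][b] and bin_h[f][b] are the summed first and second derivatives
    for feature number f and bin b; split point number k sends bins 0..k left.
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    best = (float("-inf"), -1, -1)
    for f, (gs, hs) in enumerate(zip(bin_g, bin_h)):
        total_g, total_h = sum(gs), sum(hs)
        left_g = left_h = 0.0
        for k in range(len(gs) - 1):  # the last bin cannot be a split point
            left_g += gs[k]
            left_h += hs[k]
            gain = 0.5 * (
                score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h)
            ) - gamma
            if gain > best[0]:
                best = (gain, f, k)
    return best
```

Because the result carries only numbers, the first data party does not learn which service party feature or threshold a number refers to; the service party performs that mapping itself.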
23. The generation apparatus according to claim 22, further comprising:
a second generation module configured to generate the data party public key;
a third sending module configured to send the data party public key to the service party.
24. The generation apparatus according to claim 22, further comprising:
the second receiving module is configured to receive an encrypted first-order derivative and an encrypted second-order derivative sent by the service party, and the encrypted first-order derivative and the encrypted second-order derivative are obtained by the service party after homomorphic encryption is performed on the first-order derivative and the second-order derivative;
a third generating module configured to generate a data side encrypted aggregate value of the first and second derivatives in each data bin according to a public key of a service side, a sample set of a current leaf node, the encrypted first derivative and the encrypted second derivative;
a fourth sending module configured to send the data side encrypted aggregate value to the service side;
a third receiving module, configured to receive a maximum value of data party information gain sent after the service party decrypts the data party encrypted aggregation value, and a corresponding data party splitting feature number and a data party splitting point number;
a second determining module configured to determine the data side optimal splitting feature, the data side optimal splitting point, and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number, and the data side splitting point number;
a fifth sending module configured to send the optimal splitting characteristic of the data side, the optimal splitting point of the data side, and the maximum information gain of the data side to the service side.
25. The generation apparatus according to claim 22, further comprising:
a fourth receiving module configured to receive, if the target optimal splitting characteristic belongs to the service party, encrypted codes of sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the service party calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by using the data party public key;
and the third calculation module is configured to calculate sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point if the target optimal splitting characteristic belongs to the data party, encrypt the sample sets of the two leaf nodes by using the data party public key to obtain encrypted codes of the sample sets of the two leaf nodes, and send the encrypted codes of the sample sets of the two leaf nodes to the service party.
26. The generation apparatus according to claim 22, further comprising:
a fifth receiving module configured to receive a data side encrypted aggregate value of each leaf node sent by the service side, where the data side encrypted aggregate value is generated by the service side according to an encrypted code of a sample set of each leaf node, the data side public key, the first order derivative, and the second order derivative;
the second decryption module is configured to decrypt the data side encrypted aggregate value to obtain a data side decrypted aggregate value and a corresponding number;
and the sixth sending module is configured to send the data side decryption aggregation value and the corresponding number to the service side, so that the service side can calculate the target weight according to the data side decryption aggregation value and the corresponding number.
27. A generation device of a gradient lifting tree model is applied to a second data party, and comprises:
the sixth receiving module is configured to receive an encrypted first-order derivative and an encrypted second-order derivative sent by a service party, and the encrypted first-order derivative and the encrypted second-order derivative are obtained by the service party after homomorphic encryption is performed on the first-order derivative and the second-order derivative;
a fourth generation module configured to generate a data side encrypted aggregate value of the first order derivative and the second order derivative in each data bin according to a public key of a service side, a sample set of a current leaf node, the encrypted first order derivative and the encrypted second order derivative;
a seventh sending module configured to send the data side encrypted aggregate value to the service side;
a seventh receiving module, configured to receive a maximum value of data party information gain sent after the service party decrypts the data party encrypted aggregation value, and a corresponding data party splitting feature number and a data party splitting point number;
a third determining module configured to determine the data side optimal splitting feature, the data side optimal splitting point and the data side maximum information gain according to the maximum value of the data side information gain, the data side splitting feature number and the data side splitting point number;
an eighth sending module configured to send the data party optimal splitting characteristic, the data party optimal splitting point and the data party maximum information gain to the service party, so that the service party determines a target optimal splitting characteristic and a corresponding target optimal splitting point according to the data party optimal splitting characteristic, the data party optimal splitting point, the data party maximum information gain, the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number, obtains encrypted codes of sample sets of two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, calculates a target weight of each leaf node according to the encrypted codes of the sample set of each leaf node, and calculates a gradient lifting tree model according to the target weight, wherein the maximum value of the service party information gain, the service party splitting feature number and the service party splitting point number are obtained by the first data party through decryption calculation on the service party encrypted aggregate value of the first derivative and the second derivative in each data bin sent by the service party.
28. The generation apparatus of claim 27, further comprising:
an eighth receiving module configured to receive, if the target optimal splitting characteristic belongs to the service party, encrypted codes of sample sets of the two leaf nodes sent by the service party, wherein the encrypted codes of the sample sets of the two leaf nodes are obtained by the service party calculating the sample sets of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point, and encrypting the sample sets of the two leaf nodes by using the data party public key;
and a fourth calculation module, configured to calculate a sample set of the two leaf nodes obtained after splitting of the current leaf node based on the target optimal splitting characteristic and the target optimal splitting point if the target optimal splitting characteristic belongs to the data side, encrypt the sample set of the two leaf nodes by using the data side public key to obtain encrypted codes of the sample set of the two leaf nodes, and send the encrypted codes of the sample set of the two leaf nodes to the service side.
29. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of generating a gradient-boosted tree model as defined in any one of claims 1 to 7, or to perform a method of generating a gradient-boosted tree model as defined in any one of claims 8 to 12, or to perform a method of generating a gradient-boosted tree model as defined in any one of claims 13 to 14.
30. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of generating a gradient-lifting tree model according to any one of claims 1-7, or to perform the method of generating a gradient-lifting tree model according to any one of claims 8-12, or to perform the method of generating a gradient-lifting tree model according to any one of claims 13-14.
CN202111038483.0A 2021-09-06 2021-09-06 Gradient lifting tree model generation method and device, electronic equipment and storage medium Active CN113722739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111038483.0A CN113722739B (en) 2021-09-06 2021-09-06 Gradient lifting tree model generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113722739A true CN113722739A (en) 2021-11-30
CN113722739B CN113722739B (en) 2024-04-09

Family

ID=78681947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111038483.0A Active CN113722739B (en) 2021-09-06 2021-09-06 Gradient lifting tree model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113722739B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529108A (en) * 2022-04-22 2022-05-24 北京百度网讯科技有限公司 Tree model based prediction method, apparatus, device, medium, and program product

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108536650A (en) * 2018-04-03 2018-09-14 北京京东尚科信息技术有限公司 Generate the method and apparatus that gradient promotes tree-model
WO2018205776A1 (en) * 2017-05-10 2018-11-15 腾讯科技(深圳)有限公司 Parameter server-based method for implementing gradient boosting decision tree and related equipment
CN110232448A (en) * 2019-04-08 2019-09-13 华南理工大学 It improves gradient and promotes the method that the characteristic value of tree-model acts on and prevents over-fitting
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
US20200175426A1 (en) * 2019-07-01 2020-06-04 Alibaba Group Holding Limited Data-based prediction results using decision forests
CN111695697A (en) * 2020-06-12 2020-09-22 深圳前海微众银行股份有限公司 Multi-party combined decision tree construction method and device and readable storage medium
CN112052954A (en) * 2019-06-06 2020-12-08 北京百度网讯科技有限公司 Gradient lifting tree modeling method and device and terminal
CN112101577A (en) * 2020-11-13 2020-12-18 同盾控股有限公司 XGboost-based cross-sample federal learning and testing method, system, device and medium
CN112182982A (en) * 2020-10-27 2021-01-05 北京百度网讯科技有限公司 Multi-party combined modeling method, device, equipment and storage medium
WO2021000572A1 (en) * 2019-07-01 2021-01-07 创新先进技术有限公司 Data processing method and apparatus, and electronic device
CN112199706A (en) * 2020-10-26 2021-01-08 支付宝(杭州)信息技术有限公司 Tree model training method and business prediction method based on multi-party safety calculation
CN112308157A (en) * 2020-11-05 2021-02-02 浙江大学 Decision tree-oriented transverse federated learning method
CN112381307A (en) * 2020-11-20 2021-02-19 平安科技(深圳)有限公司 Meteorological event prediction method and device and related equipment
CN112765652A (en) * 2020-07-31 2021-05-07 支付宝(杭州)信息技术有限公司 Method, device and equipment for determining leaf node classification weight
US20210150372A1 (en) * 2019-09-30 2021-05-20 Tencent Technology (Shenzhen) Company Limited Training method and system for decision tree model, storage medium, and prediction method
CN112989399A (en) * 2021-05-18 2021-06-18 杭州金智塔科技有限公司 Data processing system and method
CN113051239A (en) * 2021-03-26 2021-06-29 北京沃东天骏信息技术有限公司 Data sharing method, use method of model applying data sharing method and related equipment
CN113139818A (en) * 2021-04-30 2021-07-20 苏宁金融科技(南京)有限公司 Anti-fraud method and system based on automatic feature engineering

Also Published As

Publication number Publication date
CN113722739B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant