CN114065950A - Gradient aggregation method and device in GBDT model training and electronic equipment - Google Patents

Gradient aggregation method and device in GBDT model training and electronic equipment Download PDF

Info

Publication number
CN114065950A
CN114065950A
Authority
CN
China
Prior art keywords
integer
gradient
samples
data node
gradients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210041304.7A
Other languages
Chinese (zh)
Other versions
CN114065950B (en)
Inventor
陈智隆
郝天一
陈琨
王国赛
王西利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huakong Tsingjiao Information Technology Beijing Co Ltd filed Critical Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority to CN202210041304.7A priority Critical patent/CN114065950B/en
Publication of CN114065950A publication Critical patent/CN114065950A/en
Application granted granted Critical
Publication of CN114065950B publication Critical patent/CN114065950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Complex Calculations (AREA)

Abstract

The application discloses a gradient aggregation method and apparatus in GBDT model training, and an electronic device, applied to a ciphertext computing node in a training system. The ciphertext computing node receives, in ciphertext form, integer gradients of a plurality of samples sent by the active data node, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers; based on the received binning results of all features sent by the passive data nodes, it aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature; and it sends the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result. By adopting this scheme, the efficiency of GBDT model training is improved.

Description

Gradient aggregation method and device in GBDT model training and electronic equipment
Technical Field
The present application relates to the technical fields of machine learning and multi-party secure computation, and in particular to a gradient aggregation method and apparatus in GBDT model training, and an electronic device.
Background
GBDT (Gradient Boosting Decision Tree) is a family of supervised learning algorithms that train a model as an ensemble of decision trees. The model's predictions are driven toward the true values used in training by fitting decision trees one after another, where the target value fitted by each decision tree equals the difference between the true values of the training set and the sum of the predictions of the preceding decision trees. GBDT models are typically applied to classification, regression, and similar problems.
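For intuition only, the following is a minimal plaintext Python sketch of this residual-fitting procedure using sklearn regression trees; the function names, learning rate, and tree depth are illustrative assumptions and are not part of the federated scheme disclosed here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_trees=10, lr=0.1, max_depth=3):
    """Fit trees sequentially; each tree's target is the residual between
    the true values and the sum of the previous trees' predictions."""
    y = np.asarray(y, dtype=float)
    trees = []
    pred = np.zeros(len(y))
    for _ in range(n_trees):
        residual = y - pred                       # target fitted by this tree
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * tree.predict(X)              # accumulate the ensemble
        trees.append(tree)
    return trees

def gbdt_predict(trees, X, lr=0.1):
    return lr * sum(tree.predict(X) for tree in trees)
```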
Currently, in practical applications of the GBDT model, the feature data of the various features of the training samples are held by different participants, and no participant is willing to expose the feature data it holds, so effective training of the GBDT model cannot be achieved directly.
To solve this problem, the industry has proposed a vertical training method for the GBDT model. In this method, the data nodes holding feature data of the samples' features and the ciphertext computing nodes interact and compute in ciphertext form, achieving vertical training of the GBDT model while ensuring that no data node participating in training leaks the feature data it holds.
In this vertical training method, the active data node must send the gradients of the samples to the ciphertext computing node in ciphertext form; the ciphertext computing node performs the gradient aggregation computation and returns the result to the active data node for the subsequent training process. Because both the transmission and the aggregation of the gradients are carried out in ciphertext, when the numbers of samples and features participating in training are large, the volume of data transmitted and computed is very large, which severely degrades the efficiency of model training.
Disclosure of Invention
The embodiments of the present application provide a gradient aggregation method and apparatus in GBDT model training, and an electronic device, to address the low training efficiency of the GBDT model in the prior art.
An embodiment of the present application provides a gradient aggregation method in GBDT model training, applied to a ciphertext computing node in a training system. The training system comprises data nodes and the ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. The method comprises:
receiving, in ciphertext form, integer gradients of the plurality of samples sent by the active data node, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
based on the received binning results of all features sent by the passive data nodes, aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, to obtain an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples; and
sending the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, after receiving the integer gradients of the plurality of samples sent by the active data node in ciphertext form, the method further includes:
storing the integer gradients of the plurality of samples;
and the aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes includes:
based on the received binning results of all features sent by the passive data nodes, aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin using the stored integer gradients of the plurality of samples.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
The aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer gradient aggregation result for the feature, includes:
based on the received binning results of all features sent by the passive data nodes, for each bin of each feature's binning result, aggregating the integer first-order gradients of the samples included in the bin to obtain an integer first-order gradient aggregation result for the feature, and multiplying the number of samples included in the bin by the integer second-order gradient, the resulting product serving as an integer second-order gradient aggregation result for the feature.
An embodiment of the present application provides a gradient aggregation method in GBDT model training, applied to the active data node in a training system. The training system comprises data nodes and a ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. The method comprises:
sending integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers, so that the ciphertext computing node, based on the received binning results of all features sent by the passive data nodes, aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
receiving, in ciphertext form, the integer gradient aggregation result of each feature sent by the ciphertext computing node; and
converting the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
An embodiment of the present application provides a gradient aggregation apparatus in GBDT model training, applied to a ciphertext computing node in a training system. The training system comprises data nodes and the ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. The apparatus comprises:
a gradient receiving module, configured to receive, in ciphertext form, the integer gradients of the plurality of samples sent by the active data node, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
a gradient aggregation module, configured to aggregate, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples; and
an aggregation result sending module, configured to send the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, the apparatus also includes:
a gradient storage module, configured to store the integer gradients of the plurality of samples;
and the gradient aggregation module is specifically configured to aggregate, for each bin of each feature's binning result, the integer gradients of the samples included in the bin using the stored integer gradients of the plurality of samples, based on the received binning results of all features sent by the passive data nodes.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
The gradient aggregation module is specifically configured to aggregate, for each bin of each feature's binning result, the integer first-order gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer first-order gradient aggregation result for the feature, and to multiply the number of samples included in the bin by the integer second-order gradient, the resulting product serving as an integer second-order gradient aggregation result for the feature.
An embodiment of the present application provides a gradient aggregation apparatus in GBDT model training, applied to the active data node in a training system. The training system comprises data nodes and a ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. The apparatus comprises:
a gradient sending module, configured to send integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers, so that the ciphertext computing node, based on the received binning results of all features sent by the passive data nodes, aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
an aggregation result receiving module, configured to receive, in ciphertext form, the integer gradient aggregation result of each feature sent by the ciphertext computing node; and
an aggregation result conversion module, configured to convert the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
An embodiment of the application provides an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to implement any of the above gradient aggregation methods in GBDT model training.
Embodiments of the present application provide a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements any of the above gradient aggregation methods in GBDT model training.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the above-described gradient aggregation methods in GBDT model training.
The beneficial effects of this application include:
In the method provided by the embodiments of the application, when the active data node transmits the gradient data of the samples to the ciphertext computing node, it first converts the floating-point gradients into integer gradients and then sends the integer gradients to the ciphertext computing node in ciphertext form, so that the ciphertext computing node can perform the gradient aggregation computation based on the integer gradients; this reduces the volume of data transmitted and computed in ciphertext and thereby improves the efficiency of GBDT model training.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application and not to limit the application. In the drawings:
fig. 1 is a schematic structural diagram of a vertical training system for a GBDT model according to an embodiment of the present application;
fig. 2 is a flowchart of a gradient aggregation method in GBDT model training applied to a ciphertext computing node according to an embodiment of the present application;
fig. 3 is a flowchart of a gradient aggregation method in GBDT model training applied to the active data node according to an embodiment of the present application;
fig. 4 is a flowchart of a gradient aggregation method in GBDT model training provided in another embodiment of the present application;
fig. 5-1 is a schematic structural diagram of a gradient aggregation apparatus in GBDT model training applied to a ciphertext computing node according to an embodiment of the present application;
fig. 5-2 is a schematic structural diagram of a gradient aggregation apparatus in GBDT model training applied to a ciphertext computing node according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of a gradient aggregation apparatus in GBDT model training applied to the active data node according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To provide an implementation that improves the efficiency of GBDT model training, the embodiments of the present application provide a gradient aggregation method and apparatus in GBDT model training, and an electronic device. Preferred embodiments of the present application are described below in conjunction with the accompanying drawings; it should be understood that the preferred embodiments described here only illustrate and explain the present application and do not limit it. The embodiments in the present application, and the features within them, may be combined with each other where no conflict arises.
Currently, in the technical field of multi-party secure computation, a vertical training scheme for the GBDT model has been proposed. As shown in fig. 1, the scheme is applied to a training system comprising data nodes and ciphertext computing nodes, where the data nodes include an active data node and at least one passive data node.
Each data node in the training system belongs to a data provider. Each data node has the plurality of samples used for training and holds feature data of at least one feature of those samples; different data nodes may hold feature data of different features. The data nodes are mainly responsible for local feature data storage and plaintext computation.
Among the data nodes, the active data node also holds the target value of each sample required for GBDT model training, which may also be called target data or label data: for example, 0-1 label data for each sample when training a classification model for a classification problem, or a numeric target value for each sample when training a regression model for a regression problem.
The ciphertext computing node implements the ciphertext computation in GBDT model training using a ciphertext computing protocol for multi-party secure computation. Any feasible protocol may be used; for example, the embodiments of the present application may use the SS4 protocol, an encryption protocol based on secret sharing.
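The patent does not detail the SS4 protocol itself; the following minimal Python sketch only illustrates the additive secret-sharing idea that such protocols build on, with an assumed 64-bit ring and hypothetical function names.

```python
import secrets

RING = 1 << 64  # shares live in a ring; the 64-bit size is an assumed choice

def share(x, n_parties=3):
    """Split an integer into n additive shares that sum to x mod RING;
    any proper subset of the shares reveals nothing about x."""
    parts = [secrets.randbelow(RING) for _ in range(n_parties - 1)]
    parts.append((x - sum(parts)) % RING)
    return parts

def reconstruct(parts):
    return sum(parts) % RING

# Addition is share-wise: each party adds its own shares locally, which is
# why ciphertext gradient aggregation reduces to local integer additions.
a, b = share(41), share(1)
assert reconstruct([sa + sb for sa, sb in zip(a, b)]) == 42
```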
The training system may include multiple ciphertext computing nodes, depending on the needs of the ciphertext computing protocol employed.
In the vertical training scheme of the GBDT model, vertical training of the GBDT model, that is, training under vertically partitioned federated learning, is completed among the active data node, the passive data nodes, and the ciphertext computing node through information interaction and data computation. The information interaction and data computation between the ciphertext computing node and the data nodes are carried out in ciphertext form, ensuring that no data node leaks the feature data it holds.
In this vertical training scheme, the active data node must send the gradients of the samples participating in training to the ciphertext computing node in ciphertext form; the ciphertext computing node performs the gradient aggregation computation and returns the result to the active data node for the subsequent training process. To improve the efficiency of GBDT model training, the embodiments of the present application propose the following optimizations to the gradient aggregation flow.
An embodiment of the present application provides a gradient aggregation method in GBDT model training, applied to a ciphertext computing node in a training system. The training system comprises data nodes and the ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. As shown in fig. 2, the method comprises:
step 21, receiving, in ciphertext form, integer gradients of the plurality of samples sent by the active data node, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
step 22, based on the received binning results of all features sent by the passive data nodes, aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, to obtain an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
step 23, sending the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
Correspondingly, an embodiment of the present application further provides a gradient aggregation method in GBDT model training, applied to the active data node in a training system. The training system comprises data nodes and a ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. As shown in fig. 3, the method comprises:
step 31, sending integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers, so that the ciphertext computing node, based on the received binning results of all features sent by the passive data nodes, aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
step 32, receiving, in ciphertext form, the integer gradient aggregation result of each feature sent by the ciphertext computing node;
step 33, converting the integer gradient aggregation result into a floating-point gradient aggregation result.
With the gradient aggregation method in GBDT model training provided by the embodiments of the present application, when the active data node transmits the gradient data of the samples to the ciphertext computing node, it converts the floating-point gradients into integer gradients and then sends the integer gradients in ciphertext form, so that the ciphertext computing node can perform the gradient aggregation computation based on the integer gradients.
The method provided by the present application is described in detail below with specific embodiments in conjunction with the accompanying drawings.
An embodiment of the present application provides a gradient aggregation method in GBDT model training, as shown in fig. 4, which may include the following steps:
and step 41, calculating the gradient of each sample by the initiative data node aiming at a plurality of samples participating in training.
During the training of the GBDT model, the gradient of the calculated samples is calculated for the decision tree comprised by the GBDT model, i.e. the gradient of each of the plurality of samples is calculated according to the loss function of the initial GBDT model.
In this step, for the current decision tree to be trained, the gradient of each of the plurality of samples may be calculated using the following formulas:

$$g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}}$$

$$h_i = \frac{\partial^2\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}$$

where $g_i$ is the first-order gradient of the $i$-th of the plurality of samples, $h_i$ is the second-order gradient of the $i$-th sample, $l$ is the loss function of the initial GBDT model, $y_i$ is the target value of the $i$-th sample, and $\hat{y}_i^{(t-1)}$ is, when training the $t$-th decision tree, the sum of the predictions of the first $t-1$ decision trees for the $i$-th sample.
In the embodiments of the present application, different loss functions used to calculate the gradients of the samples yield different gradients; when the loss function is quadratic, for example MSE (mean squared error), the second-order gradients of all samples are identical and equal to a constant.
In practical applications, the data type of the computed sample gradients is floating point.
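As an illustration of the constant second-order case, the following minimal sketch assumes the loss is the mean squared error over n samples, so that the second-order gradient is the constant 2/n used in the worked example under step 45 below; the function name is hypothetical.

```python
import numpy as np

def mse_gradients(y_true, y_pred):
    """Floating-point gradients of the MSE loss l = (1/n) * sum((y - yhat)^2):
    the first-order gradient varies per sample, while the second-order
    gradient is the same constant 2/n for every sample."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = len(y_true)
    g = 2.0 * (y_pred - y_true) / n   # one first-order gradient per sample
    h = 2.0 / n                       # a single constant second-order gradient
    return g, h
```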
Step 42: the active data node converts the floating-point gradients of the plurality of samples into integers, obtaining the integer gradients of the plurality of samples.
In this step, when the gradients of the samples include a first-order gradient and a second-order gradient, and the second-order gradient is a constant, the floating-point first-order gradients of the plurality of samples may be converted into integers to obtain the integer first-order gradients, and the identical floating-point second-order gradient of the plurality of samples may be converted into an integer to obtain the integer second-order gradient. The integer gradients of the plurality of samples then include a first-order gradient vector, which represents the integer first-order gradients of the plurality of samples, and the constant integer second-order gradient.
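The patent does not specify how the float-to-integer conversion is encoded; a common choice is fixed-point scaling, sketched below under an assumed scale factor and with hypothetical function names. The inverse conversion is what the active data node would apply to the integer aggregation results in step 47.

```python
import numpy as np

SCALE = 1 << 16  # assumed fixed-point scale; the patent does not fix an encoding

def gradients_to_integer(g_float, h_float):
    """Encode the floating-point first-order gradient vector and the single
    constant second-order gradient as integers before ciphertext transfer."""
    g_int = np.rint(np.asarray(g_float) * SCALE).astype(np.int64)
    h_int = int(round(h_float * SCALE))
    return g_int, h_int

def aggregation_to_float(agg_int):
    """Inverse conversion applied to the integer aggregation results
    returned by the ciphertext computing node."""
    return np.asarray(agg_int, dtype=np.int64) / SCALE
```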
Step 43: the active data node sends the integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form.
Because the floating-point gradients have been converted into integers, the data volume is reduced, and with it the communication traffic between the active data node and the ciphertext computing node.
Further, when the second-order gradients of all samples are the same constant, there is no need to send a second-order gradient for each sample; sending the single constant suffices, which further reduces the communication traffic.
Step 44: after receiving the integer gradients of the plurality of samples, the ciphertext computing node stores them.
Step 45: based on the received binning results of all features sent by the passive data nodes, the ciphertext computing node aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature; the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples.
During GBDT model training, for the current decision tree and the current layer being trained, each data node (the passive data nodes as well as the active data node) bins the plurality of samples at the current layer according to the configured number of bins for each feature it holds; the result serves as that feature's binning result and is used for the subsequent gradient aggregation of the samples. An illustrative sketch is given after this paragraph; the binning operation itself is not described in detail in the embodiments of the present application.
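Since the binning rule is left open here, the sketch below assumes quantile binning purely for illustration; the function name and cut-point choice are assumptions.

```python
import numpy as np

def bin_feature(values, n_bins):
    """Return, for one feature held by a passive data node, the indices of
    the samples falling into each bin. Quantile cut points are an assumed,
    illustrative binning rule."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_ids = np.digitize(values, edges)          # bin index for every sample
    return [np.flatnonzero(bin_ids == b) for b in range(n_bins)]
```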
In this step, when the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient, correspondingly: based on the received binning results of all features sent by the passive data nodes, for each bin of each feature's binning result, the integer first-order gradients of the samples included in the bin are aggregated to obtain an integer first-order gradient aggregation result for the feature, and the number of samples included in the bin is multiplied by the integer second-order gradient to obtain an integer second-order gradient aggregation result for the feature.
For example, suppose the second-order gradient is 2/n, where n is the number of samples; with n = 8, the second-order gradient of every sample is 1/4. Suppose the binning result of some feature is: the first bin contains samples 2, 4, and 7 (3 samples), the second bin contains sample 1 (1 sample), the third bin contains samples 3 and 5 (2 samples), and the fourth bin contains samples 6 and 8 (2 samples). Following the second-order aggregation method above, the second-order gradient aggregation results for the four bins are, in order: 3/4 (3 times 1/4), 1/4, 2/4, and 2/4.
With this second-order aggregation approach, aggregation is done with a single multiplication per bin instead of repeated additions, further improving computational efficiency.
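Putting the two aggregations of step 45 together, a minimal plaintext Python sketch follows; in the real protocol the first-order gradients are ciphertext shares and the sums are ciphertext additions, so plain integers are used here purely for illustration.

```python
import numpy as np

def aggregate_feature(g_int, bins, h_int):
    """Aggregate integer gradients per bin for one feature.

    g_int: integer first-order gradient per sample (numpy array).
    bins:  list of sample-index arrays, one per bin.
    h_int: the single constant integer second-order gradient.
    """
    g_agg = [int(np.sum(g_int[idx])) for idx in bins]   # additions per bin
    h_agg = [len(idx) * h_int for idx in bins]          # one multiply per bin
    return g_agg, h_agg
```

With the example above (n = 8 and bins of sizes 3, 1, 2, and 2), h_agg equals the encoded constant multiplied by 3, 1, 2, and 2, which decodes back to 3/4, 1/4, 2/4, and 2/4.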
For the training of one decision tree, gradient aggregation usually has to be performed repeatedly, based on the binning result of each feature. In currently deployed schemes, the gradients of the samples are generally transmitted in ciphertext between the active data node and the ciphertext computing node every time gradient aggregation is performed on a feature's binning result, which adds a large amount of communication traffic.
In the embodiments of the present application, the ciphertext computing node stores the integer gradients of the plurality of samples once it has received them, so that subsequent gradient aggregations based on each feature's binning result can use the stored integer gradients without the gradients being transmitted again, further reducing the communication traffic between the active data node and the ciphertext computing node.
Step 46: the ciphertext computing node sends the integer gradient aggregation result of each feature to the active data node in ciphertext form.
Step 47: after receiving the integer gradient aggregation results, the active data node converts them into floating-point gradient aggregation results for use in the subsequent training of the GBDT model.
In the subsequent training, the active data node may, based on the gradient aggregation results, determine through data computation the node-splitting criterion of the current layer of the current decision tree; this is not described in detail in the embodiments of the present application.
Based on the same inventive concept as the gradient aggregation method applied to a ciphertext computing node provided in the foregoing embodiments, another embodiment of the present application further provides a gradient aggregation apparatus in GBDT model training, applied to a ciphertext computing node in a training system. The training system comprises data nodes and the ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. A schematic structural diagram of the apparatus is shown in fig. 5-1; it specifically includes:
a gradient receiving module 51, configured to receive, in ciphertext form, the integer gradients of the plurality of samples sent by the active data node, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
a gradient aggregation module 52, configured to aggregate, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
and an aggregation result sending module 53, configured to send the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, as shown in fig. 5-2, the apparatus also includes:
a gradient storage module 54, configured to store the integer gradients of the plurality of samples;
and the gradient aggregation module 52 is specifically configured to aggregate, for each bin of each feature's binning result, the integer gradients of the samples included in the bin using the stored integer gradients of the plurality of samples, based on the received binning results of all features sent by the passive data nodes.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
The gradient aggregation module 52 is specifically configured to aggregate, for each bin of each feature's binning result, the integer first-order gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer first-order gradient aggregation result for the feature, and to multiply the number of samples included in the bin by the integer second-order gradient, the resulting product serving as an integer second-order gradient aggregation result for the feature.
Based on the same inventive concept as the gradient aggregation method applied to the active data node provided in the foregoing embodiments, another embodiment of the present application further provides a gradient aggregation apparatus in GBDT model training, applied to the active data node in a training system. The training system comprises data nodes and a ciphertext computing node; the data nodes include an active data node and at least one passive data node, and each data node holds feature data of at least one feature of a plurality of samples. A schematic structural diagram of the apparatus is shown in fig. 6; it specifically includes:
a gradient sending module 61, configured to send integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form, where the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers, so that the ciphertext computing node, based on the received binning results of all features sent by the passive data nodes, aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature, where the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
an aggregation result receiving module 62, configured to receive, in ciphertext form, the integer gradient aggregation result of each feature sent by the ciphertext computing node;
and an aggregation result conversion module 63, configured to convert the integer gradient aggregation result into a floating-point gradient aggregation result.
Further, the integer gradients of the plurality of samples include a first-order gradient vector and a constant integer second-order gradient. The first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers; the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
The functions of the above modules correspond to the processing steps in the flows shown in fig. 1 to 4 and are not described here again.
The gradient aggregation apparatus in GBDT model training provided by the embodiments of the present application may be implemented by a computer program. Those skilled in the art should understand that the module division above is only one of many possible divisions; whether or not the apparatus is divided into these or other modules, it falls within the scope of the present application as long as it has the functions described above.
An electronic device, as shown in fig. 7, includes a processor 71 and a machine-readable storage medium 72, where the machine-readable storage medium 72 stores machine-executable instructions executable by the processor 71, and the processor 71 is caused by the machine-executable instructions to implement any of the above gradient aggregation methods in GBDT model training.
An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements any of the above gradient aggregation methods in GBDT model training.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the above-described gradient aggregation methods in GBDT model training.
The machine-readable storage medium in the electronic device may include Random Access Memory (RAM) and may also include Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A gradient aggregation method in gradient boosting decision tree (GBDT) model training, applied to a ciphertext computing node in a training system, the training system comprising data nodes and the ciphertext computing node, the data nodes comprising an active data node and at least one passive data node, each of the data nodes holding feature data of at least one feature of a plurality of samples, the method comprising:
receiving, in ciphertext form, integer gradients of the plurality of samples sent by the active data node, wherein the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
based on the received binning results of all features sent by the passive data nodes, aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, to obtain an integer gradient aggregation result for the feature, wherein the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples; and
sending the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
2. The method of claim 1, wherein after the receiving of the integer gradients of the plurality of samples sent by the active data node in ciphertext form, the method further comprises:
storing the integer gradients of the plurality of samples;
and the aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes comprises:
based on the received binning results of all features sent by the passive data nodes, aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin using the stored integer gradients of the plurality of samples.
3. The method of claim 1, wherein the integer gradients of the plurality of samples comprise a first-order gradient vector and a constant integer second-order gradient, wherein the first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers, and the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer;
and the aggregating, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer gradient aggregation result for the feature, comprises:
based on the received binning results of all features sent by the passive data nodes, for each bin of each feature's binning result, aggregating the integer first-order gradients of the samples included in the bin to obtain an integer first-order gradient aggregation result for the feature, and multiplying the number of samples included in the bin by the integer second-order gradient, the resulting product serving as an integer second-order gradient aggregation result for the feature.
4. A gradient aggregation method in gradient boosting decision tree (GBDT) model training, characterized in that it is applied to the active data node in a training system, the training system comprising data nodes and a ciphertext computing node, the data nodes comprising an active data node and at least one passive data node, each of the data nodes holding feature data of at least one feature of a plurality of samples, the method comprising:
sending integer gradients of the plurality of samples to the ciphertext computing node in ciphertext form, wherein the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers, so that the ciphertext computing node, based on the received binning results of all features sent by the passive data nodes, aggregates, for each bin of each feature's binning result, the integer gradients of the samples included in the bin, obtaining an integer gradient aggregation result for the feature, wherein the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples;
receiving, in ciphertext form, the integer gradient aggregation result of each feature sent by the ciphertext computing node; and
converting the integer gradient aggregation result into a floating-point gradient aggregation result.
5. The method of claim 4, wherein the integer gradients of the plurality of samples comprise a first-order gradient vector and a constant integer second-order gradient, wherein the first-order gradient vector contains the integer first-order gradients of the plurality of samples, obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integers, and the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer.
6. A gradient aggregation apparatus in gradient boosting decision tree (GBDT) model training, applied to a ciphertext computing node in a training system, the training system comprising data nodes and the ciphertext computing node, the data nodes comprising an active data node and at least one passive data node, each of the data nodes holding feature data of at least one feature of a plurality of samples, the apparatus comprising:
a gradient receiving module, configured to receive, in ciphertext form, the integer gradients of the plurality of samples sent by the active data node, wherein the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integers;
a gradient aggregation module, configured to aggregate, for each bin of each feature's binning result, the integer gradients of the samples included in the bin based on the received binning results of all features sent by the passive data nodes, to obtain an integer gradient aggregation result for the feature, wherein the binning result of each feature is obtained by the passive data node holding that feature binning the plurality of samples; and
an aggregation result sending module, configured to send the integer gradient aggregation result of each feature to the active data node in ciphertext form, so that the active data node converts the integer gradient aggregation result into a floating-point gradient aggregation result.
7. The device of claim 6, wherein the integer gradients of the plurality of samples comprise a first-order gradient vector and a constant integer second-order gradient, wherein the first-order gradient vector comprises the integer first-order gradients of the plurality of samples, the integer first-order gradients being obtained by the active data node converting the floating-point first-order gradients of the plurality of samples into integer types, and the integer second-order gradients of the plurality of samples are the same constant, obtained by the active data node converting the identical floating-point second-order gradient shared by the plurality of samples into an integer type; and
the gradient aggregation module is specifically configured to, for each bin of the binning result of each feature, aggregate the integer first-order gradients of the samples included in the bin based on the received binning results of all the features sent by the passive data nodes, to obtain an integer first-order gradient aggregation result for the feature, and to multiply the number of samples included in the bin by the constant integer second-order gradient, the resulting product serving as an integer second-order gradient aggregation result for the feature.
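A minimal sketch of the aggregation behavior recited in claims 6 and 7, assuming a binning result is represented as one bin index per sample (the claims do not fix this representation) and using plaintext NumPy in place of the ciphertext arithmetic:

```python
import numpy as np

def aggregate_bins(int_g1: np.ndarray, bin_ids: np.ndarray, n_bins: int,
                   int_g2_const: int):
    """Per-bin sum of integer first-order gradients; the per-bin second-order
    aggregate is simply (number of samples in bin) * constant."""
    g1_agg = np.zeros(n_bins, dtype=np.int64)
    np.add.at(g1_agg, bin_ids, int_g1)               # sum gradients into bins
    counts = np.bincount(bin_ids, minlength=n_bins)  # samples per bin
    g2_agg = counts.astype(np.int64) * int_g2_const  # count times constant
    return g1_agg, g2_agg

int_g1 = np.array([8192, -32768, 49152, 4096])  # encoded first-order gradients
bins = np.array([0, 1, 0, 1])                   # one feature's binning result
g1_agg, g2_agg = aggregate_bins(int_g1, bins, n_bins=2, int_g2_const=1 << 16)
# g1_agg == [57344, -28672]; g2_agg == [131072, 131072]
```

Replacing a per-sample second-order sum with a single multiplication by a bin count is the efficiency gain claim 7 describes: the ciphertext computation needs no second-order gradient vector at all.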
8. A gradient aggregation device in GBDT model training, applied to an active data node in a training system, the training system comprising data nodes and a ciphertext computing node, the data nodes comprising the active data node and at least one passive data node, each of the data nodes holding feature data of at least one feature of a plurality of samples, the device comprising:
a gradient sending module, configured to send the integer gradients of the plurality of samples to the ciphertext computing node in a ciphertext manner, wherein the integer gradients are obtained by the active data node converting the floating-point gradients of the plurality of samples into integer types, so that the ciphertext computing node, for each bin of the binning result of each feature, aggregates the integer gradients of the samples included in the bin based on the received binning result of the feature sent by each passive data node, to obtain an integer gradient aggregation result for the feature, wherein the binning result of each feature is obtained by the passive data node holding the feature binning the plurality of samples;
an aggregation result receiving module, configured to receive, in a ciphertext manner, the integer gradient aggregation result of each feature sent by the ciphertext computing node; and
an aggregation result conversion module, configured to convert the integer gradient aggregation result into a floating-point gradient aggregation result.
9. An electronic device, comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, wherein the machine-executable instructions cause the processor to: carry out the method of any one of claims 1 to 3, or carry out the method of any one of claims 4 to 5.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 3, or carries out the method of any one of claims 4 to 5.
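Tying the hypothetical sketches above together (and assuming encode_gradients, decode_aggregate, SCALE, and aggregate_bins from those sketches are in scope), a plaintext toy run checks that decoding the integer aggregates recovers the floating-point per-bin sums up to quantization error:

```python
import numpy as np

float_g = np.array([0.125, -0.5, 0.75, 0.0625])  # floating-point first-order gradients
bins = np.array([0, 1, 0, 1])                    # binning result for one feature

enc = encode_gradients(float_g)                       # active data node encodes
g1_agg, g2_agg = aggregate_bins(enc, bins, 2, SCALE)  # ciphertext node aggregates
print(decode_aggregate(g1_agg))  # [ 0.875 -0.4375], the per-bin float sums
print(decode_aggregate(g2_agg))  # [2. 2.], bin counts times h = 1.0
```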
CN202210041304.7A 2022-01-14 2022-01-14 Gradient aggregation method and device in GBDT model training and electronic equipment Active CN114065950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210041304.7A CN114065950B (en) 2022-01-14 2022-01-14 Gradient aggregation method and device in GBDT model training and electronic equipment

Publications (2)

Publication Number Publication Date
CN114065950A (en) 2022-02-18
CN114065950B CN114065950B (en) 2022-05-03

Family

ID=80230901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210041304.7A Active CN114065950B (en) 2022-01-14 2022-01-14 Gradient aggregation method and device in GBDT model training and electronic equipment

Country Status (1)

Country Link
CN (1) CN114065950B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021158313A1 (en) * 2020-02-03 2021-08-12 Intel Corporation Systems and methods for distributed learning for wireless edge dynamics
CN112364908A (en) * 2020-11-05 2021-02-12 浙江大学 Decision tree-oriented longitudinal federal learning method
CN113037460A (en) * 2021-03-03 2021-06-25 北京工业大学 Federal learning privacy protection method based on homomorphic encryption and secret sharing
CN113051557A (en) * 2021-03-15 2021-06-29 河南科技大学 Social network cross-platform malicious user detection method based on longitudinal federal learning
CN113407963A (en) * 2021-06-17 2021-09-17 北京工业大学 Federal learning gradient safety aggregation method based on SIGNSGD

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DK学到头秃: "12. Paper Summary: BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning", https://blog.csdn.net/m0_57126939/article/details/121084245 *
Umair Mohammad et al.: "Asynchronous Task Allocation for Federated and Parallelized Mobile Edge Learning", arXiv *
Yongjeong Oh et al.: "Communication-Efficient Federated Learning via Quantized Compressed Sensing", arXiv *
Dong Ye et al.: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection" (基于秘密分享和梯度选择的高效安全联邦学习), Journal of Computer Research and Development (计算机研究与发展) *

Also Published As

Publication number Publication date
CN114065950B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN113038302B (en) Flow prediction method and device and computer storage medium
CN110740356B (en) Live broadcast data monitoring method and system based on block chain
CN112328962B (en) Matrix operation optimization method, device and equipment and readable storage medium
WO2023174018A1 (en) Vertical federated learning methods, apparatuses, system and device, and storage medium
Raja et al. Passivity analysis for uncertain discrete-time stochastic BAM neural networks with time-varying delays
Peng et al. Synchronization for the integer-order and fractional-order chaotic maps based on parameter estimation with JAYA-IPSO algorithm
Strelkovskaya et al. Different extrapolation methods in Problems of Forecasting
Huang et al. Accelerating federated edge learning via topology optimization
CN114065950B (en) Gradient aggregation method and device in GBDT model training and electronic equipment
Zhao et al. Generalized finite-time synchronization between coupled chaotic systems of different orders with unknown parameters
CN117196014B (en) Model training method and device based on federal learning, computer equipment and medium
CN112437022B (en) Network traffic identification method, device and computer storage medium
Elgamal et al. Framework for evaluating reliability of stochastic flow networks under different constraints
CN116362526B (en) Cloud edge cooperative resource management and control method and system for digital power plant
US11231961B2 (en) Scheduling operations
CN114118312B (en) Vertical training method, device, electronic equipment and system for GBDT model
CN116384513A (en) Yun Bianduan collaborative learning system and method
Godoy et al. A novel input design approach for systems with quantized output data
CN116306905A (en) Semi-supervised non-independent co-distributed federal learning distillation method and device
CN114329127A (en) Characteristic box dividing method, device and storage medium
Fan et al. Convergence analysis for sparse Pi-sigma neural network model with entropy error function
CN109446020B (en) Dynamic evaluation method and device of cloud storage system
CN114386533B (en) Transverse training method, device, electronic equipment and system for GBDT model
CN110163249A (en) Base station classifying identification method and system based on customer parameter feature
CN114118638B (en) Wind power plant power prediction method, GBDT model transverse training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant