CN114282688A - Two-party decision tree training method and system
- Publication: CN114282688A (application CN202210196287.4A)
- Authority: CN (China)
- Legal status: Granted
Abstract
The embodiments of this specification disclose a two-party decision tree training method and system for protecting the data privacy of both parties. For any feature, the devices of the two parties can sort the first gradient vector and the second gradient vector according to the feature values of that feature by a secure permutation method based on a permutation vector, obtain fragments of the first/second gradient sorting result vectors and the corresponding groups, and compute fragments of the gradient sums corresponding to the groups under that feature. The devices of the two parties interact according to a multi-party secure computation protocol to compute fragments of the splitting gains corresponding to the groups under each feature, based on the fragments of the gradient sums corresponding to those groups. Further, the devices of the two parties determine the feature and group corresponding to the maximum splitting gain through a multi-party secure comparison protocol, and split the node according to that feature and group.
Description
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a two-party decision tree training method and system.
Background
To protect the data privacy of all parties, distributed training schemes are used in the field of machine learning: each participant trains its own model without revealing the sample data it holds to the other participants. A decision tree is a tree-structured model (i.e., a tree model) with very wide application, for example in decision making and classification across many scenarios.
It is currently desirable to provide a distributed training scheme for decision trees.
Disclosure of Invention
One of the embodiments of the present specification provides a two-party decision tree training method, in which the feature values of one or more features and the label values of the samples in a sample set are vertically distributed between the two parties. The method is performed by a device of a first party, where the first party is either one of the two parties and the second party is the other; the method comprises the following steps:
splitting any node according to the following splitting steps:
obtaining a first fragment of a flag vector of the node, a first fragment of a first gradient vector, and a first fragment of a second gradient vector; the flag vector indicates the samples belonging to the node, the first gradient vector includes the first gradients corresponding to the samples belonging to the node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the node;
for any feature of the first party:
sorting the sample set according to the feature values of the feature to obtain a first permutation vector, where a permutation vector identifies an operation that sorts a sequence of equal length, and each element of the permutation vector indicates the position, in the sorting result sequence, of the data at the corresponding position of the permuted sequence; permuting the first gradient vector according to the feature values by a secure permutation method, based on the first fragment of the first gradient vector, the first permutation vector, and the second party's second fragment of the first gradient vector, to obtain a first fragment of the first gradient sorting result vector, while the second party obtains a second fragment of the first gradient sorting result vector; permuting the second gradient vector according to the feature values by the secure permutation method, based on the first fragment of the second gradient vector, the first permutation vector, and the second party's second fragment of the second gradient vector, to obtain a first fragment of the second gradient sorting result vector, while the second party obtains a second fragment of the second gradient sorting result vector;
dividing the elements in the first fragment of the first gradient sorting result vector and the elements in the first fragment of the second gradient sorting result vector in a preset manner to obtain a plurality of first groups of each fragment, and for each of the plurality of first groups: computing the sum of the elements of the first group within the first fragment of the first gradient sorting result vector to obtain a first fragment of the first gradient sum corresponding to that group; computing the sum of the elements of the first group within the first fragment of the second gradient sorting result vector to obtain a first fragment of the second gradient sum corresponding to that group;
for any feature of the second party:
permuting the first gradient vector according to the feature values of the feature by the secure permutation method, based on the first fragment of the first gradient vector and on the second party's second fragment of the first gradient vector and second permutation vector, to obtain a first fragment of the first gradient sorting result vector, while the second party obtains a second fragment of the first gradient sorting result vector; permuting the second gradient vector according to the feature values by the secure permutation method, based on the first fragment of the second gradient vector and on the second party's second fragment of the second gradient vector and second permutation vector, to obtain a first fragment of the second gradient sorting result vector, while the second party obtains a second fragment of the second gradient sorting result vector; the second permutation vector is obtained by the second party sorting the sample set according to the feature values of the feature;
dividing the elements in the first fragment of the first gradient sorting result vector and the elements in the first fragment of the second gradient sorting result vector in a preset manner to obtain a plurality of second groups of each fragment, and for each of the plurality of second groups: computing the sum of the elements of the second group within the first fragment of the first gradient sorting result vector to obtain a first fragment of the first gradient sum corresponding to that group; computing the sum of the elements of the second group within the first fragment of the second gradient sorting result vector to obtain a first fragment of the second gradient sum corresponding to that group;
interacting with the device of the second party according to a multi-party secure computation protocol to compute first fragments of the splitting gains corresponding to the groups under each feature, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums corresponding to those groups;
interacting with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains corresponding to the groups under each feature, and recording the splitting information of the node according to the feature and group corresponding to the maximum splitting gain;
when the maximum splitting gain corresponds to a feature of the first party: generating a left sub-tree vector and a right sub-tree vector of the node, the left sub-tree vector indicating the samples in the left subset and the right sub-tree vector indicating the samples in the right subset obtained by dividing the sample set according to the feature and group corresponding to the maximum splitting gain, the left subset corresponding to the left sub-tree and the right subset to the right sub-tree; splitting the left sub-tree vector into a first fragment and a second fragment, and sending the second fragment of the left sub-tree vector to the device of the second party; splitting the right sub-tree vector into a first fragment and a second fragment, and sending the second fragment of the right sub-tree vector to the device of the second party; when the maximum splitting gain corresponds to a feature of the second party: receiving a first fragment of the left sub-tree vector and a first fragment of the right sub-tree vector of the node from the device of the second party;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the left sub-tree of the node, based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector; interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right sub-tree of the node, based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left sub-tree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node; interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left sub-tree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right sub-tree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node; interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right sub-tree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node.
One embodiment of the present specification provides a two-party decision tree training system, in which the feature values of one or more features and the label values of the samples in a sample set are vertically distributed between the two parties. The system is implemented on a device of a first party, where the first party is either one of the two parties and the second party is the other. The system comprises a first obtaining module, a first permutation module, a first grouped gradient sum fragment computation module, a second permutation module, a second grouped gradient sum fragment computation module, a splitting gain fragment computation module, a splitting gain comparison module, a left and right sub-tree vector fragment obtaining module, a child node flag vector fragment computation module, and a child node gradient vector fragment computation module.
For any node that is split:
the first obtaining module is configured to obtain a first fragment of the flag vector of the node, a first fragment of the first gradient vector, and a first fragment of the second gradient vector; the flag vector indicates the samples belonging to the node, the first gradient vector includes the first gradients corresponding to the samples belonging to the node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the node;
for any feature held by the first party, the first permutation module is configured to: sort the sample set according to the feature values of the feature to obtain a first permutation vector, where a permutation vector identifies an operation that sorts a sequence of equal length, and each element of the permutation vector indicates the position, in the sorting result sequence, of the data at the corresponding position of the permuted sequence; permute the first gradient vector according to the feature values by a secure permutation method, based on the first fragment of the first gradient vector, the first permutation vector, and the second party's second fragment of the first gradient vector, to obtain a first fragment of the first gradient sorting result vector, while the second party obtains a second fragment of the first gradient sorting result vector; and permute the second gradient vector according to the feature values by the secure permutation method, based on the first fragment of the second gradient vector, the first permutation vector, and the second party's second fragment of the second gradient vector, to obtain a first fragment of the second gradient sorting result vector, while the second party obtains a second fragment of the second gradient sorting result vector;
for any feature held by the first party, the first grouped gradient sum fragment computation module is configured to: divide the elements in the first fragment of the first gradient sorting result vector and the elements in the first fragment of the second gradient sorting result vector in a preset manner to obtain a plurality of first groups of each fragment, and for each of the plurality of first groups: compute the sum of the elements of the first group within the first fragment of the first gradient sorting result vector to obtain a first fragment of the first gradient sum corresponding to that group; and compute the sum of the elements of the first group within the first fragment of the second gradient sorting result vector to obtain a first fragment of the second gradient sum corresponding to that group;
for any feature held by the second party, the second permutation module is configured to: permute the first gradient vector according to the feature values of the feature by the secure permutation method, based on the first fragment of the first gradient vector and on the second party's second fragment of the first gradient vector and second permutation vector, to obtain a first fragment of the first gradient sorting result vector, while the second party obtains a second fragment of the first gradient sorting result vector; and permute the second gradient vector according to the feature values by the secure permutation method, based on the first fragment of the second gradient vector and on the second party's second fragment of the second gradient vector and second permutation vector, to obtain a first fragment of the second gradient sorting result vector, while the second party obtains a second fragment of the second gradient sorting result vector; the second permutation vector is obtained by the second party sorting the sample set according to the feature values of the feature;
for any feature held by the second party, the second grouped gradient sum fragment computation module is configured to: divide the elements in the first fragment of the first gradient sorting result vector and the elements in the first fragment of the second gradient sorting result vector in a preset manner to obtain a plurality of second groups of each fragment, and for each of the plurality of second groups: compute the sum of the elements of the second group within the first fragment of the first gradient sorting result vector to obtain a first fragment of the first gradient sum corresponding to that group; and compute the sum of the elements of the second group within the first fragment of the second gradient sorting result vector to obtain a first fragment of the second gradient sum corresponding to that group;
the splitting gain fragment computation module is configured to interact with the device of the second party according to a multi-party secure computation protocol to compute first fragments of the splitting gains corresponding to the groups under each feature, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums corresponding to those groups;
the splitting gain comparison module is configured to interact with the device of the second party according to a multi-party secure comparison protocol, determine the maximum splitting gain based on the first fragments of the splitting gains corresponding to the groups under each feature, and record the splitting information of the node according to the feature and group corresponding to the maximum splitting gain;
the left and right sub-tree vector fragment obtaining module is configured to: when the maximum splitting gain corresponds to a feature of the first party, generate a left sub-tree vector and a right sub-tree vector of the node, the left sub-tree vector indicating the samples in the left subset and the right sub-tree vector indicating the samples in the right subset obtained by dividing the sample set according to the feature and group corresponding to the maximum splitting gain, the left subset corresponding to the left sub-tree and the right subset to the right sub-tree; split the left sub-tree vector into a first fragment and a second fragment and send the second fragment of the left sub-tree vector to the device of the second party; split the right sub-tree vector into a first fragment and a second fragment and send the second fragment of the right sub-tree vector to the device of the second party; and, when the maximum splitting gain corresponds to a feature of the second party, receive a first fragment of the left sub-tree vector and a first fragment of the right sub-tree vector of the node from the device of the second party;
the child node flag vector fragment computation module is configured to: interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the left sub-tree of the node, based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right sub-tree of the node, based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector;
the child node gradient vector fragment computation module is configured to: interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left sub-tree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left sub-tree; likewise compute a first fragment of the second gradient vector of the left sub-tree, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left sub-tree; compute a first fragment of the first gradient vector of the right sub-tree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right sub-tree; and compute a first fragment of the second gradient vector of the right sub-tree, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right sub-tree.
One of the embodiments of the present specification provides a two-party decision tree training apparatus, which includes a processor and a storage device, the storage device storing instructions; when the processor executes the instructions, the two-party decision tree training method according to any embodiment of the present specification is implemented.
Drawings
The present description is further illustrated by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, like numerals indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a model training system according to some embodiments of the present description;
FIG. 2 is a schematic diagram of a tree model of parties A and B and their corresponding equivalent models, shown in accordance with some embodiments of the present description;
FIG. 3 is a schematic diagram of inputs and variable initialization for two-way decision tree training in accordance with some embodiments of the present description;
FIGS. 4-6 are exemplary flow diagrams of node splitting shown in accordance with some embodiments of the present description;
FIG. 7 is an exemplary flow diagram illustrating the computation of a shard of leaf node weights for an equivalent model according to some embodiments of the present description;
FIG. 8 is an exemplary flow diagram illustrating computing a patch of gradient vectors for training a next tree in accordance with some embodiments of the present description;
FIG. 9 is a schematic diagram of partitioning left and right subsets, according to some embodiments of the present description;
FIG. 10 is a block diagram of a two-party decision tree training system in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification, the terms "a", "an", and/or "the" are not limited to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed exactly in the order shown. Rather, the steps may be processed in reverse order or concurrently. Moreover, other operations may be added to these processes, or one or more steps may be removed from them.
First, the relevant knowledge of the decision tree is introduced.
The nodes of a decision tree may be divided into split nodes, which have child nodes (e.g., left and right sub-trees), and leaf nodes. Each split node may correspond to a feature, which may be referred to as the associated feature of that split node. The parameters of a split node (which may be referred to as split parameters) may include a threshold for determining to which child node a sample/prediction object belongs; the threshold relates to the associated feature of the split node, e.g., a certain feature value of the associated feature may serve as the threshold.
A decision-tree-based model (i.e., a tree model) may include one or more decision trees; tree models containing a plurality (two or more) of decision trees include, for example, tree models under the XGB (eXtreme Gradient Boosting) framework.
For the regression problem, each leaf node may correspond to a score (which may be referred to as a leaf node score or leaf node weight). The weights of all leaf nodes on a single decision tree constitute the leaf node weight vector of that decision tree, and the weight of the leaf node reached by a prediction object along its prediction path on the decision tree (the predicted leaf node) can be obtained as the inner product of the leaf node weight vector of the decision tree and the predicted leaf node vector of the prediction object for that decision tree. The predicted leaf node vector indicates which leaf node the prediction object reaches along its prediction path: its number of bits (dimension) equals the number of leaf nodes of the decision tree, the bit corresponding to the reached leaf node is usually set to 1 and the remaining bits to 0, so that the weight of the predicted leaf node equals the inner product of the leaf node weight vector and the predicted leaf node vector. Further, a tree model may correspond to a base score. The prediction score of a prediction object (e.g., a sample) may then be calculated as

pred = f_0 + \sum_{t=1}^{T} \langle W_t, S_t \rangle,

where pred denotes the prediction score of the prediction object, f_0 denotes the base score of the tree model, T (T ≥ 1) denotes the number of decision trees contained in the tree model, W_t denotes the leaf node weight vector of a single decision tree, S_t denotes the predicted leaf node vector of the prediction object for that decision tree, and ⟨·,·⟩ denotes the vector inner product.
In some embodiments, the prediction score of the prediction object may be used as the prediction value of the prediction object. In other embodiments, the prediction score of the predicted object may be processed using a non-linear function (e.g., sigmoid function), the output of which is the predicted value of the predicted object.
In some embodiments, a single decision tree may be trained as the tree model, and the prediction score of a prediction object equals the weight of the leaf node the prediction object reaches along its prediction path on the decision tree plus the base score of the tree model. In some embodiments, for example under the XGB framework, a plurality of decision trees may be trained as the tree model, and the prediction score of a prediction object equals the sum of the weights of the leaf nodes the prediction object reaches along its prediction paths on the plurality of decision trees plus the base score of the tree model.
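As a purely illustrative sketch (not part of the patent text; all names are hypothetical), the following Python snippet computes the prediction score pred = f_0 + Σ⟨W_t, S_t⟩ for one-hot predicted leaf node vectors:

```python
import numpy as np

def prediction_score(f0, leaf_weight_vectors, predicted_leaf_vectors):
    """pred = f0 + sum_t <W_t, S_t>, with S_t one-hot over the leaves of tree t."""
    pred = f0
    for W_t, S_t in zip(leaf_weight_vectors, predicted_leaf_vectors):
        pred += np.dot(W_t, S_t)  # inner product picks out the reached leaf's weight
    return pred

# Example: one tree with leaves (leaf1, leaf2, leaf3); the object reaches leaf2.
print(prediction_score(0.5, [np.array([0.1, -0.2, 0.3])], [np.array([0, 1, 0])]))
# 0.5 + (-0.2) = 0.3
```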
As mentioned previously, a split node may be split into left and right sub-trees based on some feature value of some feature. Accordingly, starting from the root node, the samples in the sample set are divided into the child nodes until they reach leaf nodes, at which point the prediction scores of the samples can be determined based on the base score of the decision tree and the weights of the leaf nodes the samples reach. Splitting the same node according to different feature values of different features yields different splitting gains, so during training the magnitudes of the splitting gains corresponding to the candidate features and feature values are compared to judge by which feature value of which feature a node is best split.
In particular, the splitting gain may reflect the decrease of the objective function when a node is split by a certain feature value of a certain feature. The splitting (i.e., training) of the decision tree aims to make the objective function value after splitting as small as possible relative to the value before splitting, and to make the difference between the two as large as possible; therefore, the feature and feature value corresponding to a larger splitting gain can be selected to split the node during training. The objective function is derived at least from the loss functions of the samples in the sample set (each reflecting the difference between the predicted value and the label value); for example, the objective function may be the sum of the loss functions of the samples. The loss function can be further described using a first gradient and a second gradient; the corresponding objective function is then equivalent to the sum of sub-objective functions corresponding to the leaf nodes of a single decision tree, and the sub-objective function of each leaf node can be obtained from the first gradient sum and the second gradient sum corresponding to that leaf node. Here the first gradient is the first-order gradient of the sample's loss function, the second gradient is the second-order gradient of the sample's loss function, the first gradient sum is the sum of the first gradients of the samples belonging to the node, and the second gradient sum is the sum of the second gradients of the samples belonging to the node. It is understood that when the splitting gain is computed, the original node before the split and the left and right sub-trees after the split are regarded as leaf nodes at their respective time instants; that is, the number of leaf nodes of the decision tree increases by one.
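For reference only, under the second-order (XGB-style) approximation appealed to above, the sub-objective of a leaf node j with first gradient sum G_j and second gradient sum H_j, and the resulting optimal leaf weight, take the standard form below; λ is a regularization constant introduced here as an assumption, and this is a sketch of the standard derivation, not a formula quoted from the patent:

```latex
\text{obj}_j(w_j) = G_j\, w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2,
\qquad
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\text{obj}_j(w_j^{*}) = -\frac{1}{2}\,\frac{G_j^{2}}{H_j + \lambda}.
```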
When a certain node is split, left and right sub-trees grow under the original node, and the splitting gain can be interpreted as the decrease of the objective function value of the decision tree after the split relative to that before the split. Combining the relationships among the splitting gain, the objective function, the sub-objective functions, and the gradient sums, the splitting gain can be obtained from the first gradient sum and second gradient sum corresponding to the left sub-tree and the first gradient sum and second gradient sum corresponding to the right sub-tree.
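Concretely, writing G_L, H_L (resp. G_R, H_R) for the first and second gradient sums of the left (resp. right) sub-tree, the standard XGB splitting gain consistent with the above reads as follows; again a reference sketch, with γ a per-leaf complexity cost assumed for illustration:

```latex
\text{Gain} = \frac{1}{2}\left[
\frac{G_L^{2}}{H_L + \lambda} +
\frac{G_R^{2}}{H_R + \lambda} -
\frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}
\right] - \gamma .
```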
For more details on the splitting gain, reference may be made to the related description below.
FIG. 1 is a schematic diagram of an application scenario of a model training system according to some embodiments of the present description. As shown in fig. 1, the system 100 may include an a-party device 110, a B-party device 120, a third party server 130, and a network 140.
The feature values of one or more features and the label values of the samples in the sample set are vertically distributed between party A and party B. Vertical distribution means that data (such as the feature values of the samples' features and the label values) are distributed in different databases according to data type (e.g., different features such as height and age, and the label values). As an example, party A holds the feature values of at least one feature (e.g., height) of each sample in the sample set and the label values of the samples, while party B holds the feature values of at least one other feature (e.g., age) of each sample. It is understood that the features, feature values, and label values held by party A belong to party A's private data, and the features and feature values held by party B belong to party B's private data.
During two-party model training, neither party A nor party B wishes to expose its private data to the other. To protect the data privacy of both parties, the inputs (such as the label values of the samples) and outputs (such as gradient vectors, sample prediction scores, and leaf node weights) of the computation steps involved in training are stored in the devices of both parties in fragments, party A and party B each holding one fragment.
For the decision tree, the tree models trained by party A and party B may have the same structure, such as the number of nodes, the connection relationships between nodes, and the positions of nodes. However, the tree models of party A and party B have different parameters. For convenience of illustration, this description often refers to the equivalent model corresponding to the tree model of party A and the tree model of party B, which could be obtained by centralized training based on the sample data held by party A and party B together. The equivalent model has complete parameters, and the parameters of the tree model of party A/B are equivalent to fragments split from the parameters of the equivalent model.
Specifically: the tree model of either party has the parameters of only some split nodes, i.e., the tree model of either party comprises local split nodes with parameters and non-local split nodes without parameters; the weight of a leaf node in either party's tree model is equivalent to a fragment of the weight of the corresponding leaf node in the equivalent model; the base score of either party's tree model is equivalent to a fragment of the base score of the equivalent model; and, from the vector perspective, the leaf node weight vector of either party's tree model is equivalent to a fragment of the leaf node weight vector of the equivalent model.
Referring to FIG. 2, the split nodes are represented by circles and the leaf nodes by rectangles. Party A holds feature X1 and party B holds feature X2; accordingly, party A's tree model has a local split node with parameter p1 corresponding to feature X1, and party B's tree model has a local split node with parameter p2 corresponding to feature X2. As shown in FIG. 2, the local split node of party A can be denoted (X1, p1), the local split node of party B can be denoted (X2, p2), the leaf nodes can be denoted leaf, and leaf nodes in the same position have the same number. Taking a binary tree as an example, the parameters of a split node may include a threshold associated with the node's corresponding feature; e.g., the parameters of a node corresponding to the age feature may include a threshold for distinguishing age groups.
As shown in FIG. 2: for party A, the weight of leaf1 is w_{11}, the weight of leaf2 is w_{21}, the weight of leaf3 is w_{31}, and the base score of the tree model is ⟨f_0⟩_1; for party B, the weight of leaf1 is w_{12}, the weight of leaf2 is w_{22}, the weight of leaf3 is w_{32}, and the base score of the tree model is ⟨f_0⟩_2; for the equivalent model, the weight of leaf1 is w_1, the weight of leaf2 is w_2, the weight of leaf3 is w_3, and the base score is f_0. In some embodiments, w_{11} + w_{12} = w_1, w_{21} + w_{22} = w_2, w_{31} + w_{32} = w_3, and ⟨f_0⟩_1 + ⟨f_0⟩_2 = f_0 may be satisfied. From the leaf node weight vector's perspective, (w_{11}, w_{21}, w_{31}) + (w_{12}, w_{22}, w_{32}) = (w_1, w_2, w_3).
Assume the predicted leaf node vector of the prediction object is (s_1, s_2, s_3); then the prediction score of the object is pred = f_0 + ⟨(w_1, w_2, w_3), (s_1, s_2, s_3)⟩, i.e., the base score plus the inner product of the leaf node weight vector (w_1, w_2, w_3) and the predicted leaf node vector (s_1, s_2, s_3). Assuming leaf2 is the reached leaf node, the predicted leaf node vector of the prediction object is (0, 1, 0), and the prediction score of the prediction object is f_0 + w_2.
When the tree model and the equivalent model each include T decision trees, then for the equivalent model each tree corresponds to an inner product of a leaf node weight vector and a predicted leaf node vector, and the prediction score of the prediction object is obtained by summing the inner products corresponding to the T trees and the base score of the equivalent model.
In some embodiments, the servers may be independent servers or groups of servers, which may be centralized or distributed. In some embodiments, the server may be regional or remote. In some embodiments, the server may execute on a cloud platform. For example, the cloud platform may include one or any combination of a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like.
The third party server 130 may assist the A-party device 110 and the B-party device 120 in running a two-party secure multiplication protocol. Multiplication is frequently involved in two-party model training; when one factor of a product belongs to the private data of party A and the other factor to the private data of party B, neither party's device can compute the product directly, but each can interact with the other party's device according to the two-party secure multiplication protocol to obtain one fragment of the product computed from its own private data. That is, party A and party B each obtain one fragment of the product.
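The patent treats the two-party secure multiplication protocol as a black box. One common realization, sketched below in Python purely for illustration, uses Beaver multiplication triples dealt by a third party (consistent with the role of the third party server 130 described above), with arithmetic over a 64-bit ring; all function names here are hypothetical:

```python
import secrets

MOD = 2**64  # fragments live in the ring Z_{2^64}

def share(x):
    """Additively split x into two fragments."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def beaver_mul(x_shares, y_shares):
    """Each party holds one fragment of x and of y; returns fragments of x*y."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)  # dealer picks a triple
    c = (a * b) % MOD
    (a0, a1), (b0, b1), (c0, c1) = share(a), share(b), share(c)
    # In a real run the parties exchange masked fragments to open e and f jointly.
    e = (x_shares[0] - a0 + x_shares[1] - a1) % MOD        # e = x - a (public)
    f = (y_shares[0] - b0 + y_shares[1] - b1) % MOD        # f = y - b (public)
    z0 = (e * f + e * b0 + f * a0 + c0) % MOD              # party 0's fragment
    z1 = (e * b1 + f * a1 + c1) % MOD                      # party 1's fragment
    return z0, z1

x0, x1 = share(7)
y0, y1 = share(6)
z0, z1 = beaver_mul((x0, x1), (y0, y1))
assert (z0 + z1) % MOD == 42
```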
Network 140 connects the various components of the system so that communication can occur between the various components. The network between the various parts in the system may include wired networks and/or wireless networks. For example, network 140 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a ZigBee network (ZigBee), Near Field Communication (NFC), an intra-device bus, an intra-device line, a cable connection, and the like, or any combination thereof. The network connection between each two parts may be in one of the above-mentioned ways, or in a plurality of ways.
FIGS. 3-8 are exemplary flow diagrams of a two-party decision tree training method according to some embodiments described herein.
First, the notation in FIGS. 3-8 is explained: (1) the symbol { } represents a vector, matrix, or set; when the dimension of a vector/set is M, or a certain dimension of a matrix is M, the M dimensions correspond one-to-one to the M samples in the sample set; for convenience of description, the M samples may be numbered 1 to M, with the dimensions of the vectors, matrices, and sets corresponding one-to-one to these numbers; (2) FIGS. 3-8 each have a central dotted line: steps in rounded boxes to the left of the dotted line can be performed independently by party A, steps in rounded boxes to the right can be performed independently by party B, and steps in rounded boxes crossing the dotted line are performed cooperatively by party A and party B, each obtaining one fragment of the fragmented data; (3) a feature is denoted j and a feature value k; since each group corresponds to one feature value, a group may also be denoted k.
As shown in FIG. 3, the input held by party A includes the feature data of the M samples, the feature data of each sample including the feature values of N_A features; party A also holds the label values of the M samples. The feature data of the M samples held by party A may be stored as an M × N_A feature matrix (denoted {X_A_{ij}}, where X_A contains the feature values of the M samples for party A's features), and the label values of the M samples held by party A may be stored as an M × 1 label vector (denoted {y_i}, where y denotes the label value of a sample). The input held by party B includes the feature data of the M samples, the feature data of each sample including the feature values of N_B features; the feature data of the M samples held by party B may be stored as an M × N_B feature matrix. The feature data and label values of the M samples owned by party A and party B may be aligned in advance, e.g., the sample numbers of the two parties' feature data and label values correspond one-to-one or are identical, with corresponding or identical sample numbers referring to the same sample.
The initialization phase in FIG. 3 shows the preparation steps that party A and party B perform for the splitting phase.
In the initialization phase, party A's device may initialize the base score (denoted f_0) of the equivalent model. In some embodiments, party A may average the label values of the M samples and use the average as the base score of the equivalent model. Notably, the base score of the equivalent model can serve as the initial sample prediction value, and party A's device can compute the first gradient vector {g_i} and the second gradient vector {h_i} of the root node based on the initial sample prediction values and the sample label values. The first gradient vector of any node includes the first gradients g corresponding to the samples belonging to that node, and the second gradient vector of any node includes the second gradients h corresponding to those samples. It is understood that the number of bits (dimension) of the first/second gradient vector may be M, each bit corresponding to one of the M samples, with the gradient vector bit of a sample not belonging to the node being 0. Note that although, for a sample i not belonging to a node, the corresponding element of the node's first/second gradient vector changes from the sample's first gradient g_i / second gradient h_i to 0, for ease of description that element is still written g_i / h_i.
For example, the first/second gradient vector of the root node of the first tree includes the first/second gradients corresponding to all M samples. Suppose M = 4; then the first gradient vector of the root node may be written (g_1, g_2, g_3, g_4) and the second gradient vector (h_1, h_2, h_3, h_4). If the root node is split according to some feature value of some feature such that samples 1 and 4 are divided into its left sub-tree and samples 2 and 3 into its right sub-tree, then the first and second gradient vectors of the left sub-tree of the root node are (g_1, 0, 0, g_4) and (h_1, 0, 0, h_4), respectively, and the first and second gradient vectors of the right sub-tree are (0, g_2, g_3, 0) and (0, h_2, h_3, 0), respectively.
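As an illustrative sketch only (assuming, for concreteness, a squared-error loss, which the patent does not fix; all names hypothetical), the initialization and masking of gradient vectors could look as follows:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0, 0.0])        # label values of M = 4 samples
f0 = y.mean()                              # base score = average label value

# Squared-error loss l = (pred - y)^2 / 2: first gradient g = pred - y,
# second gradient h = 1 (the concrete loss is an assumption here).
g_root = np.full_like(y, f0) - y           # first gradient vector of the root
h_root = np.ones_like(y)                   # second gradient vector of the root

# Split: samples 1 and 4 go left, samples 2 and 3 go right (1-based numbering).
left_flag = np.array([1, 0, 0, 1])         # flag vector of the left sub-tree
g_left, h_left = g_root * left_flag, h_root * left_flag  # zero out non-members
print(g_left)                              # (g1, 0, 0, g4)
```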
In the initialization phase, party A's device may also initialize a flag vector {s_i}; the initialized flag vector is the flag vector of the root node. The flag vector of any node indicates the samples belonging to that node. It is understood that the number of bits (dimension) of a node's flag vector may be M, each bit corresponding to one of the M samples; typically the flag vector bit of a sample belonging to the node is 1 and that of a sample not belonging to the node is 0. For example, the flag vector bits of the root node are all 1. If the root node is split according to some feature value of some feature such that samples 1 and 4 are divided into its left sub-tree and samples 2 and 3 into its right sub-tree, the flag vector of the left sub-tree of the root node is (1, 0, 0, 1) and the flag vector of the right sub-tree is (0, 1, 1, 0).
To avoid leaking information, party A's device in the initialization phase may split the base score f_0 of the equivalent model, the first gradient vector {g_i} of the root node, the second gradient vector {h_i} of the root node, the flag vector {s_i} of the root node, and the label vector {y_i} each into two fragments allocated to the two parties. It should be understood that splitting a vector or matrix means splitting each of its elements, i.e., each element of the vector or matrix is itself split into two fragments allocated to the two parties. Likewise, encrypting/decrypting a vector or matrix means encrypting/decrypting each of its elements. As shown in FIG. 3, subscript A corresponds to the fragment allocated to party A and subscript B to the fragment allocated to party B. Party A's device sends the fragments allocated to party B, namely ⟨f_0⟩_B, {⟨g_i⟩_B}, {⟨h_i⟩_B}, {⟨s_i⟩_B}, {⟨y_i⟩_B}, to party B's device, and retains at least the fragments allocated to itself, ⟨f_0⟩_A, {⟨g_i⟩_A}, {⟨h_i⟩_A}, {⟨s_i⟩_A}, {⟨y_i⟩_A}. Here ⟨f_0⟩_A can serve as the initial score of party A's tree model, and ⟨f_0⟩_B as the initial score of party B's tree model.
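A minimal sketch of this kind of fragment allocation, assuming additive secret sharing over a ring (one common choice the patent does not mandate; names hypothetical):

```python
import secrets

MOD = 2**64  # fragments live in Z_{2^64}; real values such as gradients would
             # first be encoded in fixed point before being fragmented this way

def split_into_fragments(vec):
    """Split each element of vec into two additive fragments (mod 2^64)."""
    frag_a = [secrets.randbelow(MOD) for _ in vec]
    frag_b = [(v - a) % MOD for v, a in zip(vec, frag_a)]
    return frag_a, frag_b   # party A keeps frag_a, party B receives frag_b

s = [1, 1, 1, 1]            # flag vector of the root node (all bits 1)
s_a, s_b = split_into_fragments(s)
assert all((a + b) % MOD == v for a, b, v in zip(s_a, s_b, s))
```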
It should be noted that, for a tree model under the XGB framework, the initialized first/second gradient vectors are the first/second gradient vectors of the root node of the first tree to be trained, and the flag vector bits of the root node of every tree are all 1.
FIGS. 4 to 6 respectively show three links involved in (decision tree) node splitting in two-party model training: computing fragments of the splitting gains, comparing the splitting gains, and recording the splitting information. These three links are described in turn below.
Referring to FIG. 4, FIG. 4 illustrates, taking a feature of party B as an example, the flow of computing fragments of the gradient sums and computing fragments of the splitting gain based on those fragments.
For any feature j of party B, party B's device may sort the samples in the sample set {i} according to the feature values of feature j (e.g., in ascending or descending order of feature value) and obtain the corresponding permutation vector, which may be called the first permutation vector. It is understood that for any feature j of party A, party A's device may sort the samples in the sample set {i} by the same method and obtain the corresponding permutation vector, which may be called the second permutation vector. A permutation vector identifies an operation that sorts a sequence of equal length (i.e., of the same length as the permutation vector): each element of the permutation vector indicates the position, in the sorting result sequence, of the data at the corresponding position of the permuted sequence. As an example, sorting the vector {a, b, a, c, b} in ascending dictionary order yields the vector {a, a, b, b, c} and the permutation vector {0, 2, 1, 4, 3}, where the 0 indicates that the 0th element of the permuted sequence occupies position 0 of the sorting result sequence, the 2 indicates that the 1st element occupies position 2, and the remaining elements 1, 4, 3 are analogous identifiers. Permuting another equal-length sequence {b, c, a, c, b} based on the permutation vector {0, 2, 1, 4, 3} yields the corresponding sorting result sequence {b, a, c, b, c}.
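In plain (non-secret-shared) form, applying a permutation vector is simply the scatter operation sketched below (illustrative Python, hypothetical names):

```python
def apply_perm(perm, seq):
    """Place seq[i] at position perm[i] of the result sequence."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out

perm = [0, 2, 1, 4, 3]                     # from sorting {a, b, a, c, b}
print(apply_perm(perm, list("bcacb")))     # ['b', 'a', 'c', 'b', 'c']
```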
After the permutation vector is obtained, the first/second gradient vector may be permuted by a secure permutation method based on the permutation vector to obtain the first/second gradient sorting result vector; it is understood that permuting the first/second gradient vector based on the first permutation vector amounts to sorting it according to the feature values of feature j. The secure permutation method permutes a sequence secretly: a secret-shared sequence can be secretly permuted using a permutation vector private to one of the parties (e.g., party B), and the sorting result remains in secret-shared form (this may also be called oblivious permutation). The secure permutation method can be described as an ObliviousPerm function or operator: (⟨π(x)⟩_1, ⟨π(x)⟩_2) = ObliviousPerm(π, ⟨x⟩_1; ⟨x⟩_2), where ⟨·⟩ denotes ciphertext form, which may specifically be fragments, x denotes the sequence to be permuted, and π(x) the sorting result sequence. The ObliviousPerm function can be understood as follows: one set of its input data, comprising the permutation vector π and a first fragment ⟨x⟩_1 of the sequence to be permuted, comes from one participant; the other set, comprising a second fragment ⟨x⟩_2 of the sequence to be permuted, comes from the other participant; its output data comprise a first fragment ⟨π(x)⟩_1 and a second fragment ⟨π(x)⟩_2 of the sorting result sequence, the first fragment being obtained by the one participant and the second by the other. Various implementations of the ObliviousPerm function exist; this specification does not limit the internal algorithm and only invokes ObliviousPerm as a black-box operator. It should be understood that any data processing/computing unit, program code, machine learning model, or the like, existing now or developed in the future, that can implement the ObliviousPerm function may serve as the secret-sharing-based permutation protocol referred to in this specification.
Taking the cooperation of a first participant and a second participant as an example, where the first participant holds the target permutation vector and a first fragment of the data column to be sorted, and the second participant holds a second fragment of the data column to be sorted, one implementation of the ObliviousPerm function may comprise the following steps (see the sketch after these steps):
The first participant obtains, from a trusted third party, a first permutation vector, a first fragment of a first data column, and a first fragment of a first result sequence obtained by sorting the first data column based on the first permutation vector. The second participant obtains a second fragment of the first data column and a second fragment of the first result sequence from the trusted third party.
The first participant determines a second permutation vector based on the target permutation vector and the first permutation vector and sends it to the second participant. In some embodiments, the target permutation vector is permuted based on the first permutation vector, and the resulting sequence may serve as the second permutation vector.
The first participant subtracts the first fragment of the first data column from the first fragment of the data column to be sorted to obtain a first fragment of a second data column, and meanwhile obtains a second fragment of the second data column from the second participant; the second fragment of the second data column is obtained by the second participant subtracting the second fragment of the first data column from the second fragment of the data column to be sorted.
The first participant reconstructs the second data column from the first fragment and the second fragment of the second data column, and permutes the second data column based on the target permutation vector to obtain a second result data column; the first participant also permutes the first fragment of the first result sequence based on the second permutation vector to obtain a third result data column; finally, the first participant sums the second result data column and the third result data column to obtain a first fragment of the target data column.
The second participant permutes the second fragment of the first result sequence based on the second permutation vector to obtain a second fragment of the target data column. The target data column equals the result of permuting the data column to be sorted based on the target permutation vector.
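For illustration only, the following is a minimal single-process simulation of the above steps in Python. The function names (apply_perm, share) and the simulated trusted third party are assumptions made for this sketch; a permutation vector p is interpreted, as defined in this specification, so that the element at position i moves to position p[i] of the result.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_perm(p, v):
    """Apply a permutation vector: the element v[i] moves to position p[i]."""
    out = np.empty_like(v)
    out[p] = v
    return out

def share(v):
    """Split a vector into two additive fragments."""
    a = rng.standard_normal(len(v))
    return a, v - a

n = 6
sigma = rng.permutation(n)               # target permutation, private to P1
x1, x2 = share(rng.standard_normal(n))   # data column to be sorted, held as fragments

# Trusted third party: random first permutation pi, random first data column r,
# and fragments of r and of pi applied to r (the first result sequence).
pi = rng.permutation(n)
r1, r2 = share(rng.standard_normal(n))
pr1, pr2 = share(apply_perm(pi, r1 + r2))

# P1 derives the second permutation vector rho by permuting sigma based on pi.
rho = apply_perm(pi, sigma)

# P1 and P2 blind the data column with r; P1 reconstructs the second data column.
d = (x1 - r1) + (x2 - r2)

# P1's output fragment: sigma applied to the second data column, plus rho applied
# to its fragment of the first result sequence; P2 permutes its own fragment.
y1 = apply_perm(sigma, d) + apply_perm(rho, pr1)
y2 = apply_perm(rho, pr2)

assert np.allclose(y1 + y2, apply_perm(sigma, x1 + x2))
```

The final assertion checks that the two output fragments sum to the target permutation applied to the original column; in a real two-party run, neither participant alone learns the plaintext column or the other participant's permutation.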
The first gradient vector {g_i} and the second gradient vector {h_i} are distributed between the devices of party A and party B in fragment form. On this basis, taking the permutation of the first gradient vector {g_i} as an example, the instructions or procedure implemented by the oblivious permutation operator may include: party B participates in the secret permutation based on its fragment <g_i>_B of the first gradient vector and the first permutation vector, and party A participates based on its fragment <g_i>_A of the first gradient vector; party B then obtains a first fragment of the first gradient sorting result vector while party A obtains a second fragment. For the permutation of the second gradient vector {h_i}: party B participates in the secret permutation based on its fragment <h_i>_B of the second gradient vector and the first permutation vector, and party A participates based on its fragment <h_i>_A of the second gradient vector; party B then obtains a first fragment of the second gradient sorting result vector while party A obtains a second fragment.
For the features of party A, party B participates in the secret permutation based on its fragment <g_i>_B of the first gradient vector, and party A participates based on its fragment <g_i>_A of the first gradient vector and the second permutation vector; party B then obtains a first fragment of the first gradient sorting result vector while party A obtains a second fragment. For the permutation of the second gradient vector {h_i}: party B participates based on its fragment <h_i>_B of the second gradient vector, and party A participates based on its fragment <h_i>_A of the second gradient vector and the second permutation vector; party B then obtains a first fragment of the second gradient sorting result vector while party A obtains a second fragment.
Thus, for the current node, the A party and the B party obtain the first/second gradient sorting result vector fragments corresponding to all the features.
For the fragments of the first/second gradient sorting result vectors held by the B-party device and by the A-party device, the two devices may each divide the elements of their local vector fragments according to a preset frequency (e.g., 10 elements per group), i.e., by an equal-frequency (or equal-width) binning algorithm. The B-party device thereby obtains a plurality of first groups of the fragments of the first/second gradient vectors, and the A-party device obtains a plurality of second groups of the fragments of the first/second gradient vectors. It can be understood that the fragments held by party A and party B have the same vector length, so the numbers of groups obtained are the same, and elements at the same position in corresponding groups correspond to the same sample in the sample set. For convenience of description, the number of groups is denoted K, and the groups are numbered 1 to K. In some embodiments, a threshold corresponding to each group may further be determined based on the feature values of the samples in the group (e.g., equal to the mean of the feature values, or to the feature value of a certain sample in the group), and the threshold corresponding to a group may be used as a splitting parameter according to which a node is subsequently split.
For the B-side device and the A-side device, each group corresponds to a fragment of a first gradient sum and a fragment of a second gradient sum: the fragment of the first gradient sum corresponding to any group is the sum of the elements of that group in the fragment of the first gradient sorting result vector, and the fragment of the second gradient sum corresponding to any group is the sum of the elements of that group in the fragment of the second gradient sorting result vector.
At this point, for the current node, party A and party B have obtained the fragments of the first gradient sum and of the second gradient sum for all the groups under each feature.
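As a sketch of this local grouping step, assuming the vector length is an exact multiple of the preset frequency, each device can bin and sum its own fragment without any interaction; the names and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

freq = 10                            # preset frequency: 10 elements per group
g_sorted = rng.standard_normal(40)   # first gradient sorting result vector (plaintext, for checking)
gA = rng.standard_normal(40)         # party A's fragment of the sorting result vector
gB = g_sorted - gA                   # party B's fragment

def group_sums(frag, freq):
    """Sum each consecutive block of `freq` elements of a local fragment."""
    return frag.reshape(-1, freq).sum(axis=1)

GkA = group_sums(gA, freq)   # A's fragments of the per-group first gradient sums
GkB = group_sums(gB, freq)   # B's fragments

assert np.allclose(GkA + GkB, group_sums(g_sorted, freq))
```

Because summation is linear, the per-group sums of the two fragments add up to the plaintext per-group gradient sums, which is why this step needs no communication.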
For any node, party A and party B can interact according to a multi-party secure computation protocol to compute, based on the fragments of the first gradient sum and of the second gradient sum corresponding to each group under each feature, the fragments of the splitting gain corresponding to each group under each feature, with party A and party B each holding one fragment of the same splitting gain. Since each group corresponds to one feature value, each splitting gain actually corresponds to one feature j and one feature value k of that feature. The splitting gain corresponding to feature j and its feature value k reflects the decrease in the objective function after the corresponding node is split according to feature j and the feature value k.
It was mentioned before that the splitting gain can be obtained based on the left-subtree first gradient sum (denoted G_L), the left-subtree second gradient sum (denoted H_L), the right-subtree first gradient sum (denoted G_R), and the right-subtree second gradient sum (denoted H_R). If the equivalent model were trained on a single device (denoted C), then referring to FIG. 4, the C-side device could initialize G_L, H_L, G_R, H_R, accumulate the first/second gradient sum G_k/H_k corresponding to a certain group (denoted k) onto the left-subtree first/second gradient sum G_L/H_L while subtracting G_k/H_k from the right-subtree first/second gradient sum G_R/H_R, and in this way obtain in turn the left-subtree gradient sums G_L and right-subtree gradient sums G_R corresponding to each group under feature j (i.e., to each feature value of feature j). For any group, the sum of the A-side device's fragment of the first gradient sum and the B-side device's fragment of the first gradient sum is the first gradient sum G_k corresponding to that group, and the sum of the A-side device's fragment of the second gradient sum and the B-side device's fragment of the second gradient sum is the second gradient sum H_k corresponding to that group.
As shown in FIG. 9, the sample set is divided into K groups according to the feature values of feature j. Since the samples were sorted according to the feature value of feature j before the grouping result was obtained, the K groups present a definite order with respect to the feature values of the samples they contain: the feature values of the samples do not decrease as the group sequence number (1 to K) increases (indicated by the horizontal arrow in FIG. 9). That is, the feature value of any sample in the 2nd group is not less than that of any sample in the 1st group, the feature value of any sample in the 3rd group is not less than that of any sample in the 2nd group, and so on. It can be seen that if groups are taken as the minimum unit for dividing the sample set into a left subset corresponding to the left subtree and a right subset corresponding to the right subtree, there are K division cases (indicated by the vertical arrows in FIG. 9); in other words, there are K splitting possibilities for the node to be split under feature j, each splitting possibility corresponding to one feature value (i.e., the feature value corresponding to each group).
The C-side device needs to calculate the K possible splitting gains, that is, the splitting gains corresponding to the K groups under feature j, in order to determine the optimal splitting parameters of the node to be split. Referring to FIGS. 4 and 9 together and taking the first gradient as an example, the C-side device may first initialize the left-subtree first gradient sum G_L to 0 and the right-subtree first gradient sum G_R to the first gradient sum G of the node to be split, where the first gradient sum of a node is the sum of the elements in the first gradient vector corresponding to the samples belonging to that node. Then the C-side device can obtain, in an "increase-decrease" manner, the left-subtree first gradient sums G_L and right-subtree first gradient sums G_R corresponding to the K groups in turn: add the first gradient sum G_1 of the 1st group to G_L (G_L = 0) to obtain the left-subtree first gradient sum corresponding to the 1st group (G_L = 0 + G_1), and subtract G_1 from G_R (G_R = G) to obtain the right-subtree first gradient sum corresponding to the 1st group (G_R = G − G_1); add the first gradient sum G_2 of the 2nd group to G_L (G_L = 0 + G_1) to obtain the left-subtree first gradient sum corresponding to the 2nd group (G_L = 0 + G_1 + G_2), and subtract G_2 from G_R (G_R = G − G_1) to obtain the right-subtree first gradient sum corresponding to the 2nd group (G_R = G − G_1 − G_2); and so on, until the left-subtree first gradient sum corresponding to the Kth group (G_L = G) and the right-subtree first gradient sum (G_R = 0) are obtained. Of course, the above calculation of G_L and G_R may be suitably varied, e.g., G_L may be initialized to the first gradient sum G of the node to be split and G_R to 0; accordingly, the first gradient sums corresponding to the K groups are subtracted from G_L in turn to obtain the left-subtree first gradient sums corresponding to the K groups, and added to G_R in turn to obtain the right-subtree first gradient sums corresponding to the K groups. It should be understood that, in each update equation, the G_L on the two sides refers to the left-subtree first gradient sums corresponding to adjacent groups, and likewise the G_R on the two sides refers to the right-subtree first gradient sums corresponding to adjacent groups.
Similarly, the C-side device may obtain the left sub-tree second gradient sum and the right sub-tree second gradient sum corresponding to the K groups under the feature j.
Based on the left-subtree first gradient sums G_L, right-subtree first gradient sums G_R, left-subtree second gradient sums H_L, and right-subtree second gradient sums H_R respectively corresponding to the K groups under feature j, the C-side device may calculate the splitting gains Gain respectively corresponding to the K groups under feature j.
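A minimal plaintext sketch of the "increase-decrease" sweep described above, on a hypothetical C-side device, producing the per-group left/right gradient sums from which the K splitting gains are then computed (the values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)

Gk = rng.standard_normal(5)   # per-group first gradient sums G_1..G_K under feature j
Hk = rng.standard_normal(5)   # per-group second gradient sums H_1..H_K
G, H = Gk.sum(), Hk.sum()     # gradient sums of the node to be split

GL, GR, HL, HR = 0.0, G, 0.0, H
for k in range(len(Gk)):
    GL += Gk[k]; GR -= Gk[k]   # move group k from the right subtree to the left
    HL += Hk[k]; HR -= Hk[k]
    print(f"group {k+1}: GL={GL:.3f} GR={GR:.3f} HL={HL:.3f} HR={HR:.3f}")
```

Each loop iteration yields the four sums for one of the K candidate splits in O(1) time, rather than re-summing the gradients of both subsets from scratch.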
It should be noted that some binning algorithms, such as equal-frequency binning, may cause samples with equal feature values to fall into adjacent groups, so that the left/right-subtree gradient sums (G_L, G_R, H_L, H_R) calculated as described above carry some error; such error, however, is negligible in engineering terms.
The above describes how the left/right-subtree gradient sums and the splitting gain would be computed if the equivalent model were trained on a single device (denoted C); on this basis, we continue to describe how the above computation process is split across the devices of the two parties.
First, consider the initialization of the left- and right-subtree gradient sums, taking the initialization of G_L to 0 and of G_R to G as an example. For G_L = 0, party A's device may generate a random number as its fragment <G_L>_A of the initialized G_L and send <G_L>_A to party B's device, which can then compute a fragment <G_L>_B satisfying <G_L>_A + <G_L>_B = 0; alternatively, a third-party device may generate fragments <G_L>_A and <G_L>_B satisfying <G_L>_A + <G_L>_B = 0 and send fragment <G_L>_A to party A's device and fragment <G_L>_B to party B's device; or the devices of party A and party B may locally initialize <G_L>_A = 0 and <G_L>_B = 0. For G_R = G, party A's device can locally compute the sum of the elements in its fragment <g_i>_A of the first gradient vector to obtain its fragment <G>_A of the first gradient sum G, used as its fragment <G_R>_A of the initialized G_R; and party B's device can locally compute the sum of the elements in its fragment <g_i>_B of the first gradient vector to obtain its fragment <G>_B of G, used as its fragment <G_R>_B of the initialized G_R. Similarly, party A's device may locally compute the sum of the elements in its fragment <h_i>_A of the second gradient vector to obtain its fragment <H>_A of the second gradient sum H, used as its fragment <H_R>_A of the initialized H_R; and party B's device may locally compute the sum of the elements in its fragment <h_i>_B of the second gradient vector to obtain its fragment <H>_B of H, used as its fragment <H_R>_B of the initialized H_R.
Then, for the calculation of the gradient sums respectively corresponding to the K groups, taking the first gradient as an example, the A-side/B-side devices may iterate as <G_L>_A = <G_L>_A + <G_k>_A and <G_R>_A = <G_R>_A − <G_k>_A (and likewise for the B-side fragments) to obtain the fragments of the left-subtree first gradient sum and of the right-subtree first gradient sum respectively corresponding to the K groups. Similarly, the A-side/B-side devices may iterate as <H_L>_A = <H_L>_A + <H_k>_A and <H_R>_A = <H_R>_A − <H_k>_A to obtain the fragments of the left-subtree second gradient sum and of the right-subtree second gradient sum respectively corresponding to the K groups.
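Because the updates are purely additive, the plaintext sweep carries over to the fragments unchanged. The sketch below (with random placeholder fragments) shows each party running the iteration locally on its own fragments, with the invariant that the reconstructed left and right sums always add up to the node's total:

```python
import numpy as np

rng = np.random.default_rng(3)

K = 5
GkA = rng.standard_normal(K)          # A's fragments of the per-group first gradient sums
GkB = rng.standard_normal(K)          # B's fragments

# Initialization: fragments of G_L = 0 and G_R = G, computed locally.
GL_A, GL_B = 0.0, 0.0
GR_A, GR_B = GkA.sum(), GkB.sum()

for k in range(K):
    GL_A += GkA[k]; GR_A -= GkA[k]    # party A updates its fragments locally
    GL_B += GkB[k]; GR_B -= GkB[k]    # party B updates its fragments locally
    # Sanity check (possible only in this simulation): G_L + G_R stays equal to G.
    assert np.isclose((GL_A + GL_B) + (GR_A + GR_B), GkA.sum() + GkB.sum())
```

No communication is needed in this phase; interaction only becomes necessary once the splitting gain's product terms are computed.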
After obtaining the fragments of G_L, G_R, H_L and H_R, the A-side/B-side device may calculate a fragment of the splitting gain. In some embodiments, the sub-objective function corresponding to a node may be as follows:

Obj = −(1/2) · G² / (H + λ)
where λ represents a preset coefficient, G represents the sum of the first gradients of all samples under the node, and H represents the sum of the second gradients of all samples under the node. Correspondingly, for the node, after the samples belonging to the node are divided into the left and right subtrees based on the feature value corresponding to a certain group under a certain feature, the splitting Gain is calculated as follows:

Gain = (1/2) · [ G_L² / (H_L + λ) + G_R² / (H_R + λ) − G² / (H + λ) ]
Here, when different splitting gains are compared by difference, the term G²/(H + λ) of Gain is the same constant for all candidate splits of the node and cancels out, so this constant term in Gain may be ignored in actual operation. In addition, since the fragmentation of data in this specification is based on addition, wherever fragmented data are involved the calculation formula is transformed into an equivalent additive form by Taylor expansion. For example, on the basis of ignoring the constant term in the splitting gain, a Taylor expansion is performed on the reciprocal terms (e.g., 1/(H_L + λ) and 1/(H_R + λ)) to obtain the equivalent calculation of the splitting gain shown in FIG. 4, which involves only additions and multiplications.
The subscript k of Gain may serve as an identifier of the corresponding group, and may also be taken as the feature value corresponding to the group. The inputs of the above formulas (e.g., G_L) and the output (i.e., Gain) each consist of two fragments, held by party A and party B respectively. After the input fragments are substituted into the calculation formula, the expansion involves two types of product terms. One type may be called local product terms: both factors of a local product term belong to the fragment data of the same party (party A or party B), so the term can be computed independently by that party's device; for example, <G_L>_A·<G_L>_A can be computed locally by party A and <G_L>_B·<G_L>_B locally by party B. The other type may be called cross product terms: the two factors of a cross product term belong to the fragment data of different parties, i.e., one factor belongs to party A's fragment data and the other to party B's (e.g., <G_L>_A·<G_L>_B). To protect the data privacy of both parties, party A and party B may compute the two fragments of a cross product term by running a two-party secure multiplication protocol, with each of party A and party B holding one fragment of the output.
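As an illustration of this decomposition, the sketch below computes fragments of the representative term G_L² from the two parties' fragments: the squared fragments are local terms, while the cross term is obtained through a Beaver-triple-based two-party multiplication, simulated here in a single process. The triple-based protocol is one common instantiation of a two-party secure multiplication, not necessarily the one used in this specification:

```python
import numpy as np

rng = np.random.default_rng(4)

def beaver_triple():
    """Dealer-generated multiplication triple c = a*b, handed out as fragments."""
    a, b = rng.standard_normal(2)
    c = a * b
    aA, bA, cA = rng.standard_normal(3)
    return (aA, bA, cA), (a - aA, b - bA, c - cA)

def secure_mul(xA, yA, xB, yB):
    """Two-party multiplication of additively shared x and y via a Beaver triple."""
    (aA, bA, cA), (aB, bB, cB) = beaver_triple()
    e = (xA - aA) + (xB - aB)   # both parties open the blinded value x - a
    f = (yA - bA) + (yB - bB)   # and the blinded value y - b
    zA = cA + e * bA + f * aA + e * f   # the public e*f term goes to one party
    zB = cB + e * bB + f * aB
    return zA, zB

# Fragments of a left-subtree first gradient sum G_L.
GL = 1.7
GL_A = rng.standard_normal()
GL_B = GL - GL_A

# G_L^2 = <G_L>_A^2 + 2*<G_L>_A*<G_L>_B + <G_L>_B^2: the squares are local
# terms; the cross term is computed with the secure multiplication protocol
# (A inputs G_L's A-fragment as x, B inputs G_L's B-fragment as y).
crossA, crossB = secure_mul(GL_A, 0.0, 0.0, GL_B)
sqA = GL_A**2 + 2 * crossA
sqB = GL_B**2 + 2 * crossB

assert np.isclose(sqA + sqB, GL**2)
```

Opening e and f reveals only values blinded by the random triple, which is what keeps the fragments private in a real two-party run.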
As mentioned previously, by traversing the groups under the same feature (a group is identified by the feature value k in FIG. 4) and traversing the (N_A + N_B) features (a feature is denoted j in FIG. 4), each of party A and party B obtains (N_A + N_B) sets of first gradient sum fragments and (N_A + N_B) sets of second gradient sum fragments, where each set of first/second gradient sum fragments contains a number of fragments equal to the number K of groups under the corresponding feature. Further, each of party A and party B obtains (N_A + N_B) sets of splitting gain fragments, each set containing a number of fragments equal to the number K of groups under the corresponding feature. The number K of groups may differ across features.
Referring to FIG. 5, FIG. 5 shows the pairwise comparison of the magnitudes of the splitting gains corresponding to the groups under the features. The subscript j of Gain indicates a feature and the subscript k indicates a group (and thus also the feature value corresponding to the group). By traversing feature pairs j1, j2 and group pairs k1, k2, i.e., comparing the splitting gains among the (N_A + N_B) sets of splitting gains pairwise, the feature and group corresponding to the maximum splitting gain are selected for splitting. Here k1 is a feature value of feature j1 and k2 is a feature value of feature j2; j1 and j2 may be the same feature, in which case the groups k1 and k2 under that feature are different groups. When the splitting gain is obtained based on the equivalent calculation formula of the splitting gain shown in FIG. 4, it can be understood that the larger the splitting gain, the more suitable the corresponding feature and splitting threshold (or feature value) are as the splitting condition of the node.
Because party A and party B each hold only fragments of Gain_{j1,k1} and Gain_{j2,k2}, they can compare the magnitudes of the two splitting gains through a multi-party secure comparison protocol. Taking the comparison of the two splitting gains Gain_{j1,k1} and Gain_{j2,k2} shown in FIG. 5 as an example, party A's device can calculate <v>_A = <Gain_{j1,k1}>_A − <Gain_{j2,k2}>_A, and party B's device can calculate <v>_B = <Gain_{j2,k2}>_B − <Gain_{j1,k1}>_B. Further, the devices of party A and party B can interact according to the multi-party secure comparison protocol so as to determine the magnitude relationship of <v>_A and <v>_B without revealing their specific values. If <v>_A is greater than (or not less than) <v>_B, the feature j1 and the feature value k1 are "retained"; otherwise, the feature j2 and the feature value k2 are "retained". Based on the fragments of the retained feature and feature value, the two parties compare against the next feature and its feature value, and so on, until the comparison of the splitting gains corresponding to all features and feature values is completed; the feature and feature value whose fragments are finally retained by the two parties correspond to the optimal splitting condition of the node. Of course, it may also be party A's device that calculates <v>_A = <Gain_{j2,k2}>_A − <Gain_{j1,k1}>_A and party B's device that calculates <v>_B = <Gain_{j1,k1}>_B − <Gain_{j2,k2}>_B; accordingly, if <v>_A is greater than (or not less than) <v>_B, the node to be split is preferentially split according to feature j2 and feature value k2, and otherwise according to feature j1 and feature value k1.
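A sketch of this pairwise "tournament" over fragments follows, with the multi-party secure comparison replaced by an idealized black box (in a real run, the comparison protocol decides which of <v>_A and <v>_B is larger without revealing either value); the candidate labels are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def secure_less(vA, vB):
    """Idealized stand-in for the multi-party secure comparison protocol."""
    return vA < vB

gains = rng.standard_normal(4)                 # splitting gains of 4 (feature, group) pairs
gA = rng.standard_normal(4)                    # party A's fragments
gB = gains - gA                                # party B's fragments
candidates = [("j1", "k1"), ("j1", "k2"), ("j2", "k1"), ("j2", "k2")]

best = 0
for i in range(1, len(candidates)):
    # A computes <v>_A = <Gain_best>_A - <Gain_i>_A and B computes
    # <v>_B = <Gain_i>_B - <Gain_best>_B; comparing them decides the sign of
    # Gain_best - Gain_i without opening either gain.
    vA = gA[best] - gA[i]
    vB = gB[i] - gB[best]
    if secure_less(vA, vB):        # Gain_best < Gain_i: retain candidate i
        best = i

assert gains[best] == gains.max()
print("retained splitting parameters:", candidates[best])
```

The comparison is sound because <v>_A < <v>_B holds exactly when the reconstructed Gain_best is smaller than the reconstructed Gain_i.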
Referring to FIG. 6, for any node (denoted X), when the splitting parameters of node X (the feature and feature value corresponding to the maximum splitting gain) are determined, only one of party A and party B records the splitting parameters, because node X splits according to only one feature, which belongs to one party. Assuming the determined splitting parameters are j1, k1 and feature j1 is a feature of party A, only party A's device records the splitting parameters of node X. By way of example only, as shown in FIG. 6, the splitting information split(X, j1, k1) recorded on the A side indicates that the splitting parameters of node X are own-side feature j1 and feature value k1, while the splitting information split(X, dummy, dummy) recorded on the B side indicates that node X is a non-local splitting node, i.e., the splitting parameters of node X are unknown in the tree model on the B side. It can be understood that party A and party B can each determine, from the recorded splitting information of each node, whether that node is a local splitting node, and determine the splitting parameters of local splitting nodes.
It should be appreciated that, with respect to node splitting, party A and party B may share some consensus: (1) party A and party B can agree on the structure of the tree model to be trained, such as the number of decision trees, the number of nodes of each tree, the connection relationships among nodes, node positions, depth, and so on; correspondingly, in some scenarios, party A and party B can determine whether the current operation is aimed at the same decision tree or the same node. For example, party A and party B may uniformly identify (e.g., number) each decision tree in the tree model, and uniformly identify (e.g., number) each node of the same decision tree. (2) Party A and party B can agree on identifiers for both parties' features and for each group under each feature without disclosing them. For example, assuming the features of party A include age and height and the features of party B include distance and orientation, age and height may be identified by a1 and a2, and distance and orientation by b1 and b2, so that party B only knows that a1 and a2 are two features of party A, and party A likewise only knows that b1 and b2 are two features of party B. Specifically, through the feature identifiers and group identifiers, party A and party B can determine whether the splitting gains obtained correspond to the same feature and group (feature value), and to which feature and group a splitting gain corresponds.
Still taking the example in which node X is split according to party A's feature j1 and feature value k1, party A's device needs to further divide the sample set {i} of M samples into a left subset corresponding to the left subtree and a right subset corresponding to the right subtree according to the magnitude relationship between each sample's value of feature j1 and the feature value k1, so that the two parties can obtain the fragments of the first/second gradient vectors and of the flag vectors of the child nodes (i.e., the left and right subtrees) of node X. If a child node continues to split, the fragments of its first/second gradient vectors and of its flag vector may be used in the calculations related to its splitting; for details refer to FIGS. 4 to 6 and their related descriptions. If a child node does not continue to split, i.e., becomes a leaf node, party A and party B may cooperatively calculate the two fragments of the leaf node weight based on the fragments of the first/second gradient vectors of that child node (i.e., the leaf node), with each party holding one fragment; for details refer to FIG. 7 and its related description.
As shown in FIG. 6, party A's device may generate a left sub-tree vector v_L and a right sub-tree vector v_R for node X, where the left sub-tree vector v_L indicates the samples in the left subset obtained by dividing the sample set according to the feature and feature value corresponding to the maximum splitting gain, and the right sub-tree vector v_R indicates the samples in the right subset obtained by the same division. It can be understood that the left/right sub-tree vector may have M bits (dimensions), each bit corresponding to one sample; typically, the positions of the left sub-tree vector corresponding to samples belonging to the left subset are set to 1 and the remaining positions to 0, and similarly the positions of the right sub-tree vector corresponding to samples belonging to the right subset are set to 1 and the remaining positions to 0. Since each sample belongs to either the left subset or the right subset, v_L + v_R is the all-ones vector. Party A's device splits the left sub-tree vector v_L of node X into fragments <v_L>_A and <v_L>_B, sends fragment <v_L>_B to party B's device, and retains (at least) fragment <v_L>_A. Similarly, the right sub-tree vector v_R of node X is split by party A into fragments <v_R>_A and <v_R>_B, with fragment <v_R>_A kept by party A and fragment <v_R>_B stored by party B.
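A minimal sketch of generating and fragmenting the left/right sub-tree vectors on party A's device; the use of "<=" as the left/right split direction and all concrete values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Party A splits node X on its own feature j1 with threshold k1.
feature_col = rng.standard_normal(8)     # A's local values of feature j1 for M = 8 samples
k1 = 0.0                                 # threshold (feature value) of the chosen group

vL = (feature_col <= k1).astype(float)   # left sub-tree vector: 1 marks the left subset
vR = 1.0 - vL                            # right sub-tree vector; vL + vR is all ones

def split_into_fragments(v):
    a = rng.standard_normal(len(v))
    return a, v - a                      # A keeps the first, sends the second to B

vL_A, vL_B = split_into_fragments(vL)
vR_A, vR_B = split_into_fragments(vR)
assert np.allclose(vL_A + vL_B, vL) and np.allclose(vR_A + vR_B, vR)
```

Fragmenting the 0/1 indicator vectors with random masks prevents party B from learning which samples fall on which side of party A's feature.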
Following the description above and taking the left subtree as an example, assume the flag vector {s_i,L}, the first gradient vector {g_i,L}, and the second gradient vector {h_i,L} of the left subtree of node X are computed on the C-side device. As shown in FIG. 6, the C-side device may: calculate the bit-wise product of the flag vector {s_i,X} of node X and the left sub-tree vector v_L to obtain the flag vector {s_i,L} of the left subtree of node X; calculate the bit-wise product of the first gradient vector {g_i,X} of node X and the flag vector {s_i,L} of the left subtree of node X to obtain the first gradient vector {g_i,L} of the left subtree of node X; and calculate the bit-wise product of the second gradient vector {h_i,X} of node X and the flag vector {s_i,L} of the left subtree of node X to obtain the second gradient vector {h_i,L} of the left subtree of node X.
Similarly, the C-side device can calculate the flag vector {s_i,R}, the first gradient vector {g_i,R}, and the second gradient vector {h_i,R} of the right subtree of node X.
Referring to the foregoing related description and taking the left subtree as an example, the device of party A and the device of party B may interact according to a multi-party secure computation protocol to: compute fragments of the flag vector {s_i,L} of the left subtree of node X based on the fragments of the flag vector {s_i,X} of node X and the fragments of the left sub-tree vector v_L; compute fragments of the first gradient vector {g_i,L} of the left subtree of node X based on the fragments of the first gradient vector {g_i,X} of node X and the fragments of the flag vector {s_i,L} of the left subtree of node X; and compute fragments of the second gradient vector {h_i,L} of the left subtree of node X based on the fragments of the second gradient vector {h_i,X} of node X and the fragments of the flag vector {s_i,L} of the left subtree of node X. For example, substituting s_i,X = <s_i,X>_A + <s_i,X>_B and v_L,i = <v_L,i>_A + <v_L,i>_B into s_i,L = s_i,X · v_L,i and expanding yields:

s_i,L = <s_i,X>_A·<v_L,i>_A + <s_i,X>_A·<v_L,i>_B + <s_i,X>_B·<v_L,i>_A + <s_i,X>_B·<v_L,i>_B

where <s_i,X>_A·<v_L,i>_A can be computed locally by party A as part of its output fragment <s_i,L>_A, and <s_i,X>_B·<v_L,i>_B can be computed locally by party B as part of its output fragment <s_i,L>_B; the cross product terms <s_i,X>_A·<v_L,i>_B and <s_i,X>_B·<v_L,i>_A can be computed through a two-party secure multiplication protocol, with party A obtaining fragments of the cross product terms as part of its output fragment <s_i,L>_A and party B obtaining fragments as part of its output fragment <s_i,L>_B.
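This expansion can be exercised for the whole flag vector at once. In the sketch below, the two local product terms are computed without interaction, and the two cross product terms go through a Beaver-triple-based elementwise secure multiplication, simulated in one process as an illustrative stand-in for the two-party secure multiplication protocol:

```python
import numpy as np

rng = np.random.default_rng(7)

def beaver_triples(n):
    """Dealer-generated elementwise multiplication triples with c = a*b."""
    a, b = rng.standard_normal(n), rng.standard_normal(n)
    c = a * b
    aA, bA, cA = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
    return (aA, bA, cA), (a - aA, b - bA, c - cA)

def secure_mul_vec(xA, yA, xB, yB):
    """Elementwise product of additively shared vectors via Beaver triples."""
    (aA, bA, cA), (aB, bB, cB) = beaver_triples(len(xA))
    e = (xA - aA) + (xB - aB)   # both parties open x - a
    f = (yA - bA) + (yB - bB)   # and y - b
    zA = cA + e * bA + f * aA + e * f
    zB = cB + e * bB + f * aB
    return zA, zB

M = 8
sX = rng.integers(0, 2, M).astype(float)   # flag vector {s_i,X} of node X
vL = rng.integers(0, 2, M).astype(float)   # left sub-tree vector of node X
sX_A = rng.standard_normal(M); sX_B = sX - sX_A
vL_A = rng.standard_normal(M); vL_B = vL - vL_A
zero = np.zeros(M)

# Local product terms, computed without any interaction.
locA = sX_A * vL_A
locB = sX_B * vL_B
# Cross product terms <s>_A*<v>_B and <s>_B*<v>_A via secure multiplication.
c1A, c1B = secure_mul_vec(sX_A, zero, zero, vL_B)
c2A, c2B = secure_mul_vec(zero, vL_A, sX_B, zero)

sL_A = locA + c1A + c2A    # party A's fragment of the left-subtree flag vector
sL_B = locB + c1B + c2B    # party B's fragment
assert np.allclose(sL_A + sL_B, sX * vL)
```

The same pattern applies unchanged to the gradient-vector products {g_i,X}·{s_i,L} and {h_i,X}·{s_i,L}.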
For a single decision tree, the splitting of each node can be performed in sequence according to the splitting links shown in fig. 4 to 6 until the growth termination condition is satisfied. It should be understood that growth termination means that no child nodes are again split on a single tree, i.e. all leaf nodes on a single tree have been obtained. In some embodiments, the growth termination condition may include a depth of the single tree reaching a preset depth.
With reference to FIG. 7, FIG. 7 shows a method of computing fragments of leaf node weights. For the same leaf node obtained by training, party A's device may calculate the sum of the elements in its fragment <g_i>_A of the first gradient vector {g_i} of that leaf node to obtain its fragment <G>_A of the leaf node's first gradient sum G, and party B's device may calculate the sum of the elements in its fragment <g_i>_B of the first gradient vector {g_i} of that leaf node to obtain its fragment <G>_B of G. Similarly, party A's device may calculate the sum of the elements in its fragment <h_i>_A of the second gradient vector {h_i} of that leaf node to obtain its fragment <H>_A of the leaf node's second gradient sum H, and party B's device may calculate the sum of the elements in its fragment <h_i>_B to obtain its fragment <H>_B of H. In some embodiments, the leaf node weight is calculated as follows:

w = −G / (H + λ)
where w represents the leaf node weight, G represents the first gradient sum of the leaf node, H represents the second gradient sum of the leaf node, and λ represents a preset coefficient.
It should be noted that, since the fragmentation of data is based on addition, the fragments w_A and w_B of the leaf node weight w can be obtained by decomposing a Taylor expansion of the above formula so that w_A + w_B = w, where w_A serves as the weight of the leaf node in party A's tree model and w_B as the weight of the same leaf node in party B's tree model. Party A, based on its fragment <G>_A of the leaf node's first gradient sum G and its fragment <H>_A of the second gradient sum H, and party B, based on its fragment <G>_B of G and its fragment <H>_B of H, can calculate through a two-party secure multiplication protocol, according to the leaf node weight calculation formula, a first fragment of the leaf node weight (which may correspond to party A's w_A) and a second fragment (which may correspond to party B's w_B). The decomposition of multiplication-related calculation formulas has been described many times in the foregoing embodiments of this specification and is not repeated here or below.
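A sketch of computing the weight fragments with a first-order Taylor expansion of 1/(H + λ) around a public point c, which turns the division into additions and one secure multiplication; the expansion point, its order, and the concrete numbers are illustrative assumptions and may differ from the expansion used in FIG. 4:

```python
import numpy as np

rng = np.random.default_rng(8)

def secure_mul(xA, yA, xB, yB):
    """Two-party multiplication via a dealer-generated Beaver triple."""
    a, b = rng.standard_normal(2); c = a * b
    aA, bA, cA = rng.standard_normal(3)
    aB, bB, cB = a - aA, b - bA, c - cA
    e = (xA - aA) + (xB - aB)
    f = (yA - bA) + (yB - bB)
    return cA + e * bA + f * aA + e * f, cB + e * bB + f * aB

lam = 1.0                     # preset coefficient lambda
G, H = 0.8, 2.2               # leaf node gradient sums (plaintext reference)
GA = rng.standard_normal(); GB = G - GA
HA = rng.standard_normal(); HB = H - HA

# First-order Taylor expansion of 1/t around a public point c: 1/t ~ (2c - t)/c^2,
# applied to t = H + lambda, so w = -G/(H + lambda) ~ -G*(2c - H - lambda)/c^2.
c = 3.0
tA, tB = HA + lam, HB                 # fragments of t (lambda is public, added once)
uA, uB = 2 * c - tA, -tB              # fragments of u = 2c - t (2c added once)
pA, pB = secure_mul(GA, uA, GB, uB)   # fragments of the product G*u
wA, wB = -pA / c**2, -pB / c**2       # division by the public c^2 is local

print("approx w:", wA + wB, " exact w:", -G / (H + lam))
```

The closer H + λ is to the public expansion point c, the smaller the approximation error; higher-order expansions would require additional secure multiplications.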
Under the XGB framework, the tree model of either party and the equivalent model comprise a plurality of trees. Referring to FIG. 8, each time party A and party B complete the training of one tree, the fragments of the prediction scores of the M samples may be updated in order to calculate the fragments of the first gradient vector {g_i} and of the second gradient vector {h_i} of the root node of the next tree. In FIG. 8, pred_{i,t} represents the weight of the leaf node reached by sample i (the sample numbered i) along its prediction path in the t-th tree of the equivalent model (i.e., the leaf node to which sample i belongs, also called the predicted leaf node of sample i). For the t-th tree, pred_{i,t} = Σ_{n=1..N_t} s_{i,n}·w_n, where s_{i,n} represents the element corresponding to sample i in the flag vector {s_i,n} of the leaf node numbered n of the t-th tree in the equivalent model, w_n represents the weight of that leaf node, and N_t represents the number of leaf nodes of the t-th tree. The weights of the predicted leaf nodes of the M samples in the t-th tree of the equivalent model may form a prediction weight vector {pred_{i,t}}.
The device of party A and the device of party B may interact according to a multi-party secure computation protocol so as to compute the fragments of pred_{i,t} based on the fragments of the flag vectors {s_i,n} of the leaf nodes of the t-th tree and the fragments of the leaf node weight vector {w_n} of the t-th tree in the equivalent model (i.e., the leaf node weight vector of the t-th tree in the tree model of party A/B). Further, as shown in FIG. 8, the A-side device may add its fragment <pred_{i,t}>_A of pred_{i,t} to its fragment <pred_i>_A of the current prediction score pred_i of sample i, so as to update its fragment <pred_i>_A of the prediction score of sample i. It can be understood that the current prediction score pred_i of sample i is the sum of the base score of the equivalent model and the weights of the predicted leaf nodes of sample i on the first t−1 trees of the equivalent model. Initially, the prediction score pred_i of sample i is the base score f_0 of the equivalent model, i.e., party A's fragment <pred_i>_A is the initial score <f_0>_A of party A's tree model and party B's fragment <pred_i>_B is the initial score <f_0>_B of party B's tree model. The prediction scores of the M samples may form a prediction score vector {pred_i}; accordingly, the A-side device may obtain a fragment {<pred_i>_A} of the updated prediction score vector and the B-side device may obtain a fragment {<pred_i>_B}.
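A sketch of this update: the per-tree contribution pred_{i,t} is assumed to have already been produced in fragment form by the secure products of flag-vector fragments and leaf-weight fragments, after which each party's update of its prediction-score fragment is purely local addition; the base score and sizes are placeholders:

```python
import numpy as np

rng = np.random.default_rng(9)

M, Nt = 4, 3
leaf = rng.integers(0, Nt, M)      # predicted leaf of each sample in the t-th tree
S = np.eye(Nt)[leaf]               # rows: one-hot flag entries s_{i,n} per sample
w = rng.standard_normal(Nt)        # leaf weights w_n of the t-th tree
pt = S @ w                         # plaintext reference: pred_{i,t} = sum_n s_{i,n}*w_n

# Fragments of pred_{i,t}, assumed already produced by the secure product step.
pt_A = rng.standard_normal(M); pt_B = pt - pt_A

f0 = 0.5                           # base score of the equivalent model
pred_A = np.full(M, f0 / 2)        # <pred_i>_A, initialized to <f0>_A
pred_B = np.full(M, f0 / 2)        # <pred_i>_B, initialized to <f0>_B

pred_A += pt_A                     # each party updates its fragment locally
pred_B += pt_B
assert np.allclose(pred_A + pred_B, f0 + pt)
```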
In some embodiments, as described above, the predicted value of a sample may be derived based on the prediction score of the sample and a non-linear activation function. Taking sigmoid as an example of the non-linear activation function, the predicted value of a sample is ŷ = sigmoid(pred) = 1/(1 + e^(−pred)), where pred represents the prediction score of that sample. Accordingly, since the fragmentation of data is based on addition, a Taylor expansion can be adopted. For two-party training, the device of party A may update its fragment of the predicted value of sample i according to the Taylor expansion evaluated on the fragments, and the device of party B may likewise update its fragment of the predicted value of sample i; terms of the expansion that mix the two parties' fragments can be computed through a two-party secure multiplication protocol.
In some embodiments, the loss function of a sample is the log loss L = −[y_i·ln ŷ_i + (1 − y_i)·ln(1 − ŷ_i)], from which the first gradient corresponding to sample i can be derived as g_i = ŷ_i − y_i and the second gradient corresponding to sample i as h_i = ŷ_i·(1 − ŷ_i). For two-party training, the device of party A may update, according to g_i = ŷ_i − y_i, the fragment of the first gradient corresponding to sample i used for training the next tree, thereby obtaining a fragment {<g_i>_A} of the first gradient vector of the root node of the next tree; similarly, the device of party B may update its fragment of the first gradient corresponding to sample i, thereby obtaining a fragment {<g_i>_B} of the first gradient vector of the root node of the next tree to be trained. Since the decomposition of h_i = ŷ_i·(1 − ŷ_i) involves cross product terms, the device of party A and the device of party B can interact according to a multi-party secure computation protocol to calculate, based on the fragments of the updated predicted value ŷ_i of sample i and the label value y_i, the fragments of the second gradient corresponding to sample i used for training the next tree, thereby obtaining the fragments {<h_i>_A} and {<h_i>_B} of the second gradient vector of the root node of the next tree.
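Under the assumption that party A holds the label values, the first-gradient update is purely local, while the second gradient needs one secure product; a minimal simulation follows, with an idealized dealer standing in for the two-party multiplication protocol:

```python
import numpy as np

rng = np.random.default_rng(10)

M = 4
y = rng.integers(0, 2, M).astype(float)   # label values, assumed held by party A
yhat = rng.uniform(0.1, 0.9, M)           # predicted values (plaintext reference)
yhat_A = rng.standard_normal(M)           # party A's fragments of the predicted values
yhat_B = yhat - yhat_A                    # party B's fragments

# g_i = yhat_i - y_i is linear in the fragments, so each party updates locally:
# the label holder subtracts y from its fragment, the other party keeps its own.
g_A = yhat_A - y
g_B = yhat_B.copy()
assert np.allclose(g_A + g_B, yhat - y)

# h_i = yhat_i*(1 - yhat_i) = yhat_i - yhat_i^2 contains a square whose cross
# term 2*<yhat>_A*<yhat>_B requires the two-party secure multiplication
# protocol; an idealized dealer stands in for it in this simulation.
cross = 2 * yhat_A * yhat_B
cross_A = rng.standard_normal(M); cross_B = cross - cross_A
h_A = yhat_A - yhat_A**2 - cross_A
h_B = yhat_B - yhat_B**2 - cross_B
assert np.allclose(h_A + h_B, yhat * (1 - yhat))
```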
It should be noted that the above description of the flow is for illustration and description only and does not limit the scope of the application of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description.
FIG. 10 is a block diagram of a two-party decision tree training system in accordance with some embodiments of the present description. The system 200 may be implemented on a device of a first party, which may be either of party A and party B, the second party being the other of party A and party B, wherein the feature values of one or more features of each sample in the sample set and the label values are vertically distributed on the A side and the B side. As shown in FIG. 10, system 200 may include a first obtaining module 202, a first ordering module 204, a first packet gradient and fragmentation computation module 206, a second ordering module 208, a second packet gradient and fragmentation computation module 210, a split gain fragmentation computation module 212, a split gain comparison module 214, a left and right sub-tree vector fragmentation obtaining module 216, a child node flag vector fragmentation computation module 218, and a child node gradient vector fragmentation computation module 220.
For any node (denoted as X) that is split, the functions of the modules in system 200 are as follows:
The first obtaining module 202 may be configured to obtain a first fragment of the flag vector of node X, a first fragment of the first gradient vector, and a first fragment of the second gradient vector, wherein the flag vector indicates the samples belonging to the respective node, the first gradient vector comprises the first gradients corresponding to the samples belonging to the respective node, and the second gradient vector comprises the second gradients corresponding to the samples belonging to the respective node.
For any feature held by the first party, the first ranking module 204 may be to: sorting the sample set according to the characteristic value of the characteristic and obtaining a first arrangement vector, wherein the arrangement vector is used for identifying the operation of sorting the equal-length sequence, and the element of the arrangement vector indicates the position of the data corresponding to the position in the equal-length sequence in a sorting result sequence; based on a first fragment of a first gradient vector, the first arrangement vector and a second fragment of a second party based on the first gradient vector, arranging the first gradient vector according to the characteristic value by a safe arrangement method to obtain a first fragment of the first gradient arrangement result vector, and simultaneously obtaining a second fragment of the first gradient arrangement result vector by the second party; and arranging the second gradient vector according to the characteristic value by a safe arrangement method based on the first fragment of the second gradient vector, the first arrangement vector and the second fragment of the second party based on the second gradient vector to obtain the first fragment of the second gradient arrangement result vector, and simultaneously obtaining the second fragment of the second gradient arrangement result vector by the second party.
For any feature held by the first party, the first packet gradient and fragmentation computation module 206 may be to: dividing elements in a first fragment of the first gradient sorting result vector and elements in a first fragment of the second gradient sorting result vector according to a preset frequency respectively to obtain a plurality of first groups of the first fragment of the first gradient vector and a plurality of first groups of the first fragment of the second gradient vector, and for each of the plurality of first groups: calculating the sum of elements included in the first group of the first segment of the first gradient vector to obtain a first segment of a first gradient sum corresponding to the first group; and calculating the sum of elements included in the first group of the first segment of the second gradient vector to obtain a first segment of a second gradient sum corresponding to the first group.
For any feature held by the second party, the second ranking module 208 may be configured to: arranging the first gradient vector according to the characteristic value of the characteristic by a safe arrangement method based on the first fragment of the first gradient vector and the second fragment and the second arrangement vector of the second party based on the first gradient vector to obtain the first fragment of the first gradient ordering result vector, and simultaneously obtaining the second fragment of the first gradient ordering result vector by the second party; based on the first fragment of the second gradient vector, and the second fragment and the second arrangement vector of the second party based on the second gradient vector, arranging the second gradient vector according to the characteristic value by a safe arrangement method to obtain the first fragment of the second gradient ordering result vector, and simultaneously obtaining the second fragment of the second gradient ordering result vector by the second party; and the second arrangement vector is obtained by ordering the sample set by a second party according to the characteristic value of the characteristic.
For any feature held by the second party, the second packet gradient and sharding calculation module 210 may be operable to: dividing elements in a first slice of the first gradient sorting result vector and elements in a first slice of the second gradient sorting result vector according to a preset frequency number respectively to obtain a plurality of second sub-groups of the first slice of the first gradient vector and a plurality of second sub-groups of the first slice of the second gradient vector, and for each of the plurality of second sub-groups: calculating the sum of the elements included in the second grouping of the first patches of the first gradient vector to obtain first patches of the first gradient sum corresponding to the second grouping; and calculating the sum of the elements included in the second grouping of the first patches of the second gradient vector to obtain the first patches of the second gradient sum corresponding to the second grouping.
The split gain slice calculation module 212 may be configured to interact with a device of the second party according to a multi-party security calculation protocol, so as to calculate a first slice of a split gain corresponding to each packet under each feature based on a first slice of a first gradient sum and a first slice of a second gradient sum corresponding to each packet under each feature.
The splitting gain comparison module 214 may be configured to interact with a device of the second party according to a multi-party security comparison protocol, so as to determine a maximum splitting gain based on a first segment of the splitting gain corresponding to each packet under each feature, and record splitting information of the node according to the feature and the packet corresponding to the maximum splitting gain.
Left and right subtree vector shard obtaining module 216 may be configured to: when the maximum splitting gain corresponds to a feature of a first party, generating a left sub-tree vector and a right sub-tree vector of the node, the left sub-tree vector indicating samples in a left subset obtained by dividing the sample set according to the feature corresponding to the maximum splitting gain and groups, the right sub-tree vector indicating samples in a right subset obtained by dividing the sample set according to the feature corresponding to the maximum splitting gain and groups, the left subset corresponding to the left sub-tree, and the right subset corresponding to the right sub-tree; splitting the left sub-tree vector into a first slice and a second slice, and sending the second slice of the left sub-tree vector to a device of a second party; splitting the right sub-tree vector into a first slice and a second slice, and sending the second slice of the right sub-tree vector to a device of a second party; receiving, from a device of a second party, a first tile of a left sub-tree vector and a first tile of a right sub-tree vector of the node when the maximum splitting gain corresponds to a feature of the second party.
The child node flag vector shard calculation module 218 may be configured to: interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the left subtree of the node based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector; and interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the right subtree of the node based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector.
The child node gradient vector shard calculation module 220 may be configured to interact with the device of the second party according to a multi-party secure computation protocol to: compute a first fragment of the first gradient vector of the left subtree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; compute a first fragment of the second gradient vector of the left subtree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; compute a first fragment of the first gradient vector of the right subtree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right subtree of the node; and compute a first fragment of the second gradient vector of the right subtree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right subtree of the node.
For more details of the system 200 and its modules, reference may be made to fig. 3-8 and their associated description.
It should be understood that the system and its modules shown in FIG. 10 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware, wherein the hardware portion may be implemented using dedicated logic, and the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the system and its modules is for convenience only and should not limit the present disclosure to the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. For example, in some embodiments, the first packet gradient and fragmentation computation module 206 and the second packet gradient and fragmentation computation module 210 may be different modules in a system or may be a single module that implements the functionality of both modules. For another example, the first obtaining module 202, the first arranging module 204, the first packet gradient and slice calculation module 206, the second arranging module 208, the second packet gradient and slice calculation module 210, the splitting gain slice calculation module 212, and the splitting gain comparison module 214 may be extracted from the training system shown in fig. 10 and combined into a node splitting system. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) the two-party decision tree training method is provided, so that the data privacy of the two parties can be protected; (2) the model effect can be improved by training with the sample data of the two parties; (3) the method comprises the steps of sorting the fragments of the first/second gradient vectors of the A party and the B party by adopting an inadvertent arrangement method, dividing the fragments into a plurality of groups based on a preset frequency, calculating the fragments of the first/second gradient sum corresponding to each group of the fragments of the first/second gradient vectors by the A party and the B party respectively locally, calculating the splitting gain corresponding to each group by the two parties according to a multi-party safety calculation protocol, determining a splitting mode based on the splitting gain corresponding to each group, splitting a tree, reducing interaction of the two parties in a two-party decision tree training process, and greatly reducing the operation amount and communication traffic. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the embodiments herein. Various modifications, improvements and adaptations to the embodiments described herein may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the embodiments of the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the embodiments of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of embodiments of the present description may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of the embodiments of the present specification may be represented as a computer product, including computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for operation of various portions of the embodiments of the present description may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, a conventional procedural programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences, use of numbers and letters, or use of other names in the embodiments of the present specification are not intended to limit the order of the processes and methods in the embodiments of the present specification. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments of the invention. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this specification; thus, by way of example and not limitation, alternative configurations of the embodiments may be regarded as consistent with the teachings of this specification. Accordingly, the embodiments of this specification are not limited to the embodiments explicitly described and depicted herein.
Claims (11)
1. A node splitting method in a two-party decision tree training process, used for splitting a node in a decision tree, wherein the feature values of one or more features and the label values of the samples in a sample set are vertically partitioned between the two parties; the method is executed by a device of a first party, the first party being either of the two parties and the second party being the other; the method comprises:
obtaining a first fragment of a first gradient vector of the node and a first fragment of a second gradient vector of the node; the first gradient vector comprises the first gradients corresponding to the samples belonging to the node, and the second gradient vector comprises the second gradients corresponding to the samples belonging to the node;
for any feature of the first party:
sorting the sample set according to the feature values of the feature to obtain a first permutation vector, wherein a permutation vector identifies an operation that sorts an equal-length sequence, and each element of the permutation vector indicates the position, in the sorted result sequence, of the element at the corresponding position of the equal-length sequence; permuting the first gradient vector according to the feature values through a secure permutation method, based on the first fragment of the first gradient vector, the first permutation vector, and the second party's second fragment of the first gradient vector, so that the first party obtains a first fragment of a first gradient permutation result vector while the second party obtains a second fragment of the first gradient permutation result vector; permuting the second gradient vector according to the feature values through the secure permutation method, based on the first fragment of the second gradient vector, the first permutation vector, and the second party's second fragment of the second gradient vector, so that the first party obtains a first fragment of a second gradient permutation result vector while the second party obtains a second fragment of the second gradient permutation result vector;
dividing the elements of the first fragment of the first gradient permutation result vector and the elements of the first fragment of the second gradient permutation result vector according to a preset division mode to obtain a plurality of first groups of each fragment, and for each of the plurality of first groups: calculating the sum of the elements of the first group within the first fragment of the first gradient permutation result vector to obtain a first fragment of the first gradient sum corresponding to the first group; and calculating the sum of the elements of the first group within the first fragment of the second gradient permutation result vector to obtain a first fragment of the second gradient sum corresponding to the first group;
for any feature of the second party:
permuting the first gradient vector according to the feature values of the feature through the secure permutation method, based on the first fragment of the first gradient vector and the second party's second fragment of the first gradient vector and second permutation vector, so that the first party obtains a first fragment of a first gradient permutation result vector while the second party obtains a second fragment of the first gradient permutation result vector; permuting the second gradient vector according to the feature values through the secure permutation method, based on the first fragment of the second gradient vector and the second party's second fragment of the second gradient vector and second permutation vector, so that the first party obtains a first fragment of a second gradient permutation result vector while the second party obtains a second fragment of the second gradient permutation result vector; the second permutation vector is obtained by the second party sorting the sample set according to the feature values of the feature;
dividing the elements of the first fragment of the first gradient permutation result vector and the elements of the first fragment of the second gradient permutation result vector according to the preset division mode to obtain a plurality of second groups of each fragment, and for each of the plurality of second groups: calculating the sum of the elements of the second group within the first fragment of the first gradient permutation result vector to obtain a first fragment of the first gradient sum corresponding to the second group; and calculating the sum of the elements of the second group within the first fragment of the second gradient permutation result vector to obtain a first fragment of the second gradient sum corresponding to the second group;
interacting with a device of the second party according to a multi-party secure computation protocol to calculate, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groups under each feature, first fragments of the splitting gains respectively corresponding to the groups under each feature;
and interacting with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains respectively corresponding to the groups under each feature, and recording splitting information of the node according to the feature and the group corresponding to the maximum splitting gain.
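The per-group aggregation in claim 1 is cheap because addition commutes with additive secret sharing: once the gradient vectors have been securely permuted, each party can sum its own fragments within every group without further interaction. A minimal Python sketch, assuming gradients encoded as integers in the ring of residues mod 2^32 (the claim fixes neither the encoding nor the modulus, and all names here are illustrative):

```python
import random

MOD = 2 ** 32  # additive fragments live in a finite ring (a choice assumed for this sketch)

def share(vec):
    """Split a plaintext vector into two additive fragments: vec = f1 + f2 (mod MOD)."""
    f1 = [random.randrange(MOD) for _ in vec]
    f2 = [(v - s) % MOD for v, s in zip(vec, f1)]
    return f1, f2

def grouped_sums(fragment, group_size):
    # Runs locally on one party's fragment of the *permuted* gradient vector:
    # summation commutes with additive sharing, so the per-group gradient sums
    # of claim 1 need no interaction once the secure permutation is done.
    return [sum(fragment[i:i + group_size]) % MOD
            for i in range(0, len(fragment), group_size)]

# Toy check: fragments of the per-group sums recombine to the plaintext sums.
grad = [3, 1, 4, 1, 5, 9, 2, 6]   # a first gradient vector, already permuted
f1, f2 = share(grad)
s1, s2 = grouped_sums(f1, 4), grouped_sums(f2, 4)
assert [(a + b) % MOD for a, b in zip(s1, s2)] == [9, 22]
```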
2. The method of claim 1, further comprising:
obtaining a first fragment of a flag vector of the node, the flag vector indicating the samples belonging to the node;
when the maximum splitting gain corresponds to a feature of the first party: generating a left subtree vector and a right subtree vector of the node, the left subtree vector indicating the samples of the left subset obtained by dividing the sample set according to the feature and group corresponding to the maximum splitting gain, and the right subtree vector indicating the samples of the right subset so obtained, the left subset corresponding to the left subtree and the right subset corresponding to the right subtree; splitting the left subtree vector into a first fragment and a second fragment and sending the second fragment of the left subtree vector to the device of the second party; splitting the right subtree vector into a first fragment and a second fragment and sending the second fragment of the right subtree vector to the device of the second party; when the maximum splitting gain corresponds to a feature of the second party: receiving a first fragment of the left subtree vector and a first fragment of the right subtree vector of the node from the device of the second party;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of a flag vector of the left subtree of the node based on the first fragment of the flag vector of the node and the first fragment of the left subtree vector; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of a flag vector of the right subtree of the node based on the first fragment of the flag vector of the node and the first fragment of the right subtree vector.
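A sketch of the indicator-vector construction and sharing that claim 2 assigns to the party holding the winning feature; deriving the threshold from the winning group's boundary is simplified away here, and the ring modulus is the same assumption as above:

```python
import random

MOD = 2 ** 32

def subtree_vectors(feature_values, threshold):
    # The party holding the winning feature builds the 0/1 indicator vectors of
    # claim 2: left = samples below the split point, right = the rest.
    left = [1 if v < threshold else 0 for v in feature_values]
    right = [1 - b for b in left]
    return left, right

def share(vec):
    f1 = [random.randrange(MOD) for _ in vec]
    f2 = [(v - s) % MOD for v, s in zip(vec, f1)]
    return f1, f2   # keep f1, send f2 to the other party's device

left, right = subtree_vectors([0.3, 2.5, 1.1, 0.7], threshold=1.0)
assert left == [1, 0, 0, 1] and right == [0, 1, 1, 0]
left_f1, left_f2 = share(left)     # second fragment travels to the second party
right_f1, right_f2 = share(right)
```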
3. The method of claim 2, further comprising:
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left subtree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left subtree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left subtree of the node;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right subtree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right subtree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right subtree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right subtree of the node.
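Claim 3 computes a child's gradient fragments from the parent's gradient fragments and the child's flag vector fragments, which amounts to an elementwise product of two secret-shared vectors. The claim does not name a concrete protocol; the sketch below assumes the standard Beaver-triple technique with a trusted dealer, in the same ring as above, with all names illustrative:

```python
import random

MOD = 2 ** 32

def share(vec):
    f1 = [random.randrange(MOD) for _ in vec]
    return f1, [(v - s) % MOD for v, s in zip(vec, f1)]

def beaver_triples(n):
    # A trusted dealer samples random a, b and distributes additive fragments
    # of a, b and c = a * b (elementwise).
    a = [random.randrange(MOD) for _ in range(n)]
    b = [random.randrange(MOD) for _ in range(n)]
    c = [(x * y) % MOD for x, y in zip(a, b)]
    return share(a), share(b), share(c)

def secure_elementwise_mul(x_frags, y_frags):
    # Elementwise product of two additively shared vectors: the kind of
    # interaction claim 3 needs to get a child's gradient vector from the
    # parent's gradient vector and the child's flag vector.
    n = len(x_frags[0])
    (a1, a2), (b1, b2), (c1, c2) = beaver_triples(n)
    # In a real run each party publishes x_i - a_i and y_i - b_i; the opened
    # values are uniformly random, so they reveal nothing about x or y.
    e = [(x_frags[0][i] + x_frags[1][i] - a1[i] - a2[i]) % MOD for i in range(n)]
    f = [(y_frags[0][i] + y_frags[1][i] - b1[i] - b2[i]) % MOD for i in range(n)]
    z1 = [(c1[i] + e[i] * b1[i] + f[i] * a1[i] + e[i] * f[i]) % MOD for i in range(n)]
    z2 = [(c2[i] + e[i] * b2[i] + f[i] * a2[i]) % MOD for i in range(n)]
    return z1, z2

flag = share([1, 0, 1, 1])   # flag vector of the left subtree
grad = share([5, 7, 2, 9])   # first gradient vector of the node
z1, z2 = secure_elementwise_mul(flag, grad)
assert [(u + v) % MOD for u, v in zip(z1, z2)] == [5, 0, 2, 9]
```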
4. The method of claim 1, wherein the preset division mode comprises division at a preset frequency.
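A small illustration of one plausible reading of claim 4's division at a preset frequency, namely fixed-width groups over the permuted order; the exact rule and the parameter name are assumptions of this sketch:

```python
def group_boundaries(n_samples, frequency):
    # Every `frequency` consecutive samples of the permuted order form one
    # group; the last group absorbs any remainder.
    return [(start, min(start + frequency, n_samples))
            for start in range(0, n_samples, frequency)]

assert group_boundaries(10, 4) == [(0, 4), (4, 8), (8, 10)]
```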
5. The method of claim 1, wherein interacting with the device of the second party according to the multi-party secure computation protocol to calculate the first fragments of the splitting gains respectively corresponding to the groups under each feature, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groups under each feature, comprises:
for any feature:
calculating first fragments of the left-subtree first gradient sums and first fragments of the right-subtree first gradient sums respectively corresponding to the groups under the feature, based on the first fragments of the first gradient sums respectively corresponding to the groups under the feature; wherein the left-subtree first gradient sum equals the sum of the elements corresponding to the samples of the left subset in the first and second fragments of the first gradient vector, and the right-subtree first gradient sum equals the sum of the elements corresponding to the samples of the right subset in the first and second fragments of the first gradient vector;
calculating first fragments of the left-subtree second gradient sums and first fragments of the right-subtree second gradient sums respectively corresponding to the groups under the feature, based on the first fragments of the second gradient sums respectively corresponding to the groups under the feature; wherein the left-subtree second gradient sum equals the sum of the elements corresponding to the samples of the left subset in the first and second fragments of the second gradient vector, and the right-subtree second gradient sum equals the sum of the elements corresponding to the samples of the right subset in the first and second fragments of the second gradient vector;
for any group under any feature:
interacting with the device of the second party according to a multi-party secure computation protocol to calculate a first fragment of the splitting gain corresponding to the group under the feature according to gain = G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ), wherein gain represents the splitting gain corresponding to the group under the feature, G_L represents the sum of the first and second fragments of the left-subtree first gradient sum corresponding to the group under the feature, G_R represents the sum of the first and second fragments of the right-subtree first gradient sum corresponding to the group under the feature, H_L represents the sum of the first and second fragments of the left-subtree second gradient sum corresponding to the group under the feature, H_R represents the sum of the first and second fragments of the right-subtree second gradient sum corresponding to the group under the feature, and λ represents a preset coefficient.
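In plaintext, the gain above can be evaluated per candidate split by prefix-summing the per-group gradient sums; in the protocol itself every quantity below exists only as two fragments and the arithmetic runs under the secure computation protocol. A sketch, with the value of λ assumed:

```python
LAMBDA = 1.0  # the preset regularization coefficient (value assumed here)

def split_gains(group_g, group_h):
    # Prefix sums over the per-group gradient sums give (G_L, H_L) for each
    # candidate split point; the totals give (G_R, H_R) by subtraction.
    G, H = sum(group_g), sum(group_h)
    gains, gl, hl = [], 0.0, 0.0
    for g, h in zip(group_g[:-1], group_h[:-1]):   # last group leaves nothing on the right
        gl, hl = gl + g, hl + h
        gr, hr = G - gl, H - hl
        # note (G_L + G_R)^2 / (H_L + H_R + λ) simplifies to G^2 / (H + λ)
        gains.append(gl * gl / (hl + LAMBDA) + gr * gr / (hr + LAMBDA)
                     - G * G / (H + LAMBDA))
    return gains

print(split_gains([2.0, -1.0, 3.0], [1.0, 1.0, 1.0]))
```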
6. The method of claim 1, wherein the secure permutation method comprises:
the first participant obtaining, from a trusted third party, a first intermediate permutation vector, a first fragment of a first data column, and a first fragment of a first result sequence obtained by permuting the first data column based on the first intermediate permutation vector; the second participant obtaining a second fragment of the first data column and a second fragment of the first result sequence from the trusted third party;
the first participant permuting the first intermediate permutation vector based on the target permutation vector to obtain a second intermediate permutation vector, and sending the second intermediate permutation vector to the second participant;
the first participant computing the difference between the first fragment of the data column to be permuted and the first fragment of the first data column to obtain a first fragment of a second data column, and obtaining a second fragment of the second data column from the second participant; the second fragment of the second data column is obtained by the second participant subtracting the second fragment of the first data column from the second fragment of the data column to be permuted;
the first participant recovering the second data column based on the first fragment of the second data column and the second fragment of the second data column, and permuting the second data column based on the target permutation vector to obtain a second result data column; permuting the first fragment of the first result sequence based on the second intermediate permutation vector to obtain a third result data column; and summing the second result data column and the third result data column to obtain a first fragment of a target data column;
the second participant permuting the second fragment of the first result sequence based on the second intermediate permutation vector to obtain a second fragment of the target data column; the target data column equals the result sequence obtained by permuting the data column to be permuted based on the target permutation vector;
when permuting the first gradient vector according to the feature values through the secure permutation method based on the first fragment of the first gradient vector, the first permutation vector, and the second party's second fragment of the first gradient vector: the first party acts as the first participant of the secure permutation method, the first fragment of the first gradient vector serves as the first fragment of the data column to be permuted, the first permutation vector serves as the target permutation vector, the second party acts as the second participant, and the second fragment of the first gradient vector serves as the second fragment of the data column to be permuted;
when permuting the second gradient vector according to the feature values through the secure permutation method based on the first fragment of the second gradient vector, the first permutation vector, and the second party's second fragment of the second gradient vector: the first party acts as the first participant of the secure permutation method, the first fragment of the second gradient vector serves as the first fragment of the data column to be permuted, the first permutation vector serves as the target permutation vector, the second party acts as the second participant, and the second fragment of the second gradient vector serves as the second fragment of the data column to be permuted;
when permuting the first gradient vector according to the feature values of the feature through the secure permutation method based on the first fragment of the first gradient vector and the second party's second fragment of the first gradient vector and second permutation vector: the first party acts as the second participant of the secure permutation method, the first fragment of the first gradient vector serves as the second fragment of the data column to be permuted, the second party acts as the first participant, the second permutation vector serves as the target permutation vector, and the second fragment of the first gradient vector serves as the first fragment of the data column to be permuted;
when permuting the second gradient vector according to the feature values through the secure permutation method based on the first fragment of the second gradient vector and the second party's second fragment of the second gradient vector and second permutation vector: the first party acts as the second participant of the secure permutation method, the first fragment of the second gradient vector serves as the second fragment of the data column to be permuted, the second party acts as the first participant, the second permutation vector serves as the target permutation vector, and the second fragment of the second gradient vector serves as the first fragment of the data column to be permuted.
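Claim 6's permutation protocol can be checked end to end in a few lines. The sketch below assumes the scatter convention stated in claim 1 (element p[i] gives the destination of position i) and derives the second intermediate permutation so that composing it with the first one yields the target permutation; the translated claim's phrasing of that composition step is ambiguous, so the direction used here is the one under which the final correctness assertion holds:

```python
import random

def apply_perm(p, v):
    # Permutation convention of the claims: element p[i] says where position i
    # of the input lands, i.e. out[p[i]] = v[i].
    out = [None] * len(v)
    for i, dst in enumerate(p):
        out[dst] = v[i]
    return out

MOD = 2 ** 32
n = 6
x = [10, 20, 30, 40, 50, 60]                    # data column to be permuted
x1 = [random.randrange(MOD) for _ in range(n)]  # P1's fragment
x2 = [(v - s) % MOD for v, s in zip(x, x1)]     # P2's fragment
sigma = [2, 0, 5, 1, 4, 3]                      # target permutation, held by P1

# --- trusted third party (dealer) ---
pi = list(range(n)); random.shuffle(pi)         # first intermediate permutation
a = [random.randrange(MOD) for _ in range(n)]   # first data column (random)
a1 = [random.randrange(MOD) for _ in range(n)]
a2 = [(v - s) % MOD for v, s in zip(a, a1)]
b = apply_perm(pi, a)                           # first result sequence
b1 = [random.randrange(MOD) for _ in range(n)]
b2 = [(v - s) % MOD for v, s in zip(b, b1)]
# dealer sends (pi, a1, b1) to P1 and (a2, b2) to P2

# --- P1: second intermediate permutation rho, chosen so that rho(pi(i)) = sigma(i) ---
rho = apply_perm(pi, sigma)
# P1 sends rho to P2; rho reveals nothing about sigma because pi is random

# --- both parties open d = x - a (x stays hidden behind the random mask a) ---
d1 = [(v - s) % MOD for v, s in zip(x1, a1)]    # P1 computes and sends
d2 = [(v - s) % MOD for v, s in zip(x2, a2)]    # P2 computes and sends
d = [(u + v) % MOD for u, v in zip(d1, d2)]

# --- outputs: y1 + y2 = sigma(x), since rho(b) = sigma(a) ---
y1 = [(u + v) % MOD for u, v in zip(apply_perm(sigma, d), apply_perm(rho, b1))]
y2 = apply_perm(rho, b2)

assert [(u + v) % MOD for u, v in zip(y1, y2)] == apply_perm(sigma, x)
```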
7. A node splitting system for two-party decision tree training, used for splitting a node in a decision tree, wherein the feature values of one or more features and the label values of the samples in a sample set are vertically partitioned between the two parties; the system is implemented on a device of a first party, the first party being either of the two parties and the second party being the other; the system comprises a first obtaining module, a first permutation module, a first grouped gradient-sum fragment calculation module, a second permutation module, a second grouped gradient-sum fragment calculation module, a splitting gain fragment calculation module, and a splitting gain comparison module;
the first obtaining module is configured to obtain a first fragment of a first gradient vector of the node and a first fragment of a second gradient vector of the node; the first gradient vector comprises the first gradients corresponding to the samples belonging to the node, and the second gradient vector comprises the second gradients corresponding to the samples belonging to the node;
for any feature held by the first party, the first permutation module is configured to: sort the sample set according to the feature values of the feature to obtain a first permutation vector, wherein a permutation vector identifies an operation that sorts an equal-length sequence, and each element of the permutation vector indicates the position, in the sorted result sequence, of the element at the corresponding position of the equal-length sequence; permute the first gradient vector according to the feature values through a secure permutation method, based on the first fragment of the first gradient vector, the first permutation vector, and the second party's second fragment of the first gradient vector, so that the first party obtains a first fragment of a first gradient permutation result vector while the second party obtains a second fragment of the first gradient permutation result vector; and permute the second gradient vector according to the feature values through the secure permutation method, based on the first fragment of the second gradient vector, the first permutation vector, and the second party's second fragment of the second gradient vector, so that the first party obtains a first fragment of a second gradient permutation result vector while the second party obtains a second fragment of the second gradient permutation result vector;
for any feature held by the first party, the first grouped gradient-sum fragment calculation module is configured to: divide the elements of the first fragment of the first gradient permutation result vector and the elements of the first fragment of the second gradient permutation result vector according to a preset division mode to obtain a plurality of first groups of each fragment, and for each of the plurality of first groups: calculate the sum of the elements of the first group within the first fragment of the first gradient permutation result vector to obtain a first fragment of the first gradient sum corresponding to the first group; and calculate the sum of the elements of the first group within the first fragment of the second gradient permutation result vector to obtain a first fragment of the second gradient sum corresponding to the first group;
for any feature held by the second party, the second permutation module is configured to: permute the first gradient vector according to the feature values of the feature through the secure permutation method, based on the first fragment of the first gradient vector and the second party's second fragment of the first gradient vector and second permutation vector, so that the first party obtains a first fragment of a first gradient permutation result vector while the second party obtains a second fragment of the first gradient permutation result vector; and permute the second gradient vector according to the feature values through the secure permutation method, based on the first fragment of the second gradient vector and the second party's second fragment of the second gradient vector and second permutation vector, so that the first party obtains a first fragment of a second gradient permutation result vector while the second party obtains a second fragment of the second gradient permutation result vector; the second permutation vector is obtained by the second party sorting the sample set according to the feature values of the feature;
for any feature held by the second party, the second grouped gradient-sum fragment calculation module is configured to: divide the elements of the first fragment of the first gradient permutation result vector and the elements of the first fragment of the second gradient permutation result vector according to the preset division mode to obtain a plurality of second groups of each fragment, and for each of the plurality of second groups: calculate the sum of the elements of the second group within the first fragment of the first gradient permutation result vector to obtain a first fragment of the first gradient sum corresponding to the second group; and calculate the sum of the elements of the second group within the first fragment of the second gradient permutation result vector to obtain a first fragment of the second gradient sum corresponding to the second group;
the splitting gain fragment calculation module is configured to interact with a device of the second party according to a multi-party secure computation protocol to calculate, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groups under each feature, first fragments of the splitting gains respectively corresponding to the groups under each feature;
the splitting gain comparison module is configured to interact with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains respectively corresponding to the groups under each feature, and to record splitting information of the node according to the feature and the group corresponding to the maximum splitting gain.
8. A node splitting apparatus for two-party decision tree training, comprising at least one storage medium and at least one processor, the at least one storage medium storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of any one of claims 1 to 6.
9. A two-party tree model training method, wherein a tree model comprises one or more decision trees, and the feature values of one or more features and the label values of the samples in a sample set are vertically partitioned between the two parties; the method is executed by a device of a first party, the first party being either of the two parties and the second party being the other; the method comprises: performing node splitting on each node of each decision tree in the tree model according to the method of any one of claims 1 to 6 until a growth termination condition is met.
10. The method of claim 9, further comprising:
taking a first fragment of the weight of each leaf node of each decision tree in an equivalent model as the weight of the corresponding leaf node of the corresponding decision tree in the first party's tree model, wherein the equivalent model has the same structure as the first party's tree model and the second party's tree model:
interacting with the device of the second party according to a multi-party secure computation protocol to calculate a first fragment of the weight of the leaf node in the equivalent model based on the first fragment of the first gradient vector of the leaf node and the first fragment of the second gradient vector of the leaf node, according to w = -G/(H + λ); wherein w represents the sum of the first and second fragments of the weight of the leaf node in the equivalent model, G represents the sum of the elements in the first and second fragments of the first gradient vector of the leaf node, H represents the sum of the elements in the first and second fragments of the second gradient vector of the leaf node, and λ represents a preset coefficient.
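A plaintext illustration of the leaf-weight formula above; in the protocol G, H and the weight each exist only as two fragments, and the division is evaluated under the secure computation protocol. The value of λ is assumed:

```python
LAMBDA = 1.0  # preset coefficient (value assumed)

def leaf_weight(G, H, lam=LAMBDA):
    # Plaintext form of the reconstructed leaf-weight formula w = -G / (H + λ).
    return -G / (H + lam)

# e.g. a leaf whose samples contribute G = 4.0 and H = 7.0:
assert abs(leaf_weight(4.0, 7.0) + 0.5) < 1e-12
```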
11. The method of claim 10, further comprising:
for the t-th decision tree:
interacting with the device of the second party according to a multi-party secure computation protocol to calculate, based on the weights of the leaf nodes of the t-th decision tree in the first party's tree model and the first fragments of the flag vectors, first fragments of the weights of the leaf nodes of the t-th decision tree of the equivalent model to which the samples respectively belong;
for any sample, accumulating the first fragment of the weight of the leaf node of the t-th decision tree of the equivalent model to which the sample belongs onto the first fragment of the prediction score of the sample to update the first fragment of the prediction score of the sample; when t = 1, the first fragment of the prediction score of the sample before updating equals the base score of the first party's tree model; when t > 1, the first fragment of the prediction score of the sample before updating equals the sum of the first fragments of the weights of the leaf nodes of the first t-1 decision trees of the equivalent model to which the sample belongs and the base score of the first party's tree model;
calculating a first fragment of the first gradient vector of the root node of the (t+1)-th decision tree based on the updated first fragments of the prediction scores of the samples and the first fragments of the label values of the samples;
and interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the second gradient vector of the root node of the (t+1)-th decision tree based on the updated first fragments of the prediction scores of the samples and the first fragments of the label values of the samples.
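The score accumulation in claim 11 is linear, and so is the first-gradient refresh when the loss has a linear first gradient. A sketch assuming a squared-error loss (the claims do not fix the loss function; a logistic loss would need the secure protocol that the claim prescribes for the second gradient):

```python
def update_score_fragments(score_frag, leaf_weight_frag):
    # Claim 11's accumulation is linear, so each party updates its fragment of
    # every sample's prediction score locally.
    return [s + w for s, w in zip(score_frag, leaf_weight_frag)]

def first_gradient_fragments(score_frag, label_frag):
    # For squared error, g = prediction - label is linear, so the first-gradient
    # fragment for the next tree's root is also computed locally.
    return [p - y for p, y in zip(score_frag, label_frag)]

# One party's fragments only; the other party holds the complementary halves.
scores = update_score_fragments([0.5, -0.2], [0.1, 0.3])
g_frag = first_gradient_fragments(scores, [1.0, 0.0])
print(scores, g_frag)   # roughly [0.6, 0.1] and [-0.4, 0.1], up to float rounding
```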
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210196287.4A CN114282688B (en) | 2022-03-02 | 2022-03-02 | Two-party decision tree training method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114282688A true CN114282688A (en) | 2022-04-05 |
CN114282688B CN114282688B (en) | 2022-06-03 |
Family
ID=80882186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210196287.4A Active CN114282688B (en) | 2022-03-02 | 2022-03-02 | Two-party decision tree training method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114282688B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140214736A1 (en) * | 2013-01-30 | 2014-07-31 | Technion Research & Development Foundation Limited | Training ensembles of randomized decision trees |
CN104915608A (en) * | 2015-05-08 | 2015-09-16 | 南京邮电大学 | Privacy protection type data classification method for information physical fusion system |
CN107886243A (en) * | 2017-11-10 | 2018-04-06 | 阿里巴巴集团控股有限公司 | Risk identification model construction and Risk Identification Method, device and equipment |
US20190026489A1 (en) * | 2015-11-02 | 2019-01-24 | LeapYear Technologies, Inc. | Differentially private machine learning using a random forest classifier |
CN110413647A (en) * | 2019-07-08 | 2019-11-05 | 上海鸿翼软件技术股份有限公司 | A kind of quick computing system of high dimension vector Length discrepancy sequence similarity |
CN111738359A (en) * | 2020-07-24 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
CN111738360A (en) * | 2020-07-24 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
CN111784078A (en) * | 2020-07-24 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Distributed prediction method and system for decision tree |
CN111813580A (en) * | 2020-07-24 | 2020-10-23 | 成都信息工程大学 | Matrix representation-based distributed model training optimization method |
CN112464287A (en) * | 2020-12-12 | 2021-03-09 | 同济大学 | Multi-party XGboost safety prediction model training method based on secret sharing and federal learning |
US20220036250A1 (en) * | 2020-07-30 | 2022-02-03 | Huakong Tsingjiao Information Science (Beijing) Limited | Method and device for training tree model |
Non-Patent Citations (3)
Title |
---|
SIDDALINGESHWAR PATIL ET AL: "Accuracy Prediction for Distributed Decision Tree using Machine Learning approach", 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), 10 October 2019, pages 1365-1371 |
SHEN Jiantao: "Research and Implementation of a Decision Tree Algorithm Based on Distributed Computing", Journal of Nantong Vocational University, no. 01, 28 April 2017, pages 80-83 |
LU Xu et al.: "A Fast Parallel Decision Tree Algorithm for Big Data Analysis", Journal of Yunnan University (Natural Science Edition), no. 02, 10 March 2020, pages 50-57 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116910818A (en) * | 2023-09-13 | 2023-10-20 | 北京数牍科技有限公司 | Data processing method, device, equipment and storage medium based on privacy protection |
CN116910818B (en) * | 2023-09-13 | 2023-11-21 | 北京数牍科技有限公司 | Data processing method, device, equipment and storage medium based on privacy protection |
Also Published As
Publication number | Publication date |
---|---|
CN114282688B (en) | 2022-06-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738359B (en) | Two-party decision tree training method and system | |
CN109299728B (en) | Sample joint prediction method, system and medium based on construction of gradient tree model | |
CN111738360B (en) | Two-party decision tree training method and system | |
Fu et al. | MILP-based automatic search algorithms for differential and linear trails for speck | |
Blanton et al. | Secure and efficient outsourcing of sequence comparisons | |
CN110969264B (en) | Model training method, distributed prediction method and system thereof | |
CN111639368A (en) | Incremental learning distributed computing method, system and node based on block chain | |
CN116049909B (en) | Feature screening method, device, equipment and storage medium in federal feature engineering | |
CN114282688B (en) | Two-party decision tree training method and system | |
CN114282076B (en) | Sorting method and system based on secret sharing | |
CN114327371B (en) | Secret sharing-based multi-key sorting method and system | |
WO2023174018A1 (en) | Vertical federated learning methods, apparatuses, system and device, and storage medium | |
CN112561085A (en) | Multi-classification model training method and system based on multi-party safety calculation | |
CN114003744A (en) | Image retrieval method and system based on convolutional neural network and vector homomorphic encryption | |
CN109274504B (en) | Multi-user big data storage sharing method and system based on cloud platform | |
US10673624B2 (en) | Communication control device, communication control method, and computer program product | |
Dong et al. | Veridl: Integrity verification of outsourced deep learning services | |
CN111784078B (en) | Distributed prediction method and system for decision tree | |
CN114282255B (en) | Sorting sequence merging method and system based on secret sharing | |
CN113542271B (en) | Network background flow generation method based on generation of confrontation network GAN | |
CN114338017B (en) | Sorting method and system based on secret sharing | |
CN104978382A (en) | Clustering method based on local density on MapReduce platform | |
CN115150152A (en) | Method for rapidly reasoning actual authority of network user based on authority dependency graph reduction | |
CN115221244A (en) | Block chain cross-chain method and device, computer equipment and storage medium | |
Nishida et al. | Efficient secure neural network prediction protocol reducing accuracy degradation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||