CN111738360B - Two-party decision tree training method and system

Publication number: CN111738360B (granted publication; published as application CN111738360A)
Application number: CN202010723916.5A
Authority: CN (China)
Inventors: 方文静, 王力, 周俊
Assignee: Alipay Hangzhou Information Technology Co Ltd (original and current)
Legal status: Active (application granted)

Classifications

    • G06F18/24323 — Pattern recognition; classification techniques; tree-organised classifiers
    • G06F18/214 — Pattern recognition; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N20/00 — Computing arrangements based on specific computational models; machine learning


Abstract

The embodiments of this specification disclose a two-party decision tree training method and system that protect the data privacy of both parties. For any feature, the devices of the two parties interact according to a multi-party secure computation protocol to compute, based on the gradient vector fragments and the identification vector corresponding to any grouping, fragments of the gradient sums corresponding to that grouping under the feature. The devices of the two parties then interact according to the multi-party secure computation protocol to compute fragments of the splitting gain corresponding to each grouping under each feature, based on the fragments of the gradient sums corresponding to each grouping under each feature. Furthermore, the devices of the two parties determine, via a multi-party secure comparison protocol, the feature and grouping corresponding to the maximum splitting gain, and split the node according to that feature and grouping.

Description

Two-party decision tree training method and system
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a two-party decision tree training method and system.
Background
In order to protect the data privacy of all parties, distributed training schemes are adopted in the field of machine learning: each participant can obtain a model belonging to itself without revealing the sample data it holds.
It is currently desirable to provide such a distributed training scheme for decision trees.
Disclosure of Invention
One of the embodiments of the present specification provides a two-party decision tree training method. The method is performed by a device of a first party, where the first party is either the party holding the feature values of at least one feature and the label value of each sample in a sample set, or the party holding the feature values of at least one feature of each sample in the sample set, and the second party is the other of the two parties. The method comprises the following steps:
splitting any node according to the following splitting steps:
obtaining a first fragment of the flag vector, a first fragment of the first gradient vector, and a first fragment of the second gradient vector of the node; the flag vector indicates the samples belonging to the node, the first gradient vector includes the first gradients corresponding to the samples belonging to the node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the node.
For any feature held by the first party:
for each of a plurality of first groupings obtained by dividing the sample set by the feature values of the feature: generating an identification vector corresponding to the first grouping according to the feature values of the feature of the samples, the identification vector indicating the samples belonging to the first grouping; splitting the identification vector corresponding to the first grouping into a first fragment and a second fragment, and sending the second fragment of the identification vector corresponding to the first grouping to the device of the second party; interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the first grouping, based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, where the first gradient sum corresponding to the first grouping is obtained from the inner product of (the sum of the first and second fragments of the first gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the first grouping); and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the first grouping, based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, where the second gradient sum corresponding to the first grouping is obtained from the inner product of (the sum of the first and second fragments of the second gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the first grouping).
For any feature held by the second party:
for each of a plurality of second groupings obtained by dividing the sample set by the feature values of the feature: obtaining, from the device of the second party, a first fragment of the identification vector corresponding to the second grouping, the identification vector indicating the samples belonging to the second grouping; interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the second grouping, based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, where the first gradient sum corresponding to the second grouping is obtained from the inner product of (the sum of the first and second fragments of the first gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the second grouping); and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the second grouping, based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, where the second gradient sum corresponding to the second grouping is obtained from the inner product of (the sum of the first and second fragments of the second gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the second grouping).
Interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the splitting gain corresponding to each grouping under each feature, based on the first fragments of the first gradient sum and the second gradient sum corresponding to each grouping under each feature.
Interacting with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains corresponding to the groupings under the features, and recording the splitting information of the node according to the feature and grouping corresponding to the maximum splitting gain.
When the maximum splitting gain corresponds to a feature of the first party: generating a left-subtree vector and a right-subtree vector of the node, the left-subtree vector indicating the samples in the left subset obtained by dividing the sample set according to the feature and grouping corresponding to the maximum splitting gain, the right-subtree vector indicating the samples in the right subset so obtained, the left subset corresponding to the left subtree and the right subset corresponding to the right subtree; splitting the left-subtree vector into a first fragment and a second fragment, and sending the second fragment of the left-subtree vector to the device of the second party; and splitting the right-subtree vector into a first fragment and a second fragment, and sending the second fragment of the right-subtree vector to the device of the second party. When the maximum splitting gain corresponds to a feature of the second party: receiving, from the device of the second party, a first fragment of the left-subtree vector and a first fragment of the right-subtree vector of the node.
Interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the left subtree of the node, based on the first fragment of the flag vector of the node and the first fragment of the left-subtree vector; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right subtree of the node, based on the first fragment of the flag vector of the node and the first fragment of the right-subtree vector.
Interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left subtree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left subtree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left subtree of the node.
Interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right subtree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right subtree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right subtree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right subtree of the node.
One of the embodiments of the present specification provides a two-party decision tree training system. The system is implemented on a device of a first party, where the first party is either the party holding the feature values of at least one feature and the label value of each sample in a sample set, or the party holding the feature values of at least one feature of each sample in the sample set, and the second party is the other of the two parties. The system comprises a first obtaining module, a first grouping gradient-sum fragment computation module, a second grouping gradient-sum fragment computation module, a splitting gain fragment computation module, a splitting gain comparison module, a left/right-subtree vector fragment obtaining module, a child-node flag vector fragment computation module, and a child-node gradient vector fragment computation module.
For any node that is split:
the first obtaining module is configured to obtain a first fragment of the flag vector, a first fragment of the first gradient vector, and a first fragment of the second gradient vector of the node; the flag vector indicates the samples belonging to the node, the first gradient vector includes the first gradients corresponding to the samples belonging to the node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the node.
For any feature held by the first party, the first grouping gradient-sum fragment computation module is configured to: for each of a plurality of first groupings obtained by dividing the sample set by the feature values of the feature, generate an identification vector corresponding to the first grouping according to the feature values of the feature of the samples, the identification vector indicating the samples belonging to the first grouping; split the identification vector corresponding to the first grouping into a first fragment and a second fragment, and send the second fragment of the identification vector corresponding to the first grouping to the device of the second party; interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the first grouping, based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, where the first gradient sum corresponding to the first grouping is obtained from the inner product of (the sum of the first and second fragments of the first gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the first grouping); and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the first grouping, based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, where the second gradient sum corresponding to the first grouping is obtained from the inner product of (the sum of the first and second fragments of the second gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the first grouping).
For any feature held by the second party, the second grouping gradient-sum fragment computation module is configured to: for each of a plurality of second groupings obtained by dividing the sample set by the feature values of the feature, obtain, from the device of the second party, a first fragment of the identification vector corresponding to the second grouping, the identification vector indicating the samples belonging to the second grouping; interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the second grouping, based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, where the first gradient sum corresponding to the second grouping is obtained from the inner product of (the sum of the first and second fragments of the first gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the second grouping); and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the second grouping, based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, where the second gradient sum corresponding to the second grouping is obtained from the inner product of (the sum of the first and second fragments of the second gradient vector of the node) and (the sum of the first and second fragments of the identification vector corresponding to the second grouping).
The splitting gain fragment computation module is configured to interact with the device of the second party according to the multi-party secure computation protocol to compute first fragments of the splitting gains corresponding to the groupings under each feature, based on the first fragments of the first gradient sums and of the second gradient sums corresponding to the groupings under each feature.
The splitting gain comparison module is configured to interact with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains corresponding to the groupings under the features, and to record the splitting information of the node according to the feature and grouping corresponding to the maximum splitting gain.
The left/right-subtree vector fragment obtaining module is configured to: when the maximum splitting gain corresponds to a feature of the first party, generate a left-subtree vector and a right-subtree vector of the node, the left-subtree vector indicating the samples in the left subset obtained by dividing the sample set according to the feature and grouping corresponding to the maximum splitting gain, the right-subtree vector indicating the samples in the right subset so obtained, the left subset corresponding to the left subtree and the right subset corresponding to the right subtree; split the left-subtree vector into a first fragment and a second fragment, and send the second fragment of the left-subtree vector to the device of the second party; and split the right-subtree vector into a first fragment and a second fragment, and send the second fragment of the right-subtree vector to the device of the second party; and, when the maximum splitting gain corresponds to a feature of the second party, receive, from the device of the second party, a first fragment of the left-subtree vector and a first fragment of the right-subtree vector of the node.
The child-node flag vector fragment computation module is configured to: interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the left subtree of the node, based on the first fragment of the flag vector of the node and the first fragment of the left-subtree vector; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right subtree of the node, based on the first fragment of the flag vector of the node and the first fragment of the right-subtree vector.
The child-node gradient vector fragment computation module is configured to: interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left subtree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left subtree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left subtree of the node; interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right subtree of the node, based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right subtree of the node; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right subtree of the node, based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right subtree of the node.
One of the embodiments of the present specification provides a two-party decision tree training apparatus, which includes a processor and a storage device, where the storage device is used to store instructions, and when the processor executes the instructions, the apparatus implements the two-party decision tree training method according to any embodiment of the present specification.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a model training system according to some embodiments of the present description;
FIG. 2 is a schematic diagram of a tree model of parties A and B and their corresponding equivalent models, shown in accordance with some embodiments of the present description;
FIG. 3 is a schematic diagram of inputs and variable initialization for two-party decision tree training in accordance with some embodiments of the present description;
FIGS. 4-6 are exemplary flow diagrams of node splitting shown in accordance with some embodiments of the present description;
FIG. 7 is an exemplary flow diagram illustrating the computation of fragments of the leaf node weights of the equivalent model according to some embodiments of the present description;
FIG. 8 is an exemplary flow diagram illustrating the computation of fragments of the gradient vectors for training the next tree in accordance with some embodiments of the present description;
FIG. 9 is a schematic diagram of partitioning left and right subsets, according to some embodiments of the present description;
FIG. 10 is a block diagram of a two-party decision tree training system in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the claims, the singular forms "a", "an", and "the" may include plural referents unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by systems according to embodiments of the present description. It should be understood that the operations are not necessarily performed exactly in the order shown. Rather, the steps may be processed in reverse order or concurrently. Moreover, other operations may be added to these processes, or one or more steps may be removed from them.
First, the relevant knowledge of the decision tree is introduced.
The nodes of a decision tree may be divided into split nodes, which have child nodes (e.g., left and right subtrees), and leaf nodes, which have none. Each split node may correspond to a feature, which may be referred to as the associated feature of that split node. The parameters of a split node (which may be referred to as splitting parameters) may include a threshold used to determine to which child node a sample/prediction object belongs; the threshold relates to the associated feature of the split node, e.g., a certain feature value of the associated feature may serve as the threshold.
The decision tree based model (i.e., tree model) may include one or more decision trees; tree models comprising a plurality (two or more) of decision trees include, for example, tree models under the XGB (eXtreme Gradient Boosting) framework.
For the regression problem, each leaf node may correspond to a score (which may be referred to as the leaf node score/leaf node weight), and the weights of all leaf nodes on a single decision tree constitute the leaf node weight vector of that tree. The predicted leaf node vector of a prediction object for a decision tree indicates the leaf node (which may be referred to as the predicted leaf node) reached by the prediction object along its prediction path on that tree. It can be understood that the number of bits (dimension) of the predicted leaf node vector for a single decision tree matches the number of leaf nodes of the tree; typically the bit corresponding to the leaf node the prediction object reaches is set to 1 and the remaining bits are set to 0, so that the weight of the predicted leaf node equals the inner product of the leaf node weight vector and the predicted leaf node vector. Further, a tree model may correspond to a base score. The prediction score of a prediction object (e.g., a sample) may then be calculated as

pred = f_0 + Σ_{t=1}^{T} ⟨W_t, S_t⟩,

where pred denotes the prediction score of the prediction object, f_0 denotes the base score of the tree model, T (≥ 1) denotes the number of decision trees contained in the tree model, W_t denotes the leaf node weight vector of the t-th decision tree, S_t denotes the predicted leaf node vector of the prediction object for the t-th decision tree, and ⟨·,·⟩ denotes the vector inner product.
In some embodiments, the prediction score of the prediction object may be used as the prediction value of the prediction object. In other embodiments, the prediction score of the predicted object may be processed using a non-linear function, the output of which is the predicted value of the predicted object.
In some embodiments, a single decision tree may be trained as the tree model, in which case the prediction score of a prediction object equals the weight of the leaf node the prediction object reaches along its prediction path on that tree, plus the base score of the tree model. In other embodiments, for example under the XGB (eXtreme Gradient Boosting) framework, a plurality of decision trees may be trained as the tree model, in which case the prediction score of a prediction object equals the sum of the weights of the leaf nodes it reaches along its prediction paths on the plurality of trees, plus the base score of the tree model.
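As a concrete illustration of this scoring rule, a minimal Python sketch (the function and variable names are ours, not from the disclosure):

```python
import numpy as np

def predict_score(f0, leaf_weight_vectors, predicted_leaf_vectors):
    """Prediction score: base score plus, for each tree, the inner product of
    its leaf-node weight vector W_t with the one-hot predicted-leaf vector S_t."""
    return f0 + sum(np.dot(W_t, S_t)
                    for W_t, S_t in zip(leaf_weight_vectors, predicted_leaf_vectors))

# A tree model with base score 0.5 and two trees of 3 leaves each;
# the prediction object reaches leaf 2 of tree 1 and leaf 1 of tree 2.
W = [np.array([0.1, 0.4, -0.2]), np.array([0.3, -0.1, 0.2])]
S = [np.array([0, 1, 0]), np.array([1, 0, 0])]
print(predict_score(0.5, W, S))  # 0.5 + 0.4 + 0.3 = 1.2
```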
As mentioned previously, a split node may be split into left and right subtrees based on some feature value of some feature. Accordingly, starting from the root node, the samples in the sample set are divided among the child nodes until they reach leaf nodes, at which point the prediction scores of the samples can be determined based on the base score of the decision tree and the weights of the leaf nodes the samples reach. Splitting the same node according to different feature values of different features yields different splitting gains, so during training the magnitudes of the splitting gains corresponding to the different feature values of the different features are compared to judge according to which feature value of which feature the node is best split.
In particular, the splitting gain may reflect the decrease of the objective function when a node is split by a certain feature value of a certain feature. The aim of splitting (or training) the decision tree is to make the objective function value after splitting smaller than that before splitting, with the difference between the two as large as possible; thus, during training, the feature and feature value corresponding to a larger splitting gain can be selected to split the node. The objective function is derived at least from the loss function of each sample in the sample set (reflecting the difference between the predicted value and the label value); e.g., the objective function may be the sum of the loss functions of the samples. The loss function can further be described using a first gradient and a second gradient; the corresponding objective function is equivalent to the sum of the sub-objective functions corresponding to the leaf nodes of a single decision tree, and the sub-objective function corresponding to each leaf node can be obtained based on the first gradient sum and the second gradient sum corresponding to that leaf node. Here the first gradient is the first-order gradient of the sample's loss function, the second gradient is the second-order gradient of the sample's loss function, the first gradient sum is the sum of the first gradients of the samples belonging to the corresponding node, and the second gradient sum is the sum of the second gradients of the samples belonging to the corresponding node. It is understood that, when the splitting gain is calculated, the original node being split and the resulting left and right subtrees are each regarded as leaf nodes at their respective moments, i.e., each split increases the number of leaf nodes of the decision tree by one.
When a node is split, left and right subtrees grow under the original node, and the splitting gain can be interpreted as the decrease of the objective function value corresponding to the split decision tree relative to the objective function value corresponding to the original decision tree. Combining the relationships among the splitting gain, the objective function, the sub-objective functions, and the gradient sums, the splitting gain can be obtained based on the first and second gradient sums corresponding to the left subtree and the first and second gradient sums corresponding to the right subtree.
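For reference, a commonly used closed form of this gain under the XGB framework (stated here as background; the present disclosure only relies on the gain being computable from these four gradient sums) is

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma,$$

where G_L, H_L and G_R, H_R are the first and second gradient sums of the left and right subtrees respectively, and λ and γ are regularization constants.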
For more details on the splitting gain, reference may be made to the related description below.
FIG. 1 is a schematic diagram of an application scenario of a model training system according to some embodiments of the present description. As shown in FIG. 1, the system 100 may include an A-party device 110, a B-party device 120, a third-party server 130, and a network 140.
Party A holds the feature values of at least one feature of each sample in the sample set as well as the label value of each sample, and party B holds the feature values of at least one feature of each sample in the sample set. It is understood that the features, feature values, and label values held by party A belong to party A's private data, and the features and feature values held by party B belong to party B's private data.
During two-party model training, neither party A nor party B wishes to expose its private data to the other. To protect the data privacy of both parties, the inputs (such as sample label values) and outputs (such as gradient vectors, sample prediction scores, leaf node weights, etc.) of the various computation steps involved in training are all stored in fragments on the devices of the two parties, with party A and party B each holding one fragment.
For the decision tree, the tree models trained by party A and party B may have the same structure, e.g., the number of nodes, the connection relationships between nodes, the positions of nodes, and the like. However, the tree models of party A and party B have different parameters. For convenience of illustration, this description often refers to the equivalent model corresponding to the tree models of party A and party B, namely the model that would be obtained by centralized training on the sample data held by party A and party B together. The equivalent model has complete parameters, and the parameters of party A's/party B's tree model are equivalently split from the parameters of the equivalent model.
Specifically: the tree model of either party has the parameters of only part of the split nodes, i.e., each party's tree model comprises local split nodes (with parameters) and non-local split nodes (without parameters); the weight of a leaf node in either party's tree model is equivalent to a fragment of the weight of the corresponding leaf node in the equivalent model, and the base score of either party's tree model is equivalent to a fragment of the base score of the equivalent model; from the vector perspective, the leaf node weight vector of either party's tree model is equivalent to a fragment of the leaf node weight vector of the equivalent model.
Referring to FIG. 2, split nodes are represented by circles and leaf nodes by rectangles. Party A holds feature X1 and party B holds feature X2; accordingly, party A's tree model has a local split node with parameter p1 corresponding to feature X1, and party B's tree model has a local split node with parameter p2 corresponding to feature X2. As shown in FIG. 2, the local split node of party A can be denoted (X1, p1), the local split node of party B can be denoted (X2, p2), leaf nodes are denoted leaf, and leaf nodes in the same position have the same number. Taking a binary tree as an example, the parameters of a split node may include a threshold associated with the node's feature; e.g., the parameters of a node corresponding to an age feature may include a threshold for distinguishing age groups.
As shown in FIG. 2: for party A, the weight of leaf1 is w_11, the weight of leaf2 is w_21, the weight of leaf3 is w_31, and the base score of the tree model is <f_0>_1; for party B, the weight of leaf1 is w_12, the weight of leaf2 is w_22, the weight of leaf3 is w_32, and the base score of the tree model is <f_0>_2; for the equivalent model, the weight of leaf1 is w_1, the weight of leaf2 is w_2, the weight of leaf3 is w_3, and the base score is f_0. In some embodiments, w_11 + w_12 = w_1, w_21 + w_22 = w_2, w_31 + w_32 = w_3, and <f_0>_1 + <f_0>_2 = f_0 may be satisfied. From the perspective of the leaf node weight vector, (w_11, w_21, w_31) + (w_12, w_22, w_32) = (w_1, w_2, w_3).
Assume the predicted leaf node vector of the prediction object is (s_1, s_2, s_3). Then the prediction score of the prediction object is pred = f_0 + ⟨(w_1, w_2, w_3), (s_1, s_2, s_3)⟩, i.e., the base score plus the inner product of the leaf node weight vector (w_1, w_2, w_3) and the predicted leaf node vector (s_1, s_2, s_3). Assuming the prediction object reaches leaf2, its predicted leaf node vector is (0, 1, 0), and its prediction score is pred = f_0 + w_2.
When the tree model of each party and the equivalent model each comprise T (≥ 2) decision trees, then for the equivalent model each tree corresponds to an inner product of a leaf node weight vector and a predicted leaf node vector, and the inner products corresponding to the T trees are summed together with the base score of the equivalent model to obtain the prediction score of the prediction object.
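To make the fragment relationship concrete, here is a small sketch with made-up numbers (our own illustration, not part of the disclosure), showing that because the two parties' leaf weight vectors and base scores sum to those of the equivalent model, their locally computed scores sum to the equivalent model's prediction score:

```python
import numpy as np

# Hypothetical fragments held by each party (w_ij: leaf i, party j).
W_A, f0_A = np.array([0.1, 0.25, -0.05]), 0.2   # (w_11, w_21, w_31), <f_0>_1
W_B, f0_B = np.array([0.2, 0.15, 0.15]), 0.3    # (w_12, w_22, w_32), <f_0>_2

S = np.array([0, 1, 0])  # prediction object reaches leaf2

# Each party scores locally on its own fragments ...
score_A = f0_A + W_A @ S
score_B = f0_B + W_B @ S

# ... and the fragment scores sum to the equivalent model's prediction:
# f_0 + w_2 = (0.2 + 0.3) + (0.25 + 0.15) = 0.9
print(score_A + score_B)  # 0.9
```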
Devices 110/120 may include various types of computing devices with information transceiving capabilities, such as smart phones, laptop computers, desktop computers, servers, and the like.
In some embodiments, the servers may be independent servers or groups of servers, which may be centralized or distributed. In some embodiments, the server may be regional or remote. In some embodiments, the server may execute on a cloud platform. For example, the cloud platform may include one or any combination of a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like.
The third-party server 130 may assist the A-party device 110 and the B-party device 120 in running a two-party secure multiplication protocol. Multiplication is frequently involved in two-party model training; when one factor of a product belongs to party A's private data and the other factor belongs to party B's private data, neither party's device can compute the product directly. Instead, each device interacts with the other party's device according to the two-party secure multiplication protocol to compute, based on its own private data, one fragment of the product. That is, party A and party B each obtain one fragment of the product.
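One standard way to realize two-party secure multiplication with a third party distributing correlated randomness is Beaver's multiplication-triple technique. The sketch below is our own illustration under that assumption (the disclosure does not specify the protocol): party A holds a secret factor x and party B a secret factor y, and the parties end up with additive fragments of x·y:

```python
import random

P = 2**61 - 1  # prime modulus for additive sharing (an assumption)

def share(v):
    r = random.randrange(P)
    return r, (v - r) % P

# Third party (e.g., server 130) prepares a Beaver triple c = a*b
# and hands one fragment of each value to every party.
a, b = random.randrange(P), random.randrange(P)
c = a * b % P
a_A, a_B = share(a); b_A, b_B = share(b); c_A, c_B = share(c)

x, y = 37, 58          # x is party A's secret factor, y is party B's

# Each party masks its factor; the masked values e and f are public.
e = (x - a) % P        # in practice reconstructed jointly from fragments
f = (y - b) % P

# Local fragment computation; only one party adds the public term e*f.
z_A = (c_A + e * b_A + f * a_A + e * f) % P
z_B = (c_B + e * b_B + f * a_B) % P

print((z_A + z_B) % P == x * y % P)  # True: fragments sum to the product
```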
Network 140 connects the various components of the system so that communication can occur between the various components. The network between the various parts in the system may include wired networks and/or wireless networks. For example, network 140 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a ZigBee network (ZigBee), Near Field Communication (NFC), an intra-device bus, an intra-device line, a cable connection, and the like, or any combination thereof. The network connection between each two parts may be in one of the above-mentioned ways, or in a plurality of ways.
FIGS. 3-8 are exemplary flow diagrams of a two-party decision tree training method according to some embodiments described herein.
First, the notation in FIGS. 3 to 8 is explained: (1) the symbol { } represents a vector, matrix, or set; when the dimension of a vector/set is M, or a certain dimension of a matrix is M, those M dimensions correspond one-to-one to the M samples in the sample set; for convenience of description, the M samples may be numbered 1 to M, with the dimensions of vectors, matrices, and sets corresponding one-to-one to these numbers; (2) FIGS. 3-8 have a central dotted line: the steps in rounded boxes to the left of the dotted line can be performed independently by party A, the steps in rounded boxes to the right can be performed independently by party B, and the steps in rounded boxes crossing the dotted line are performed cooperatively by party A and party B, with each party obtaining one fragment of the fragmented data; (3) a feature is denoted j and a feature value is denoted k; since each grouping corresponds to one feature value, a grouping may also be denoted k.
As shown in FIG. 3, the input held by party A includes the feature data of the M samples, the feature data of each sample comprising the feature values of N_A features; party A also holds the label values of the M samples. The feature data of the M samples held by party A may be stored as an M × N_A feature matrix (denoted X_A, containing the feature values of the M samples for party A's features), and the label values of the M samples held by party A may be stored as an M × 1 label vector (denoted y, containing the label values of the samples). The input held by party B includes the feature data of the M samples, the feature data of each sample comprising the feature values of N_B features; the feature data of the M samples held by party B may be stored as an M × N_B feature matrix.
The initialization phase in FIG. 3 shows the preparation steps that parties A and B perform for the splitting phase.
In the initialization phase, party A's device can initialize the base score of the equivalent model (denoted f_0). In some embodiments, party A may average the label values of the M samples and use the average as the base score of the equivalent model. Notably, the base score of the equivalent model can serve as the initial sample prediction value, and party A's device can compute the first gradient vector {g} and the second gradient vector {h} of the root node based on the initial sample prediction values and the sample label values. The first gradient vector of any node comprises the first gradients g corresponding to the samples belonging to that node, and the second gradient vector of any node comprises the second gradients h corresponding to the samples belonging to that node. It will be appreciated that the number of bits (dimension) of the first/second gradient vector may be M, each bit corresponding to one of the M samples, with the gradient vector bits corresponding to samples not belonging to the node being 0. It should be noted that although, once sample i does not belong to a node, the element corresponding to sample i in that node's first/second gradient vector changes from the first gradient g_i / second gradient h_i of sample i to 0, for ease of description that element may still be denoted g_i / h_i.
For example, the first/second gradient vector of the first-split root node includes the first/second gradients corresponding to all M samples. Suppose M = 4; then the first gradient vector of the root node can be written as (g_1, g_2, g_3, g_4) and the second gradient vector as (h_1, h_2, h_3, h_4). If the root node is split according to some feature value of some feature such that samples 1 and 4 are divided into the left subtree of the root node and samples 2 and 3 into the right subtree, the first and second gradient vectors of the left subtree of the root node are (g_1, 0, 0, g_4) and (h_1, 0, 0, h_4), respectively, and the first and second gradient vectors of the right subtree are (0, g_2, g_3, 0) and (0, h_2, h_3, 0), respectively.
In the initialization phase, party A's device can also initialize the flag vector of the node, the initialized flag vector being the flag vector of the root node. The flag vector of any node indicates the samples belonging to that node. It will be appreciated that the number of bits (dimension) of a node's flag vector may be M, each bit corresponding to one of the M samples; typically the flag vector bit corresponding to a sample belonging to the node is 1 and the bit corresponding to a sample not belonging to the node is 0. For example, the flag vector bits of the root node are all 1. If the root node is split according to some feature value of some feature such that samples 1 and 4 are divided into the left subtree of the root node and samples 2 and 3 into the right subtree, the flag vector of the left subtree of the root node is (1, 0, 0, 1) and the flag vector of the right subtree is (0, 1, 1, 0).
To avoid leaking private data, in the initialization phase party A's device can split the base score f_0 of the equivalent model, the first gradient vector {g} of the root node, the second gradient vector {h} of the root node, the flag vector of the root node, and the label vector {y} each into two fragments allocated to the two parties respectively. It should be understood that splitting a vector or matrix includes splitting each of its elements, i.e., each element of the vector or matrix is also split into two fragments assigned to the two parties. Likewise, encrypting/decrypting a vector or matrix includes encrypting/decrypting each of its elements. As shown in FIG. 3, subscript A denotes the fragment allocated to party A and subscript B the fragment allocated to party B: party A's device sends the fragments allocated to party B to party B's device and retains at least the fragments allocated to itself. Here the party A fragment <f_0>_A of the base score can serve as the initial score of party A's tree model, and the party B fragment <f_0>_B as the initial score of party B's tree model.
It should be noted that, for a tree model under the XGB (eXtreme Gradient Boosting) framework, the initialized first/second gradient vectors are the first/second gradient vectors of the root node of the first tree to be trained, and the flag vector bits of the root node of every tree are all 1.
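As a minimal sketch of this fragment-splitting step (our own illustration; real implementations typically share fixed-point encodings over a finite ring rather than raw Python integers):

```python
import random

P = 2**61 - 1  # shared modulus, an assumption for illustration

def split_into_fragments(vec):
    """Additively split a vector into a party-A fragment and a party-B fragment,
    element by element, so that fragment_A + fragment_B == vec (mod P)."""
    frag_A = [random.randrange(P) for _ in vec]
    frag_B = [(v - r) % P for v, r in zip(vec, frag_A)]
    return frag_A, frag_B

g = [5, 3, 8, 2]                    # first gradient vector of the root node (toy values)
g_A, g_B = split_into_fragments(g)  # g_A stays with party A, g_B is sent to party B
print([(a + b) % P for a, b in zip(g_A, g_B)])  # [5, 3, 8, 2]
```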
FIGS. 4 to 6 respectively show the three steps involved in (decision tree) node splitting during two-party model training: computing fragments of the splitting gains, comparing the splitting gains, and recording the splitting information. These three steps are described in turn below.
Referring to FIG. 4, FIG. 4 illustrates, taking a party B feature as an example, the flow of computing fragments of the gradient sums and computing fragments of the splitting gains based on those gradient-sum fragments.
For any feature j of party B, party B's device can divide the sample set {1, ..., M} into a plurality of groupings according to the feature values of feature j. For convenience of description, the number of groupings obtained by dividing the sample set according to the feature values of any feature is denoted K, and the K groupings are numbered 1 to K. In some embodiments, the grouping may be accomplished by a feature binning algorithm, e.g., equal-frequency binning, equal-width binning, or similar algorithms. In some embodiments, a threshold corresponding to each grouping may also be determined based on the feature values of the samples in the grouping (e.g., equal to the mean of those feature values, or to the feature value of a certain sample in the grouping), and the threshold corresponding to a grouping may serve as the splitting parameter according to which a node is subsequently split.
Each group corresponds to a first gradient sum and a second gradient sum, the first gradient sum corresponding to any group is the sum of elements corresponding to the samples belonging to the group in the first gradient vector, and the second gradient sum corresponding to any group is the sum of elements corresponding to the samples belonging to the group in the second gradient vector.
For ease of understanding, first assume that party B possesses the complete first gradient vector {g} and second gradient vector {h}. Party B could then compute the inner product of the first/second gradient vector with the identification vector corresponding to any grouping to obtain the first/second gradient sum corresponding to that grouping. For any grouping (denoted k), the identification vector corresponding to grouping k indicates the samples belonging to grouping k. It will be appreciated that the number of bits (dimension) of the identification vector may be M, each bit corresponding to one of the M samples. In general, the identification vector bit corresponding to a sample belonging to the grouping is 1 and the bit corresponding to a sample not belonging to the grouping is 0; on this basis, party B's device could compute the first and second gradient sums corresponding to grouping k, in the manner shown in FIG. 4, as G_k = ⟨{g}, {v_k}⟩ and H_k = ⟨{h}, {v_k}⟩, where {v_k} denotes the identification vector corresponding to grouping k.
Of course, the first gradient vector {g} and second gradient vector {h} are in fact distributed in fragments between party A's device and party B's device. Accordingly, the identification vector {v_k} corresponding to grouping k may likewise be distributed in fragments between the two devices: party B's device can split the identification vector {v_k} corresponding to grouping k into fragment {<v_k>_A} and fragment {<v_k>_B}, send fragment {<v_k>_A} to party A's device, and retain (at least) fragment {<v_k>_B}. In this manner, party A's device and party B's device may interact according to the multi-party secure computation protocol to compute fragments of the first/second gradient sums corresponding to grouping k, based on the fragments of the first/second gradient vectors and the fragments of the identification vector {v_k} corresponding to grouping k.
Taking the first gradient as an example, party A's device and party B's device may interact according to the multi-party secure computation protocol so as to compute the fragments of the first gradient sum corresponding to grouping k, based on the fragments of the first gradient vector {g} and the fragments of the identification vector {v_k} corresponding to grouping k. For example, substituting {g} = {<g>_A} + {<g>_B} and {v_k} = {<v_k>_A} + {<v_k>_B} into G_k = ⟨{g}, {v_k}⟩, the expansion is:

G_k = ⟨<g>_A, <v_k>_A⟩ + ⟨<g>_A, <v_k>_B⟩ + ⟨<g>_B, <v_k>_A⟩ + ⟨<g>_B, <v_k>_B⟩.
The expansion involves two types of product terms. One type may be called local product terms: both factors belong to the fragment data of the same party (party A/party B), so the computation can be completed independently by that party's device. The other type may be called cross product terms: the two factors belong to the fragment data of different parties, i.e., one factor belongs to party A's fragment data and the other to party B's; so, to protect both parties' data privacy, party A and party B may compute two fragments of each cross product term by running the two-party secure multiplication protocol, with party A and party B each obtaining one fragment.
Specifically, ⟨<g>_A, <v_k>_A⟩ can be computed locally by party A, as part of party A's output fragment <G_k>_A; ⟨<g>_B, <v_k>_B⟩ can be computed locally by party B, as part of party B's output fragment <G_k>_B; and the cross product terms ⟨<g>_A, <v_k>_B⟩ and ⟨<g>_B, <v_k>_A⟩ can be computed via the two-party secure multiplication protocol, with the fragments of these two terms obtained by party A serving as part of party A's output fragment <G_k>_A, and the fragments obtained by party B serving as part of party B's output fragment <G_k>_B.
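Putting the expansion together, a sketch of how the two output fragments of G_k arise (our own illustration; secure_inner_product is a hypothetical stand-in for the two-party secure multiplication protocol, which in reality is interactive and reveals neither input):

```python
import numpy as np

def secure_inner_product(x_from_A, y_from_B):
    """Stand-in for the two-party secure multiplication protocol: returns
    additive fragments of <x, y> to party A and party B. A real protocol
    (e.g., Beaver triples, as sketched earlier) never reveals x or y."""
    z = float(np.dot(x_from_A, y_from_B))
    r = np.random.uniform(-1, 1)
    return z - r, r   # (A's fragment, B's fragment)

def gradient_sum_fragments(g_A, g_B, v_A, v_B):
    """Fragments of G_k = <g, v_k> from the four-term expansion:
    local terms computed in place, cross terms via the secure protocol."""
    cross1_A, cross1_B = secure_inner_product(g_A, v_B)  # <g>_A with <v_k>_B
    cross2_A, cross2_B = secure_inner_product(v_A, g_B)  # <g>_B with <v_k>_A
    Gk_A = np.dot(g_A, v_A) + cross1_A + cross2_A        # party A's fragment
    Gk_B = np.dot(g_B, v_B) + cross1_B + cross2_B        # party B's fragment
    return Gk_A, Gk_B

g = np.array([5., 3., 8., 2.])          # full gradient vector (for checking only)
v = np.array([0., 1., 0., 1.])          # identification vector of grouping k
g_A = np.random.rand(4); g_B = g - g_A  # additive fragments of g
v_A = np.random.rand(4); v_B = v - v_A  # additive fragments of v_k
Gk_A, Gk_B = gradient_sum_fragments(g_A, g_B, v_A, v_B)
print(round(Gk_A + Gk_B, 6))            # 5.0 == g[1] + g[3]
```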
For any node, party A and party B can interact according to the multi-party secure computation protocol to compute fragments of the splitting gain corresponding to each grouping under each feature, based on the fragments of the first and second gradient sums corresponding to each grouping under each feature, with party A and party B each holding one fragment of the same splitting gain. Since each grouping corresponds to one feature value, each splitting gain in fact corresponds to one feature j and one feature value k of that feature. The splitting gain corresponding to feature j and its feature value k reflects the decrease of the objective function after the corresponding node is split according to feature j and feature value k.
As mentioned before, the splitting gain can be obtained from the left-subtree first gradient sum (denoted G_L), the left-subtree second gradient sum (denoted H_L), the right-subtree first gradient sum (denoted G_R), and the right-subtree second gradient sum (denoted H_R). If the equivalent model were trained on the device of a single party (denoted party C), then, referring to FIG. 4, party C's device could initialize G_L/H_L and G_R/H_R, accumulate the first/second gradient sum corresponding to a given grouping (denoted k) onto the left-subtree first/second gradient sum G_L/H_L, and subtract the first/second gradient sum corresponding to grouping k from the right-subtree first/second gradient sum G_R/H_R, thereby obtaining in turn the left-subtree gradient sums and right-subtree gradient sums corresponding to each grouping under feature j (i.e., corresponding to each feature value of feature j).
As shown in FIG. 9, the sample set is divided into K groups according to the feature values of feature j. Since the samples are sorted according to the feature value of feature j before the grouping result is obtained (the order of the samples within a group may be arbitrary), the K groups present a certain order with respect to the feature values of the samples they contain: the feature values of the samples increase as the group sequence number (1-K) increases (indicated by a horizontal arrow in FIG. 9), i.e. the feature value of any sample in the 2nd group is not less than the feature value of any sample in the 1st group, the feature value of any sample in the 3rd group is not less than the feature value of any sample in the 2nd group, and so on. It can be seen that if the group is taken as the minimum unit for dividing the sample set into a left subset corresponding to the left subtree and a right subset corresponding to the right subtree, there are K division cases (indicated by vertical arrows in FIG. 9); in other words, there are K splitting possibilities for the node to be split under feature j, where each splitting possibility corresponds to one feature value (i.e. the feature value corresponding to each group).
In order to determine the optimal splitting parameters of the node to be split, the device of party C needs to calculate these K possible splitting gains, that is, the splitting gains corresponding to the K groups under feature j. With combined reference to FIGS. 4 and 9, taking the first gradient as an example, the device of party C may first initialize the left-subtree first gradient sum $G_L$ to 0 and initialize the right-subtree first gradient sum $G_R$ to the first gradient sum G of the node to be split; the first gradient sum of a node is the sum of the elements in the first gradient vector corresponding to the samples belonging to that node. Then, the device of party C can obtain, in an 'increase-decrease' manner, the left-subtree first gradient sums and the right-subtree first gradient sums respectively corresponding to the K groups: add the first gradient sum $G_1$ corresponding to the 1st group to the left-subtree first gradient sum, obtaining the left-subtree first gradient sum $G_L^{(1)} = G_L + G_1$ corresponding to the 1st group, and subtract $G_1$ from the right-subtree first gradient sum, obtaining the right-subtree first gradient sum $G_R^{(1)} = G_R - G_1$ corresponding to the 1st group; add the first gradient sum $G_2$ corresponding to the 2nd group to the left-subtree first gradient sum $G_L^{(1)}$, obtaining the left-subtree first gradient sum $G_L^{(2)} = G_L^{(1)} + G_2$ corresponding to the 2nd group, and subtract $G_2$ from the right-subtree first gradient sum $G_R^{(1)}$, obtaining the right-subtree first gradient sum $G_R^{(2)} = G_R^{(1)} - G_2$ corresponding to the 2nd group; and so on, until the left-subtree first gradient sum $G_L^{(K)}$ and the right-subtree first gradient sum $G_R^{(K)}$ corresponding to the Kth group are obtained.
Of course, the above calculation process can be suitably modified. For example, the left-subtree first gradient sum $G_L$ may be initialized to the first gradient sum G of the node to be split, and the right-subtree first gradient sum $G_R$ may be initialized to 0. Accordingly, the first gradient sums corresponding to the K groups are sequentially subtracted from $G_L$ to obtain the left-subtree first gradient sums respectively corresponding to the K groups, and sequentially added to $G_R$ to obtain the right-subtree first gradient sums respectively corresponding to the K groups. It should be understood that in a recurrence such as $G_L^{(k)} = G_L^{(k-1)} - G_k$, the quantities on the two sides of the equation are the left-subtree first gradient sums corresponding to adjacent groups, and likewise the quantities on the two sides of $G_R^{(k)} = G_R^{(k-1)} + G_k$ are the right-subtree first gradient sums corresponding to adjacent groups.
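For illustration, here is a minimal plaintext sketch of this 'increase-decrease' iteration from party C's perspective; the function name is illustrative and not from the patent.

```python
def left_right_gradient_sums(group_sums):
    """Given the per-group first gradient sums [G_1, ..., G_K] under
    feature j (groups in ascending feature-value order), return the
    left/right subtree first gradient sums for each of the K candidate
    splits, using the 'increase-decrease' iteration described above."""
    G = sum(group_sums)      # first gradient sum of the node to be split
    GL, GR = 0.0, G          # initialization: left empty, right holds all
    lefts, rights = [], []
    for G_k in group_sums:
        GL += G_k            # accumulate group k into the left subtree
        GR -= G_k            # remove group k from the right subtree
        lefts.append(GL)
        rights.append(GR)
    return lefts, rights
```

The left/right sums for the k-th candidate split are simply prefix and suffix sums of the per-group sums, which is why a single pass over the K groups suffices.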
Similarly, the C-side device may obtain the left sub-tree second gradient sum and the right sub-tree second gradient sum corresponding to the K groups under the feature j.
Based on the left-subtree first gradient sum $G_L$, the right-subtree first gradient sum $G_R$, the left-subtree second gradient sum $H_L$ and the right-subtree second gradient sum $H_R$ respectively corresponding to the K groups under feature j, the device of party C may calculate the splitting gains Gain respectively corresponding to the K groups under feature j.
It should be noted that some binning algorithms, such as equal-frequency binning, may cause samples with equal feature values to appear in adjacent bins, so that the left/right subtree gradient sums calculated as described above carry some error; such error, however, is negligible in engineering terms.
The above describes how the left/right subtree gradient sums and the splitting gains would be computed if the equivalent model were trained on the device of a single party (denoted as C). On this basis, we continue to describe how the above calculation process is decomposed onto the devices of the two parties.
First, consider the initialization of the left/right subtree gradient sums, taking initializing $G_L$ to 0 and $G_R$ to G as an example. For $G_L = 0$, the device of party A may generate a random number as its fragment $G_L^A$ of the initialized $G_L$ and send $-G_L^A$ to the device of party B, which can take it as the fragment $G_L^B$ satisfying $G_L^A + G_L^B = 0$; alternatively, a third-party device can generate fragments satisfying $G_L^A + G_L^B = 0$, send fragment $G_L^A$ to the device of party A and send fragment $G_L^B$ to the device of party B. For $G_R = G$, the device of party A may locally compute the sum of the elements in its fragment $g^A$ of the first gradient vector to obtain a fragment $G^A$ of the first gradient sum G, as its fragment $G_R^A$ of the initialized $G_R$; and the device of party B may locally compute the sum of the elements in its fragment $g^B$ of the first gradient vector to obtain a fragment $G^B$ of G, as its fragment $G_R^B$ of the initialized $G_R$. Similarly, the device of party A may locally compute the sum of the elements in its fragment $h^A$ of the second gradient vector to obtain a fragment $H^A$ of the second gradient sum H, as its fragment of the initialized $H_R$; the device of party B may locally compute the sum of the elements in its fragment $h^B$ of the second gradient vector to obtain a fragment $H^B$ of H, as its fragment of the initialized $H_R$; and $H_L$ can be initialized to 0 in the same manner as $G_L$. Then, for the calculation of the gradient sums corresponding to the K groups, taking the first gradient as an example, the devices of party A/party B can perform the iterative computation $G_L \leftarrow G_L + G_k$, $G_R \leftarrow G_R - G_k$ on their respective fragments to obtain the fragments of the left-subtree first gradient sums and of the right-subtree first gradient sums respectively corresponding to the K groups; since these updates involve only additions and subtractions of fragments, no interaction is required. Similarly, the devices of party A/party B can perform the iterative computation $H_L \leftarrow H_L + H_k$, $H_R \leftarrow H_R - H_k$ on their fragments to obtain the fragments of the left-subtree second gradient sums and of the right-subtree second gradient sums respectively corresponding to the K groups.
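A minimal sketch of one party's side of this fragment-wise iteration, assuming additive sharing so that the additions and subtractions can be applied to fragments locally; the names are illustrative.

```python
def left_right_sum_fragments(own_group_fragments, own_G_fragment, own_GL0_fragment):
    """One party's local view of the iteration on additive shares.

    own_group_fragments : this party's fragments of G_1, ..., G_K
    own_G_fragment      : this party's fragment of the node's sum G
    own_GL0_fragment    : this party's fragment of the initialized
                          GL = 0 (a random share summing to zero with
                          the peer's share)

    Since GL <- GL + G_k and GR <- GR - G_k involve only additions and
    subtractions, each party applies them to its own fragments without
    any interaction with the peer.
    """
    GL, GR = own_GL0_fragment, own_G_fragment
    lefts, rights = [], []
    for gk in own_group_fragments:
        GL, GR = GL + gk, GR - gk
        lefts.append(GL)
        rights.append(GR)
    return lefts, rights
```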
After obtaining the fragments of $G_L$, $G_R$, $H_L$ and $H_R$ corresponding to each group, the devices of party A/party B may calculate fragments of the splitting gain. In some embodiments, the sub-objective function corresponding to a node may be as follows:

$obj = -\frac{1}{2}\cdot\frac{G^2}{H+\lambda}$

wherein λ represents a preset coefficient, G represents the sum of the first gradients of all samples at the node, and H represents the sum of the second gradients of all samples at the node. Correspondingly, for the node, after dividing the samples belonging to the node into the left and right subtrees based on the feature value corresponding to a certain group under a certain feature, the calculation formula of the splitting Gain is as follows:

$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{G^2}{H+\lambda}\right]$
When different splitting gains Gain are compared by taking differences, the term $\frac{G^2}{H+\lambda}$ acts as a constant that cancels out, so the constant term in Gain can be ignored during actual operation. In addition, since the sharing of data in this specification is performed based on addition, wherever data is split into fragments, the computational expression before splitting is replaced by an equivalent Taylor expansion. For example, here, on the basis of neglecting the constant $\frac{G^2}{H+\lambda}$, a Taylor expansion is performed on $\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda}$ to obtain an equivalent polynomial calculation formula of the splitting gain $Gain_k$, as shown in FIG. 4.
The subscript k of Gain may be used as an identifier of the corresponding group, or as the feature value corresponding to the group. The inputs of the equivalent calculation formula (e.g. the left/right subtree gradient sums $G_L$, $H_L$, $G_R$, $H_R$) and the output (i.e. $Gain_k$) each consist of two fragments, with party A and party B each holding one. After the input fragments are substituted into the calculation formula, the expansion again involves the two types of product terms described above: local product terms, whose two factors belong to the fragment data of the same party (party A/party B) and which can therefore be computed independently by that party's device; and cross product terms, whose two factors belong to the fragment data of different parties, so that, to protect the data privacy of both parties, party A and party B compute two fragments of each cross product term by running the two-party secure multiplication protocol, each party holding one fragment as a part of its output fragment. For example,
$G_L^2 = (G_L^A + G_L^B)^2 = (G_L^A)^2 + 2G_L^A G_L^B + (G_L^B)^2$, wherein $(G_L^A)^2$ and $(G_L^B)^2$ are local product terms: party A can compute $(G_L^A)^2$ locally and party B can compute $(G_L^B)^2$ locally. For the cross product term $2G_L^A G_L^B$, party A and party B can calculate its fragments through the two-party secure multiplication protocol.
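The patent does not fix a concrete two-party secure multiplication protocol; one common instantiation is multiplication via Beaver triples. The sketch below simulates that message flow for additively shared scalars over a prime modulus, and is an assumption rather than the patent's construction (a real deployment would also need fixed-point encoding for the real-valued gradient sums).

```python
import random

def beaver_mult(x_A, x_B, y_A, y_B, p=2**61 - 1):
    """Simulated Beaver-triple multiplication of additively shared
    scalars x = x_A + x_B and y = y_A + y_B (mod p). Returns additive
    shares of x*y. The triple (a, b, c = a*b) would come from a trusted
    dealer or an offline phase; here we simply generate it."""
    a_A, a_B = random.randrange(p), random.randrange(p)
    b_A, b_B = random.randrange(p), random.randrange(p)
    a, b = (a_A + a_B) % p, (b_A + b_B) % p
    c = a * b % p
    c_A = random.randrange(p)
    c_B = (c - c_A) % p

    # Each party masks its shares; the masked values are opened publicly.
    d = (x_A - a_A + x_B - a_B) % p   # d = x - a
    e = (y_A - b_A + y_B - b_B) % p   # e = y - b

    # Local reconstruction of shares of x*y = c + d*b + e*a + d*e.
    z_A = (c_A + d * b_A + e * a_A) % p
    z_B = (c_B + d * b_B + e * a_B + d * e) % p
    return z_A, z_B   # (z_A + z_B) % p == x*y % p
```

Opening d and e reveals nothing about x and y because a and b are uniformly random one-time masks, which is what makes the cross product terms computable without leaking either party's fragments.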
As mentioned previously, by traversing the groups under the same feature (a group is identified by its feature value k in FIG. 4) and traversing the J features (a feature is denoted j in FIG. 4), each of party A and party B obtains J sets of fragments of first gradient sums and J sets of fragments of second gradient sums, each set comprising a number of fragments equal to the number K of groups under the corresponding feature. Further, each of party A and party B obtains J sets of splitting gain fragments, each set comprising a number of fragments equal to the number K of groups under the corresponding feature. The number K of groups may differ between features.
Referring to FIG. 5, FIG. 5 shows the pairwise comparison of the magnitudes of the splitting gains corresponding to each group under each feature. The subscript j of Gain indicates a feature, and the subscript k indicates a group (and also the feature value corresponding to the group). By traversing feature pairs j1, j2 and group pairs k1, k2, i.e. by comparing the splitting gains among the J sets of splitting gains pairwise, the feature and group corresponding to the maximum splitting gain are selected for splitting. Here k1 is a feature value of feature j1 and k2 is a feature value of feature j2; j1 and j2 may be the same feature, in which case groups k1 and k2 are different groups under that feature. Based on the splitting (or training) target of the decision tree, it can be understood that, when the splitting gain is obtained based on the equivalent calculation formula of the splitting gain shown in FIG. 4, the larger the splitting gain, the more suitable the corresponding feature and splitting threshold (feature value) are as the splitting condition of the node.
Because party A and party B each hold only fragments of every splitting gain, the two parties can compare the magnitudes of two splitting gains through a multi-party security comparison protocol. Denote the features and groups corresponding to the two compared splitting gains as (j1, k1) and (j2, k2), as shown in FIG. 5. The device of party A can calculate $D_A = Gain_{j1,k1}^A - Gain_{j2,k2}^A$, and the device of party B can calculate $D_B = Gain_{j2,k2}^B - Gain_{j1,k1}^B$. Further, the device of party A and the device of party B can interact according to a multi-party security comparison protocol so as to determine the magnitude relationship of $D_A$ and $D_B$ without revealing their specific values. Since $Gain_{j1,k1} \geq Gain_{j2,k2}$ holds exactly when $D_A \geq D_B$, if $D_A$ is greater than (or not less than) $D_B$, feature j1 and feature value k1 are "retained"; otherwise, feature j2 and feature value k2 are "retained". Based on the retained feature and the fragments of its feature value, the two parties compare against the next feature and its feature value, and so on, until the splitting gains corresponding to all features and feature values have been compared; the feature and feature value finally retained by the two parties correspond to the optimal splitting condition of the node. Of course, it may also be the device of party A that calculates $D_A = Gain_{j2,k2}^A - Gain_{j1,k1}^A$ and the device of party B that calculates $D_B = Gain_{j1,k1}^B - Gain_{j2,k2}^B$; accordingly, if $D_A$ is greater than (or not less than) $D_B$, the node to be split is split preferentially according to feature j2 and feature value k2, and otherwise according to feature j1 and feature value k1.
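A small simulation of this comparison trick follows; the secure comparison itself is abstracted away (in the protocol, $D_A$ and $D_B$ are compared without being revealed), and the names are illustrative.

```python
def compare_gains(gain1_A, gain1_B, gain2_A, gain2_B):
    """Gain1 >= Gain2 iff (gain1_A - gain2_A) >= (gain2_B - gain1_B),
    because Gain1 = gain1_A + gain1_B and Gain2 = gain2_A + gain2_B."""
    D_A = gain1_A - gain2_A   # computed locally by party A
    D_B = gain2_B - gain1_B   # computed locally by party B
    return D_A >= D_B         # done via secure comparison in the protocol

def best_split(candidates):
    """Tournament over [(j, k, gain_A, gain_B), ...]: keep the winner of
    each pairwise comparison, as in the traversal described above."""
    best = candidates[0]
    for cand in candidates[1:]:
        if not compare_gains(best[2], best[3], cand[2], cand[3]):
            best = cand
    return best[0], best[1]   # feature and feature value retained
```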
Referring to FIG. 6, for any node (denoted as X), when the splitting parameters of node X (the feature and the feature value corresponding to the maximum splitting gain) are determined, only one of party A and party B records the splitting parameters, because node X is split according to only one feature (of one party). Assuming that the determined splitting parameters are j1 and k1, and feature j1 is a feature of party A, only the device of party A records the splitting parameters of node X. For example only, as shown in FIG. 6, the splitting information split(X, j1, k1) recorded at party A indicates that the splitting parameters of node X are own-side feature j1 and feature value k1, while the splitting information split(X, dummy, dummy) recorded at party B indicates that node X is a non-local splitting node, i.e. the parameters of node X in the tree model of party B are unknown. It can be understood that party A and party B can each determine, according to the recorded splitting information of each node, whether a node is a local splitting node, and determine the splitting parameters of local splitting nodes.
It should be appreciated that, with respect to node splitting, party A and party B may reach some consensus in advance. First, party A and party B can agree on the structure of the tree model to be trained, such as the number of decision trees, the number of nodes of each tree, the connection relationship among the nodes, the positions of the nodes, the depth, and the like; accordingly, in some scenarios, party A and party B can determine whether the current operation is directed at the same decision tree or the same node. For example, party A and party B can uniformly identify (e.g. number) each decision tree in the tree model, and uniformly identify (e.g. number) each node of the same decision tree. Second, party A and party B can agree on identifiers for both parties' features and for each group under each feature without disclosing their actual meanings. For example, assuming that the features of party A include age and height and the features of party B include distance and orientation, age and height can be identified by a1 and a2, and distance and orientation by b1 and b2, so that party B only knows that a1 and a2 are two features of party A, and party A only knows that b1 and b2 are two features of party B. Through the feature identifiers and the group identifiers, party A and party B can determine whether two obtained splitting gains correspond to the same feature and group (feature value), and to which feature and group a splitting gain corresponds.
Still taking the example that node X is split according to feature j1 and feature value k1 of party A, the device of party A needs to further divide the sample set {i} of the M samples into a left subset corresponding to the left subtree and a right subset corresponding to the right subtree, according to the magnitude relationship between each sample's value of feature j1 and the feature value k1, so that the two parties can obtain the fragments of the first/second gradient vectors and the fragments of the flag vectors of the child nodes (i.e. the left and right subtrees) of node X. If a child node continues to split, the fragments of its first/second gradient vectors and of its flag vector can be used for the computations related to its splitting; for specific details, refer to FIGS. 4 to 6 and their related description. If a child node does not continue to split, i.e. becomes a leaf node, party A and party B can cooperatively calculate two fragments of the leaf node weight based on the fragments of the first/second gradient vectors of the child node (i.e. the leaf node), each party holding one fragment; for specific details, refer to FIG. 7 and its related description.
As shown in FIG. 6, the device of party A may generate a left sub-tree vector $v_L$ and a right sub-tree vector $v_R$ of node X, wherein the left sub-tree vector $v_L$ indicates the samples in the left subset obtained by dividing the sample set according to the feature and feature value corresponding to the maximum splitting gain, and the right sub-tree vector $v_R$ indicates the samples in the right subset obtained by dividing the sample set according to the feature and feature value corresponding to the maximum splitting gain. It will be appreciated that the number of bits (dimension) of the left/right sub-tree vector may be M, each bit corresponding to one sample: typically, for a sample belonging to the left subset, the corresponding position of the left sub-tree vector is set to 1 and the remaining positions are set to 0; similarly, for a sample belonging to the right subset, the corresponding position of the right sub-tree vector is set to 1 and the remaining positions are set to 0. Since each sample belongs to either the left subset or the right subset, $v_L + v_R = \mathbf{1}$ is satisfied. The device of party A splits the left sub-tree vector $v_L$ of node X into fragment $v_L^A$ and fragment $v_L^B$, sends fragment $v_L^B$ to the device of party B, and retains (at least) fragment $v_L^A$. Similarly, the device of party A splits the right sub-tree vector $v_R$ of node X into fragment $v_R^A$ and fragment $v_R^B$; fragment $v_R^A$ is stored at party A, and fragment $v_R^B$ is stored at party B.
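A sketch of how party A might generate and share these vectors, assuming the convention that samples whose feature value is not greater than the winning feature value go to the left subtree (the patent does not spell out the comparison direction, and the names are illustrative):

```python
import numpy as np

def subtree_vectors(feature_values, threshold):
    """Party A's generation of the left/right sub-tree vectors of node X
    when the winning feature j1 belongs to A: samples with value <= the
    winning feature value k1 go left, the rest go right (assumed rule)."""
    v_L = (feature_values <= threshold).astype(np.int64)
    v_R = 1 - v_L                       # v_L + v_R is the all-ones vector
    return v_L, v_R

def share(vec, rng, field=2**61 - 1):
    """Split an integer vector into two additive fragments mod `field`;
    party A keeps frag_A and sends frag_B to party B."""
    frag_A = rng.integers(0, field, size=vec.shape)
    frag_B = (vec - frag_A) % field
    return frag_A, frag_B

# Example usage: v_L, v_R = subtree_vectors(np.array([3, 7, 1, 9]), 5)
#                frag_A, frag_B = share(v_L, np.random.default_rng(0))
```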
Following the description above, taking the left subtree as an example, assume first that the flag vector $u_L$, the first gradient vector $g_L$ and the second gradient vector $h_L$ of the left subtree of node X are computed on the device of party C. As shown in FIG. 6, the device of party C may: compute the bitwise product of the flag vector $u$ of node X and the left sub-tree vector $v_L$, obtaining the flag vector of the left subtree of node X as $u_L = u \odot v_L$; compute the bitwise product of the first gradient vector $g$ of node X and the flag vector $u_L$ of the left subtree of node X, obtaining the first gradient vector of the left subtree of node X as $g_L = g \odot u_L$; and compute the bitwise product of the second gradient vector $h$ of node X and the flag vector $u_L$ of the left subtree of node X, obtaining the second gradient vector of the left subtree of node X as $h_L = h \odot u_L$. Similarly, the device of party C can calculate the flag vector $u_R$, the first gradient vector $g_R$ and the second gradient vector $h_R$ of the right subtree of node X.
Referring to the foregoing related description, taking the left subtree as an example, the device of party A and the device of party B may interact according to a multi-party security computing protocol so as to: compute fragments of the flag vector $u_L$ of the left subtree of node X based on the fragments of the flag vector $u$ of node X and the fragments of the left sub-tree vector $v_L$; compute fragments of the first gradient vector $g_L$ of the left subtree of node X based on the fragments of the first gradient vector $g$ of node X and the fragments of the flag vector $u_L$ of the left subtree of node X; and compute fragments of the second gradient vector $h_L$ of the left subtree of node X based on the fragments of the second gradient vector $h$ of node X and the fragments of the flag vector $u_L$ of the left subtree of node X. For example, substituting $u = u^A + u^B$ and $v_L = v_L^A + v_L^B$ into $u_L = u \odot v_L$, its expansion is as follows:

$u_L = u^A \odot v_L^A + u^A \odot v_L^B + u^B \odot v_L^A + u^B \odot v_L^B$

wherein $u^A \odot v_L^A$ can be calculated locally at party A as a part of party A's output fragment $u_L^A$, and $u^B \odot v_L^B$ can be calculated locally at party B as a part of party B's output fragment $u_L^B$. The cross product terms $u^A \odot v_L^B$ and $u^B \odot v_L^A$ can be calculated through the two-party secure multiplication protocol: party A obtains fragments of them as a part of its output fragment $u_L^A$, and party B obtains fragments of them as a part of its output fragment $u_L^B$.
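Schematically (this is not the patent's own pseudocode), one party's assembly of its fragment of any of these bitwise products can be organized as below; `secure_mult_with_peer` is a hypothetical placeholder that runs the two-party secure multiplication protocol over one cross product term and returns this party's additive share of it.

```python
def bitwise_product_fragment(x_frag, y_frag, secure_mult_with_peer):
    """One party's fragment of z = x ⊙ y for additively shared vectors
    x = x_A + x_B and y = y_A + y_B (numpy arrays)."""
    local = x_frag * y_frag                  # local product term
    # Shares of the two cross product terms, obtained interactively:
    cross_1 = secure_mult_with_peer(x_frag)  # share of x_own ⊙ y_peer
    cross_2 = secure_mult_with_peer(y_frag)  # share of x_peer ⊙ y_own
    return local + cross_1 + cross_2         # this party's fragment of z
```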
For a single decision tree, the splitting of each node can be performed in sequence according to the splitting links shown in FIGS. 4 to 6 until the growth termination condition is satisfied. It should be understood that growth termination means that no further nodes are split on the single tree, i.e. all leaf nodes of the single tree have been obtained. In some embodiments, the growth termination condition may include the depth of the single tree reaching a preset depth.
Referring to FIG. 7, FIG. 7 illustrates a method of calculating fragments of a leaf node weight. For the same leaf node obtained by training, the device of party A can calculate the sum of the elements in its fragment $g_{leaf}^A$ of the first gradient vector of the leaf node to obtain a fragment $G^A$ of the first gradient sum G of the leaf node, and the device of party B can calculate the sum of the elements in its fragment $g_{leaf}^B$ of the first gradient vector of the leaf node to obtain a fragment $G^B$ of G. Similarly, the device of party A can calculate the sum of the elements in its fragment $h_{leaf}^A$ of the second gradient vector of the leaf node to obtain a fragment $H^A$ of the second gradient sum H of the leaf node, and the device of party B can calculate the sum of the elements in its fragment $h_{leaf}^B$ of the second gradient vector of the leaf node to obtain a fragment $H^B$ of H. In some embodiments, the leaf node weight is calculated as follows:

$w = -\frac{G}{H+\lambda}$
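In plaintext form this calculation is simply the function below (a sketch; in the two-party setting the division is replaced by the Taylor expansion discussed next, evaluated over fragments):

```python
def leaf_weight(G, H, lam):
    """Plaintext leaf weight w = -G / (H + lam) from the formula above."""
    return -G / (H + lam)
```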
It should be noted that, since the sharing of data is based on addition, the fragments $w^A$ and $w^B$ of the leaf node weight w can be calculated by disassembling the Taylor expansion of $-\frac{G}{H+\lambda}$, wherein $w^A$ serves as the weight of the leaf node in the tree model of party A, and $w^B$ serves as the weight of the same leaf node in the tree model of party B. The preceding parts of the present application provide a plurality of embodiments relating to the disassembly of expressions involving multiplication operations, which are not repeated here or below.
Under the XGB framework, the tree model of either party and the equivalent model each comprise a plurality of trees. Referring to FIG. 8, when each of party A and party B completes the training of one tree, the fragments of the prediction scores of the M samples may be updated in order to calculate the fragments of the first gradient vector $g$ and of the second gradient vector $h$ of the root node of the next tree. In FIG. 8, $pred_{i,t}$ represents the weight of the leaf node reached by sample i (i.e. the sample numbered i) along the prediction path in the t-th tree of the equivalent model (i.e. the leaf node to which sample i belongs, also referred to as the predicted leaf node of sample i). For the t-th tree, $pred_{i,t} = \sum_{n=1}^{N_t} s_n[i]\, w_n$, where $s_n[i]$ represents the element, corresponding to sample i, of the flag vector $s_n$ of the leaf node numbered n in the t-th tree of the equivalent model, $w_n$ represents the weight of that leaf node, and $N_t$ represents the number of leaf nodes of the t-th tree. The weights of the predicted leaf nodes of the M samples in the t-th tree of the equivalent model can form a prediction weight vector $pred_t$. The device of party A and the device of party B can interact according to a multi-party security computing protocol so as to compute fragments of $pred_t$, based on the fragments of the flag vectors $s_n$ of the leaf nodes of the t-th tree and the fragments of the leaf node weight vector $w_t$ of the t-th tree in the equivalent model; the fragment of $w_t$ held by each party is the leaf node weight vector of the t-th tree in that party's own tree model. Further, as shown in FIG. 8, the device of party A may add its fragment $pred_{i,t}^A$ of $pred_{i,t}$ to its fragment $pred_i^A$ of the current prediction score of sample i, so as to update its fragment of the prediction score of sample i; the device of party B likewise adds $pred_{i,t}^B$ to $pred_i^B$. It will be appreciated that the current prediction score $pred_i$ of sample i is the sum of the base score of the equivalent model and the weights of the predicted leaf nodes of sample i on the first t-1 trees of the equivalent model. Initially, the prediction score $pred_i$ of sample i is the base score of the equivalent model, i.e. party A's fragment $pred_i^A$ is the initial score of the tree model of party A, and party B's fragment $pred_i^B$ is the initial score of the tree model of party B. The prediction scores of the M samples may constitute a prediction score vector $pred$, whereby the device of party A can obtain the updated fragment $pred^A$ of the prediction score vector and the device of party B can obtain the updated fragment $pred^B$ of the prediction score vector.
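A sketch of one party's score update after tree t, assuming the cross terms of $pred_{i,t} = \sum_n s_n[i]\, w_n$ are obtained through the secure multiplication protocol (`cross_term_shares` is a hypothetical placeholder, and all names are illustrative):

```python
import numpy as np

def update_score_fragments(pred_frag, flag_frags, weight_frag, cross_term_shares):
    """pred_frag   : this party's fragments of the M prediction scores
    flag_frags  : (N_t, M) array of this party's fragments of the leaf
                  flag vectors of the t-th tree
    weight_frag : length-N_t fragment of the leaf weight vector, i.e.
                  the leaf weights of this party's own t-th tree
    Returns this party's updated fragments of the M prediction scores."""
    local = flag_frags.T @ weight_frag                  # local product terms
    cross = cross_term_shares(flag_frags, weight_frag)  # shares of cross terms
    return pred_frag + local + cross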
In some embodiments, the predicted value of a sample is $\hat{y} = \frac{1}{1+e^{-pred}}$, where pred represents the prediction score of that sample. Accordingly, since the sharing of data is based on addition, a Taylor expansion of this expression can be adopted. For two-party training, the device of party A may update its fragment of the predicted value of sample i according to the Taylor expansion, and the device of party B may likewise update its fragment of the predicted value of sample i.
In some embodiments, the loss function of a sample is the cross-entropy loss $L = -[y\ln\hat{y} + (1-y)\ln(1-\hat{y})]$, from which the first gradient corresponding to sample i can be derived as $g_i = \hat{y}_i - y_i$ and the second gradient corresponding to sample i as $h_i = \hat{y}_i(1-\hat{y}_i)$. For two-party training, the device of party A may update its fragment of the first gradient corresponding to sample i (e.g. as $g_i^A = \hat{y}_i^A - y_i$ when party A holds the label values) for training the next tree, thereby obtaining its fragment of the first gradient vector of the root node of the next tree; similarly, the device of party B may update its fragment of the first gradient corresponding to sample i (e.g. as $g_i^B = \hat{y}_i^B$), thereby obtaining its fragment of the first gradient vector of the root node of the next tree. In view of the fact that the decomposition of $h_i = \hat{y}_i(1-\hat{y}_i)$ involves cross product terms, the device of party A and the device of party B may interact according to a multi-party security computing protocol so as to compute, based on the fragments of the updated predicted value of sample i and the label value, the fragments of the second gradient corresponding to sample i used for training the next tree, thereby obtaining the fragments of the second gradient vector of the root node of the next tree.
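Under the logistic loss assumed above, the first gradient update is linear in the shared predicted value, so it needs no interaction; a sketch of the per-party rule, assuming party A holds the labels (illustrative names):

```python
def update_gradient_fragments(pred_hat_frag, labels=None):
    """Per-sample first gradient update g_i = y_hat_i - y_i on fragments:
    the label-holding party subtracts y_i from its fragment of y_hat_i,
    and the other party keeps its fragment unchanged. The second gradient
    h_i = y_hat_i * (1 - y_hat_i) multiplies shared values and therefore
    requires the secure multiplication protocol instead."""
    if labels is not None:          # the party holding the label values
        return pred_hat_frag - labels
    return pred_hat_frag            # the other party
```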
It should be noted that the above description of the flow is for illustration and description only and does not limit the scope of the application of the present specification. Various modifications and alterations to the flow may occur to those skilled in the art, given the benefit of this description. However, such modifications and variations are intended to be within the scope of the present description.
FIG. 10 is a block diagram of a two-party decision tree training system in accordance with some embodiments of the present description. The system 200 may be implemented on a device of a first party, which may be either of party A and party B, the second party being the other of party A and party B. As shown in FIG. 10, the system 200 may include a first obtaining module 210, a first packet gradient sum fragment calculation module 220, a second packet gradient sum fragment calculation module 230, a splitting gain fragment calculation module 240, a splitting gain comparison module 250, a left and right sub-tree vector fragment obtaining module 260, a child node flag vector fragment calculation module 270, and a child node gradient vector fragment calculation module 280.
For any node (denoted as X) that is split, the functions of the modules in system 200 are as follows:
The first obtaining module 210 may be configured to obtain a first fragment of the flag vector of node X, a first fragment of the first gradient vector, and a first fragment of the second gradient vector. The flag vector indicates the samples belonging to the respective node, the first gradient vector comprises the first gradients corresponding to the samples belonging to the respective node, and the second gradient vector comprises the second gradients corresponding to the samples belonging to the respective node.
For any feature held by the first party, the first packet gradient sum fragment calculation module 220 may be configured to: for each of a plurality of first groups obtained by dividing the sample set according to the feature values of the feature, generate an identification vector corresponding to the first group according to the feature value of the feature of each sample, the identification vector indicating the samples belonging to the first group; split the identification vector corresponding to the first group into a first fragment and a second fragment, and send the second fragment of the identification vector corresponding to the first group to the device of the second party; interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the first gradient sum corresponding to the first group based on the first fragment of the first gradient vector of node X and the first fragment of the identification vector corresponding to the first group, the first gradient sum corresponding to the first group being obtained from the inner product of the sum of the first fragment and the second fragment of the first gradient vector of node X and the sum of the first fragment and the second fragment of the identification vector corresponding to the first group; and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the second gradient sum corresponding to the first group based on the first fragment of the second gradient vector of node X and the first fragment of the identification vector corresponding to the first group, the second gradient sum corresponding to the first group being obtained from the inner product of the sum of the first fragment and the second fragment of the second gradient vector of node X and the sum of the first fragment and the second fragment of the identification vector corresponding to the first group.
For any feature held by the second party, the second packet gradient sum fragment calculation module 230 may be configured to: for each of a plurality of second groups obtained by dividing the sample set according to the feature, obtain, from the device of the second party, a first fragment of the identification vector corresponding to the second group, the identification vector indicating the samples belonging to the second group; interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the first gradient sum corresponding to the second group based on the first fragment of the first gradient vector of node X and the first fragment of the identification vector corresponding to the second group, the first gradient sum corresponding to the second group being obtained from the inner product of the sum of the first fragment and the second fragment of the first gradient vector of node X and the sum of the first fragment and the second fragment of the identification vector corresponding to the second group; and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the second gradient sum corresponding to the second group based on the first fragment of the second gradient vector of node X and the first fragment of the identification vector corresponding to the second group, the second gradient sum corresponding to the second group being obtained from the inner product of the sum of the first fragment and the second fragment of the second gradient vector of node X and the sum of the first fragment and the second fragment of the identification vector corresponding to the second group.
The splitting gain fragment calculation module 240 may be configured to interact with the device of the second party according to a multi-party security computation protocol, so as to calculate a first fragment of the splitting gain corresponding to each group under each feature, based on the first fragment of the first gradient sum and the first fragment of the second gradient sum corresponding to each group under each feature.
The splitting gain comparison module 250 may be configured to interact with the device of the second party according to a multi-party security comparison protocol, so as to determine the maximum splitting gain based on the first fragment of the splitting gain corresponding to each group under each feature, and record the splitting information of node X according to the feature and the group corresponding to the maximum splitting gain.
The left and right sub-tree vector fragment obtaining module 260 may be configured to: when the maximum splitting gain corresponds to a feature of the first party, generate a left sub-tree vector and a right sub-tree vector of node X, wherein the left sub-tree vector indicates the samples in the left subset obtained by dividing the sample set according to the feature and group corresponding to the maximum splitting gain, the right sub-tree vector indicates the samples in the right subset obtained by dividing the sample set according to the feature and group corresponding to the maximum splitting gain, the left subset corresponds to the left subtree, and the right subset corresponds to the right subtree; split the left sub-tree vector into a first fragment and a second fragment, and send the second fragment of the left sub-tree vector to the device of the second party; split the right sub-tree vector into a first fragment and a second fragment, and send the second fragment of the right sub-tree vector to the device of the second party; and, when the maximum splitting gain corresponds to a feature of the second party, receive a first fragment of the left sub-tree vector and a first fragment of the right sub-tree vector of node X from the device of the second party.
The child node flag vector fragment calculation module 270 may be configured to: interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the flag vector of the left subtree of node X based on the first fragment of the flag vector of node X and the first fragment of the left sub-tree vector; and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the flag vector of the right subtree of node X based on the first fragment of the flag vector of node X and the first fragment of the right sub-tree vector.
The child node gradient vector fragment calculation module 280 may be configured to: interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the first gradient vector of the left subtree of node X based on the first fragment of the first gradient vector of node X and the first fragment of the flag vector of the left subtree of node X, and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the second gradient vector of the left subtree of node X based on the first fragment of the second gradient vector of node X and the first fragment of the flag vector of the left subtree of node X; and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the first gradient vector of the right subtree of node X based on the first fragment of the first gradient vector of node X and the first fragment of the flag vector of the right subtree of node X, and interact with the device of the second party according to a multi-party security computation protocol so as to compute a first fragment of the second gradient vector of the right subtree of node X based on the first fragment of the second gradient vector of node X and the first fragment of the flag vector of the right subtree of node X.
For more details of the system 200 and its modules, reference may be made to fig. 3-8 and their associated description.
It should be understood that the system and its modules shown in FIG. 10 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the system and its modules is for convenience only and should not limit the present disclosure to the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the system, any combination of modules or sub-system configurations may be used to connect to other modules without departing from such teachings. For example, in some embodiments, the first packet gradient and fragmentation computation module 220 and the second packet gradient and fragmentation computation module 230 may be different modules in a system or may be a single module that implements the functionality of both modules. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) a two-party decision tree training method is provided, and data privacy of two parties can be protected. (2) The model effect can be improved by training with the sample data of the two parties. It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the embodiments herein. Various modifications, improvements and adaptations to the embodiments described herein may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the embodiments of the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the embodiments of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of embodiments of the present description may be carried out entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of the embodiments of the present specification may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for operation of various portions of the embodiments of the present description may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, or Python, a conventional procedural programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences, use of numbers and letters, or use of other names in the embodiments of the present specification are not intended to limit the order of the processes and methods in the embodiments of the present specification. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more embodiments of the invention. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are possible within the scope of the embodiments of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. A two-party decision tree training method, wherein the method is performed by a device of a first party, the first party being any one of a party holding a feature value and a label value of at least one feature of each sample in a sample set and a party holding a feature value of at least one feature of each sample in the sample set, the second party being the other party; the method comprises the following steps:
splitting any node according to the following splitting steps:
obtaining a first fragment of a flag vector, a first fragment of a first gradient vector, and a first fragment of a second gradient vector of the node; the flag vector indicates the samples belonging to the respective node, the first gradient vector includes the first gradients corresponding to the samples belonging to the respective node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the respective node;
for any feature held by the first party:
for each of a plurality of first groups obtained by dividing the sample set according to the feature values of the feature: generating an identification vector corresponding to the first group according to the feature value of the feature of each sample, wherein the identification vector indicates the samples belonging to the first group; splitting the identification vector corresponding to the first group into a first fragment and a second fragment, and sending the second fragment of the identification vector corresponding to the first group to the equipment of the second party; interacting with equipment of the second party according to a multi-party security computing protocol to compute a first fragment of the first gradient sum corresponding to the first group based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the first group, wherein the first gradient sum corresponding to the first group is obtained from the inner product of the sum of the first fragment and the second fragment of the first gradient vector of the node and the sum of the first fragment and the second fragment of the identification vector corresponding to the first group; interacting with equipment of the second party according to a multi-party security computing protocol to compute a first fragment of the second gradient sum corresponding to the first group based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the first group, wherein the second gradient sum corresponding to the first group is obtained from the inner product of the sum of the first fragment and the second fragment of the second gradient vector of the node and the sum of the first fragment and the second fragment of the identification vector corresponding to the first group;
for any feature held by the second party:
for each of a plurality of second groupings obtained by dividing the sample set according to the feature: obtaining, from the device of the second party, a first fragment of an identification vector corresponding to the second grouping, the identification vector indicating the samples belonging to the second grouping; interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the second grouping based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, wherein the first gradient sum corresponding to the second grouping is the inner product of the sum of the first and second fragments of the first gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the second grouping; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the second grouping based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, wherein the second gradient sum corresponding to the second grouping is the inner product of the sum of the first and second fragments of the second gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the second grouping;
interacting with the device of the second party according to a multi-party secure computation protocol to compute first fragments of the splitting gains respectively corresponding to the groupings under each feature, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groupings under each feature;
interacting with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains respectively corresponding to the groupings under each feature, and recording splitting information of the node according to the feature and the grouping corresponding to the maximum splitting gain;
when the maximum splitting gain corresponds to a feature of the first party: generating a left sub-tree vector and a right sub-tree vector of the node, the left sub-tree vector indicating the samples in a left subset obtained by dividing the sample set according to the feature and the grouping corresponding to the maximum splitting gain, the right sub-tree vector indicating the samples in a right subset obtained by dividing the sample set according to the feature and the grouping corresponding to the maximum splitting gain, the left subset corresponding to the left sub-tree and the right subset corresponding to the right sub-tree; splitting the left sub-tree vector into a first fragment and a second fragment, and sending the second fragment of the left sub-tree vector to the device of the second party; splitting the right sub-tree vector into a first fragment and a second fragment, and sending the second fragment of the right sub-tree vector to the device of the second party; when the maximum splitting gain corresponds to a feature of the second party: receiving, from the device of the second party, a first fragment of the left sub-tree vector and a first fragment of the right sub-tree vector of the node;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the left sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node;
interacting with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node; and interacting with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node.
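By way of illustration only (this is not part of the claims), the following minimal Python sketch shows one way the gradient-sum step of claim 1 could be realized over additive fragments; the ring modulus, the `share` and `shared_inner_product` helpers, and the trusted dealer that stands in for a real triple-generation sub-protocol are all assumptions made for the example.

```python
import secrets

MOD = 2 ** 64  # illustration ring Z_{2^64}; the patent does not fix a modulus

def share(vec):
    """Split a vector into two additive fragments: vec = first + second (mod MOD)."""
    second = [secrets.randbelow(MOD) for _ in vec]
    first = [(v - s) % MOD for v, s in zip(vec, second)]
    return first, second

def shared_inner_product(x1, y1, x2, y2):
    """Inner product of two secret-shared vectors using a Beaver triple from a
    trusted dealer (a stand-in for the OT/HE setup a real protocol would use).
    Each party ends up holding one additive fragment of <x, y>."""
    n = len(x1)
    a = [secrets.randbelow(MOD) for _ in range(n)]
    b = [secrets.randbelow(MOD) for _ in range(n)]
    c = sum(ai * bi for ai, bi in zip(a, b)) % MOD
    a1, a2 = share(a)
    b1, b2 = share(b)
    (c1,), (c2,) = share([c])
    # both parties open e = x - a and f = y - b (these reveal nothing about x, y)
    e = [(p + q - r - s) % MOD for p, q, r, s in zip(x1, x2, a1, a2)]
    f = [(p + q - r - s) % MOD for p, q, r, s in zip(y1, y2, b1, b2)]
    ef = sum(ei * fi for ei, fi in zip(e, f)) % MOD
    z1 = (c1 + sum(ei * bi for ei, bi in zip(e, b1))
             + sum(fi * ai for fi, ai in zip(f, a1))) % MOD
    z2 = (c2 + sum(ei * bi for ei, bi in zip(e, b2))
             + sum(fi * ai for fi, ai in zip(f, a2)) + ef) % MOD
    return z1, z2

# first gradient vector of a node, split between the two parties
g1, g2 = share([3, 1, 4, 1, 5])
# identification vector of one grouping (1 = sample belongs to the grouping)
m1, m2 = share([1, 0, 1, 0, 1])
G1, G2 = shared_inner_product(g1, m1, g2, m2)
# reconstructed here only to check; in the protocol each party keeps its fragment
assert (G1 + G2) % MOD == 3 + 4 + 5
```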
2. The method of claim 1, wherein the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragments of the splitting gains respectively corresponding to the groupings under each feature based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groupings under each feature comprises:
for any feature:
interacting with the device of the second party according to a multi-party secure computation protocol to compute, based on the first fragments of the first gradient sums respectively corresponding to the groupings under the feature, a first fragment of a left sub-tree first gradient sum and a first fragment of a right sub-tree first gradient sum respectively corresponding to each grouping under the feature; wherein the left sub-tree first gradient sum is equal to the sum of the elements, corresponding to the samples belonging to the left subset, in the first and second fragments of the first gradient vector, and the right sub-tree first gradient sum is equal to the sum of the elements, corresponding to the samples belonging to the right subset, in the first and second fragments of the first gradient vector;
interacting with the device of the second party according to the multi-party secure computation protocol to compute, based on the first fragments of the second gradient sums respectively corresponding to the groupings under the feature, a first fragment of a left sub-tree second gradient sum and a first fragment of a right sub-tree second gradient sum respectively corresponding to each grouping under the feature; wherein the left sub-tree second gradient sum is equal to the sum of the elements, corresponding to the samples belonging to the left subset, in the first and second fragments of the second gradient vector, and the right sub-tree second gradient sum is equal to the sum of the elements, corresponding to the samples belonging to the right subset, in the first and second fragments of the second gradient vector;
for any grouping of any feature:
calculating a first fragment of the splitting gain corresponding to the grouping under the feature according to

$$\mathrm{Gain}=\frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda}+\frac{G_R^{2}}{H_R+\lambda}-\frac{\left(G_L+G_R\right)^{2}}{H_L+H_R+\lambda}\right],$$

wherein $\mathrm{Gain}$ represents the splitting gain corresponding to the grouping under the feature; $G_L$ represents the sum of the first and second fragments of the left sub-tree first gradient sum corresponding to the grouping under the feature; $G_R$ represents the sum of the first and second fragments of the right sub-tree first gradient sum corresponding to the grouping under the feature; $H_L$ represents the sum of the first and second fragments of the left sub-tree second gradient sum corresponding to the grouping under the feature; $H_R$ represents the sum of the first and second fragments of the right sub-tree second gradient sum corresponding to the grouping under the feature; and $\lambda$ represents a preset coefficient.
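As a plaintext cross-check of the gain formula (illustrative only; in the protocol the ratios are evaluated jointly over fragments of $G_L$, $G_R$, $H_L$ and $H_R$, and the numbers below are invented):

```python
def split_gain(GL, GR, HL, HR, lam):
    """XGBoost-style splitting gain evaluated on reconstructed (plaintext) sums."""
    return 0.5 * (GL ** 2 / (HL + lam)
                  + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam))

# e.g. a grouping that routes most of the positive gradient to the left sub-tree
print(split_gain(GL=8.0, GR=2.0, HL=5.0, HR=5.0, lam=1.0))  # ~1.12
```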
3. The method of claim 1, wherein the determining the maximum splitting gain based on the first fragments of the splitting gains respectively corresponding to the groupings under each feature comprises:
traversing pairs each consisting of a grouping k1 corresponding to a feature j1 and a grouping k2 corresponding to a feature j2, and for each pair:
calculating a first difference between the first fragment of the splitting gain corresponding to j1 and k1 and the first fragment of the splitting gain corresponding to j2 and k2;
interacting with the device of the second party according to a multi-party secure comparison protocol to determine the magnitude relation between the first difference and a second difference held by the device of the second party, wherein the second difference is the difference between the second fragment of the splitting gain corresponding to j2 and k2 and the second fragment of the splitting gain corresponding to j1 and k1;
and determining the maximum splitting gain based on the determined magnitude relations.
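The comparison in claim 3 rests on the identity that, for additively fragmented gains, Gain(j1, k1) ≥ Gain(j2, k2) exactly when the first party's difference of first fragments is at least the second party's difference of second fragments. A toy illustration follows; the secure comparison itself (e.g. by garbled circuits or oblivious transfer) is replaced by a plain comparison, and all values are invented:

```python
# each gain is held as two additive fragments: gain = a + b
gains = {("j1", "k1"): (2.5, 1.0),   # fragments (a, b), reconstructed gain 3.5
         ("j2", "k2"): (4.0, -2.0)}  # reconstructed gain 2.0

def first_is_larger(p, q):
    a1, b1 = gains[p]
    a2, b2 = gains[q]
    d_first = a1 - a2        # computed locally by the first party
    d_second = b2 - b1       # computed locally by the second party
    # in the protocol this single comparison runs under a multi-party
    # secure comparison protocol, so neither difference is revealed
    return d_first >= d_second

best = ("j1", "k1")
for key in gains:
    if key != best and not first_is_larger(best, key):
        best = key
print(best)  # ('j1', 'k1'), the pair with the larger reconstructed gain
```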
4. The method of claim 1, wherein, when the maximum splitting gain corresponds to a feature of the first party: when a sample in the sample set belongs to the left subset, the bit corresponding to the sample in the left sub-tree vector is 1 and the bit corresponding to the sample in the right sub-tree vector is 0; otherwise, the bit corresponding to the sample in the left sub-tree vector is 0 and the bit corresponding to the sample in the right sub-tree vector is 1;
the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the flag vector of the left sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the flag vector of the left sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the flag vector of the node with the sum of the first and second fragments of the left sub-tree vector;
the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the flag vector of the right sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the flag vector of the right sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the flag vector of the node with the sum of the first and second fragments of the right sub-tree vector.
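Claim 4's bitwise multiplication of fragmented 0/1 vectors can be sketched with per-element Beaver triples, again with a trusted dealer standing in for the real setup; everything below is an assumption-laden illustration, not the patent's prescribed protocol:

```python
import secrets

MOD = 2 ** 64  # illustration ring, as in the earlier sketch

def share(vec):
    """Split a vector into two additive fragments modulo MOD."""
    second = [secrets.randbelow(MOD) for _ in vec]
    return [(v - s) % MOD for v, s in zip(vec, second)], second

def beaver_elementwise(x1, y1, x2, y2):
    """Elementwise product of two shared vectors using one Beaver triple per
    element, issued by a trusted dealer (illustration only)."""
    z1, z2 = [], []
    for u1, v1, u2, v2 in zip(x1, y1, x2, y2):
        a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
        c = a * b % MOD
        (a1,), (a2,) = share([a])
        (b1,), (b2,) = share([b])
        (c1,), (c2,) = share([c])
        e = (u1 + u2 - a) % MOD   # opened value x - a
        f = (v1 + v2 - b) % MOD   # opened value y - b
        z1.append((c1 + e * b1 + f * a1) % MOD)
        z2.append((c2 + e * b2 + f * a2 + e * f) % MOD)
    return z1, z2

flag1, flag2 = share([1, 1, 0, 1])   # samples present at the node
left1, left2 = share([1, 0, 1, 1])   # samples routed to the left sub-tree
l1, l2 = beaver_elementwise(flag1, left1, flag2, left2)
print([(a + b) % MOD for a, b in zip(l1, l2)])  # [1, 0, 0, 1]
```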
5. The method of claim 4, wherein the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the first gradient vector of the left sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the first gradient vector of the left sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the first gradient vector of the node with the sum of the first and second fragments of the flag vector of the left sub-tree of the node;
the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the second gradient vector of the left sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the second gradient vector of the left sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the second gradient vector of the node with the sum of the first and second fragments of the flag vector of the left sub-tree of the node;
the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the first gradient vector of the right sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the first gradient vector of the right sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the first gradient vector of the node with the sum of the first and second fragments of the flag vector of the right sub-tree of the node;
the interacting with the device of the second party according to the multi-party secure computation protocol to compute the first fragment of the second gradient vector of the right sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node comprises:
interacting with the device of the second party according to the multi-party secure computation protocol to calculate, as the first fragment of the second gradient vector of the right sub-tree of the node, a first fragment of the result of bitwise multiplication of the sum of the first and second fragments of the second gradient vector of the node with the sum of the first and second fragments of the flag vector of the right sub-tree of the node.
6. The method of claim 1, wherein, for a single decision tree, the splitting of nodes is performed in turn through the splitting step until a growth termination condition is satisfied.
7. The method of claim 6, further comprising:
calculating a first fragment of the weight, in an equivalent model, of each leaf node of a single decision tree, as the weight of the corresponding leaf node of the corresponding decision tree in the tree model of the first party, wherein the equivalent model has the same structure as both the tree model of the first party and the tree model of the second party:
interacting with the device of the second party according to a multi-party secure computation protocol to calculate, based on the first fragment of the first gradient vector and the first fragment of the second gradient vector of the leaf node, a first fragment of the weight of the leaf node in the equivalent model according to

$$w=-\frac{G}{H+\lambda},$$

wherein $w$ represents the sum of the first and second fragments of the weight of the leaf node in the equivalent model; $G$ represents the sum of the elements in the first and second fragments of the first gradient vector of the leaf node; $H$ represents the sum of the elements in the first and second fragments of the second gradient vector of the leaf node; and $\lambda$ represents a preset coefficient.
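A plaintext cross-check of the leaf-weight formula of claim 7 (in the protocol the division is performed by a joint sub-protocol over fragments; the reconstruction done below happens only in this illustration, and the numbers are invented):

```python
def leaf_weight(G, H, lam):
    """XGBoost-style leaf weight w = -G / (H + lam), on reconstructed sums."""
    return -G / (H + lam)

# sums reconstructed from the two parties' fragments, for checking only
G = 1.5 + 0.5   # first-gradient fragments of the leaf
H = 2.0 + 1.0   # second-gradient fragments of the leaf
print(leaf_weight(G, H, lam=1.0))  # -0.5
```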
8. The method of claim 7, further comprising:
for the t-th decision tree:
interacting with the device of the second party according to a multi-party secure computation protocol to calculate, for each sample, a first fragment of the weight of the leaf node to which the sample belongs on the t-th decision tree in the equivalent model, based on the weights of the leaf nodes of the t-th decision tree in the tree model of the first party and the first fragments of the flag vectors;
for any sample, accumulating the first fragment of the weight of the leaf node to which the sample belongs on the t-th decision tree in the equivalent model onto a first fragment of a prediction score of the sample, so as to update the first fragment of the prediction score of the sample; wherein, when t = 1, the first fragment of the prediction score of the sample before updating is equal to a base score of the tree model of the first party, and, when t > 1, the first fragment of the prediction score of the sample before updating is equal to the sum of the first fragments of the weights of the leaf nodes to which the sample belongs on the first t−1 decision trees in the equivalent model and the base score of the tree model of the first party;
calculating a first fragment of the first gradient vector of the root node of the (t+1)-th decision tree based on the updated first fragment of the prediction score of each sample and the first fragment of the label value of each sample;
and interacting with the device of the second party according to a multi-party secure computation protocol to calculate a first fragment of the second gradient vector of the root node of the (t+1)-th decision tree based on the updated first fragment of the prediction score of each sample and the first fragment of the label value of each sample.
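For claim 8, the following sketch shows how the first party might accumulate its fragments of the prediction scores across trees, and why, for a squared-error loss where the first gradient is g = prediction − label, the next tree's first-gradient fragments can be formed without interaction; the loss choice and all numbers are assumptions (the second gradient would in general require an interactive sub-protocol, as the claim states):

```python
# per-sample fragments held by the first party (the second party holds the mates)
base_score = 0.5                 # base score of the first party's tree model
pred_frag = [base_score] * 3     # t = 1: fragments before any tree is applied
label_frag = [1.0, 0.0, 1.0]     # first fragments of the label values

# first fragments of the leaf weights each sample reaches on tree t
leaf_weight_frag_t = [0.1, -0.2, 0.3]

# accumulate tree t into the prediction-score fragments
pred_frag = [p + w for p, w in zip(pred_frag, leaf_weight_frag_t)]

# squared-error loss: g_i = pred_i - y_i, so a fragment of g is just the
# difference of fragments -- no interaction is needed for this step
g_frag_next = [p - y for p, y in zip(pred_frag, label_frag)]
print(g_frag_next)  # first fragments of the next root node's first gradients
```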
9. A two-party decision tree training system, wherein the system is implemented on a device of a first party, the first party being either one of a party holding, for each sample in a sample set, feature values of at least one feature together with a label value, and a party holding, for each sample in the sample set, feature values of at least one feature, and a second party being the other of the two parties; the system comprising a first obtaining module, a first grouping gradient sum fragment calculation module, a second grouping gradient sum fragment calculation module, a splitting gain fragment calculation module, a splitting gain comparison module, a left and right sub-tree vector fragment obtaining module, a child node flag vector fragment calculation module, and a child node gradient vector fragment calculation module; for any node to be split:
the first obtaining module is configured to obtain a first fragment of a flag vector of the node, a first fragment of a first gradient vector of the node, and a first fragment of a second gradient vector of the node, wherein the flag vector indicates the samples belonging to the corresponding node, the first gradient vector includes the first gradients corresponding to the samples belonging to the corresponding node, and the second gradient vector includes the second gradients corresponding to the samples belonging to the corresponding node;
for any feature held by the first party, the first grouping gradient sum fragment calculation module is configured to: for each of a plurality of first groupings obtained by dividing the sample set according to the feature values of the feature, generate an identification vector corresponding to the first grouping according to the feature value of the feature of each sample, wherein the identification vector indicates the samples belonging to the first grouping; split the identification vector corresponding to the first grouping into a first fragment and a second fragment, and send the second fragment of the identification vector corresponding to the first grouping to the device of the second party; interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the first grouping based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, wherein the first gradient sum corresponding to the first grouping is the inner product of the sum of the first and second fragments of the first gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the first grouping; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the first grouping based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the first grouping, wherein the second gradient sum corresponding to the first grouping is the inner product of the sum of the first and second fragments of the second gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the first grouping;
for any feature held by the second party, the second grouping gradient sum fragment calculation module is configured to: for each of a plurality of second groupings obtained by dividing the sample set according to the feature, obtain, from the device of the second party, a first fragment of an identification vector corresponding to the second grouping, the identification vector indicating the samples belonging to the second grouping; interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient sum corresponding to the second grouping based on the first fragment of the first gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, wherein the first gradient sum corresponding to the second grouping is the inner product of the sum of the first and second fragments of the first gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the second grouping; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient sum corresponding to the second grouping based on the first fragment of the second gradient vector of the node and the first fragment of the identification vector corresponding to the second grouping, wherein the second gradient sum corresponding to the second grouping is the inner product of the sum of the first and second fragments of the second gradient vector of the node and the sum of the first and second fragments of the identification vector corresponding to the second grouping;
the splitting gain fragment calculation module is configured to interact with the device of the second party according to a multi-party secure computation protocol to compute first fragments of the splitting gains respectively corresponding to the groupings under each feature, based on the first fragments of the first gradient sums and the first fragments of the second gradient sums respectively corresponding to the groupings under each feature;
the splitting gain comparison module is configured to interact with the device of the second party according to a multi-party secure comparison protocol to determine the maximum splitting gain based on the first fragments of the splitting gains respectively corresponding to the groupings under each feature, and to record splitting information of the node according to the feature and the grouping corresponding to the maximum splitting gain;
the left and right sub-tree vector fragment obtaining module is configured to: when the maximum splitting gain corresponds to a feature of the first party, generate a left sub-tree vector and a right sub-tree vector of the node, the left sub-tree vector indicating the samples in a left subset obtained by dividing the sample set according to the feature and the grouping corresponding to the maximum splitting gain, the right sub-tree vector indicating the samples in a right subset obtained by dividing the sample set according to the feature and the grouping corresponding to the maximum splitting gain, the left subset corresponding to the left sub-tree and the right subset corresponding to the right sub-tree; split the left sub-tree vector into a first fragment and a second fragment, and send the second fragment of the left sub-tree vector to the device of the second party; split the right sub-tree vector into a first fragment and a second fragment, and send the second fragment of the right sub-tree vector to the device of the second party; and, when the maximum splitting gain corresponds to a feature of the second party, receive, from the device of the second party, a first fragment of the left sub-tree vector and a first fragment of the right sub-tree vector of the node;
the child node flag vector fragment calculation module is configured to: interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the flag vector of the left sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the left sub-tree vector; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the flag vector of the right sub-tree of the node based on the first fragment of the flag vector of the node and the first fragment of the right sub-tree vector;
the child node gradient vector fragment calculation module is configured to: interact with the device of the second party according to a multi-party secure computation protocol to compute a first fragment of the first gradient vector of the left sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node; interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the left sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the left sub-tree of the node; interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the first gradient vector of the right sub-tree of the node based on the first fragment of the first gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node; and interact with the device of the second party according to the multi-party secure computation protocol to compute a first fragment of the second gradient vector of the right sub-tree of the node based on the first fragment of the second gradient vector of the node and the first fragment of the flag vector of the right sub-tree of the node.
10. A two-party decision tree training apparatus, comprising a processor and a storage device, the storage device storing instructions which, when executed by the processor, implement the method of any one of claims 1 to 8.
CN202010723916.5A 2020-07-24 2020-07-24 Two-party decision tree training method and system Active CN111738360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723916.5A CN111738360B (en) 2020-07-24 2020-07-24 Two-party decision tree training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723916.5A CN111738360B (en) 2020-07-24 2020-07-24 Two-party decision tree training method and system

Publications (2)

Publication Number Publication Date
CN111738360A CN111738360A (en) 2020-10-02
CN111738360B (en) 2020-11-27

Family

ID=72657589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723916.5A Active CN111738360B (en) 2020-07-24 2020-07-24 Two-party decision tree training method and system

Country Status (1)

Country Link
CN (1) CN111738360B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700031B (en) * 2020-12-12 2023-03-31 同济大学 XGboost prediction model training method for protecting multi-party data privacy
CN112380404B (en) * 2020-12-14 2021-05-11 支付宝(杭州)信息技术有限公司 Data filtering method, device and system
CN114282688B (en) * 2022-03-02 2022-06-03 支付宝(杭州)信息技术有限公司 Two-party decision tree training method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728317A (en) * 2019-09-30 2020-01-24 腾讯科技(深圳)有限公司 Training method and system of decision tree model, storage medium and prediction method
CN111383101A (en) * 2020-03-25 2020-07-07 深圳前海微众银行股份有限公司 Post-loan risk monitoring method, device, equipment and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9449282B2 (en) * 2010-07-01 2016-09-20 Match.Com, L.L.C. System for determining and optimizing for relevance in match-making systems
CN106250461A (en) * 2016-07-28 2016-12-21 北京北信源软件股份有限公司 A kind of algorithm utilizing gradient lifting decision tree to carry out data mining based on Spark framework
CN106227721B (en) * 2016-08-08 2019-02-01 中国科学院自动化研究所 Chinese Prosodic Hierarchy forecasting system
CN111160573B (en) * 2020-04-01 2020-06-30 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728317A (en) * 2019-09-30 2020-01-24 腾讯科技(深圳)有限公司 Training method and system of decision tree model, storage medium and prediction method
CN111383101A (en) * 2020-03-25 2020-07-07 深圳前海微众银行股份有限公司 Post-loan risk monitoring method, device, equipment and computer-readable storage medium

Also Published As

Publication number Publication date
CN111738360A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111738359B (en) Two-party decision tree training method and system
CN111738360B (en) Two-party decision tree training method and system
US11836583B2 (en) Method, apparatus and system for secure vertical federated learning
Larsen et al. Yes, there is an oblivious RAM lower bound!
Both et al. Decoding linear codes with high error rate and its impact for LPN security
Blanton et al. Secure and efficient outsourcing of sequence comparisons
CN110969264B (en) Model training method, distributed prediction method and system thereof
Chen et al. Secure social recommendation based on secret sharing
JP6973632B2 (en) Secret summation system, secret calculator, secret summation method, and program
CN110210233B (en) Combined construction method and device of prediction model, storage medium and computer equipment
CN111639368A (en) Incremental learning distributed computing method, system and node based on block chain
CN111475854A (en) Collaborative computing method and system for protecting data privacy of two parties
CN112990276A (en) Federal learning method, device, equipment and storage medium based on self-organizing cluster
CN113098691B (en) Digital signature method, signature information verification method, related device and electronic equipment
WO2023174018A1 (en) Vertical federated learning methods, apparatuses, system and device, and storage medium
CN110969243A (en) Method and device for training countermeasure generation network for preventing privacy leakage
CN112860800A (en) Trusted network application method and device based on block chain and federal learning
CN114282688B (en) Two-party decision tree training method and system
CN112801301A (en) Asynchronous calculation method, device, equipment, storage medium and program product
KR20140133032A (en) A system and control method for a meta-heuristic algorithm utilizing similarity for performance enhancement
CN112529102A (en) Feature expansion method, device, medium, and computer program product
CN111784078B (en) Distributed prediction method and system for decision tree
CN110175283B (en) Recommendation model generation method and device
CN109818944B (en) Cloud data outsourcing and integrity verification method and device supporting preprocessing
CN113537333B (en) Method for training optimization tree model and longitudinal federal learning system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant